Bottleneck of Frequent-pattern Mining
Multiple database scans are costly
Mining long patterns needs many passes of scanning and generates lots of candidates
To find the frequent itemset i1 i2 … i100:
# of scans: 100
# of candidates: (100 choose 1) + (100 choose 2) + … + (100 choose 100) = 2^100 - 1 ≈ 1.27 × 10^30 !
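A quick check of this candidate count, as a minimal Python sketch (not part of the original slides):

```python
from math import comb

# Sum of C(100, k) over k = 1..100 equals 2^100 - 1
n_candidates = sum(comb(100, k) for k in range(1, 101))
assert n_candidates == 2**100 - 1
print(f"{n_candidates:.2e}")  # ~1.27e+30
```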
Bottleneck: candidate-generation-and-test
Can we avoid candidate generation?
Mining Freq Patterns w/o Candidate Generation
Grow long patterns from short ones using local frequent items
“abc” is a frequent pattern
Get all transactions having “abc”: DB|abc (projected database on abc)
“d” is a local frequent item in DB|abc ⇒ “abcd” is a frequent pattern
Get all transactions having “abcd” (projected database on “abcd”) and find longer itemsets
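A minimal Python sketch of this growth step; the small transaction set and the min_sup value below are illustrative, not from the slides:

```python
from collections import Counter

min_sup = 2   # illustrative absolute support threshold
db = [{'a', 'b', 'c', 'd'},
      {'a', 'b', 'c', 'd', 'e'},
      {'a', 'b', 'c', 'e'},
      {'b', 'd'}]

abc = {'a', 'b', 'c'}
# DB|abc: the projected database of transactions containing "abc"
db_abc = [t for t in db if abc <= t]

# Local frequent items in DB|abc (excluding a, b, c themselves)
local_counts = Counter(item for t in db_abc for item in t - abc)
local_frequent = [x for x, n in local_counts.items() if n >= min_sup]
print(local_frequent)   # ['d', 'e']: so "abcd" and "abce" are frequent patterns
```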
Mining Freq Patterns w/o Candidate Generation
Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
Highly condensed, but complete for frequent pattern mining
Avoid costly database scans
Develop an efficient, FP-tree-based frequent pattern mining method
A divide-and-conquer methodology: decompose mining tasks into smaller ones
Avoid candidate generation: examine the sub-database (conditional pattern base) only!
Construct FP-tree from a Transaction DB
TID   Items bought                (Ordered) frequent items
100   {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
200   {a, b, c, f, l, m, o}       {f, c, a, b, m}
300   {b, f, h, j, o}             {f, b}
400   {b, c, k, s, p}             {c, b, p}
500   {a, f, c, e, l, p, m, n}    {f, c, a, m, p}

min_sup = 50% (a support count of at least 3 of the 5 transactions)
Steps:
1. Scan the DB once and find the frequent 1-itemsets (single-item patterns): f:4, c:4, a:3, b:3, m:3, p:3; all other items appear in fewer than 3 transactions
2. Order the frequent items in frequency-descending order: f, c, a, b, m, p (L-order)
3. Process the DB again, rewriting each transaction with only its frequent items, sorted in L-order
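A minimal Python sketch of steps 1 to 3 on this DB; the transactions are those in the table above, and ties in the frequency ordering are broken to match the slides' L-order:

```python
from collections import Counter

min_sup = 3   # 50% of 5 transactions
db = {100: ['f','a','c','d','g','i','m','p'],
      200: ['a','b','c','f','l','m','o'],
      300: ['b','f','h','j','o'],
      400: ['b','c','k','s','p'],
      500: ['a','f','c','e','l','p','m','n']}

# Step 1: one scan to count single items
counts = Counter(item for items in db.values() for item in items)

# Step 2: frequent items in frequency-descending order (ties broken as in the slides)
l_order = ['f', 'c', 'a', 'b', 'm', 'p']
assert all(counts[i] >= min_sup for i in l_order)
assert all(c < min_sup for i, c in counts.items() if i not in l_order)

# Step 3: keep only frequent items and sort each transaction in L-order
rank = {item: r for r, item in enumerate(l_order)}
ordered = {tid: sorted((i for i in items if i in rank), key=rank.get)
           for tid, items in db.items()}
print(ordered[100])   # ['f', 'c', 'a', 'm', 'p']
print(ordered[400])   # ['c', 'b', 'p']
```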
Construct FP-tree from a Transaction DB
Initial FP-tree: just the root node {}; the header table lists f, c, a, b, m, p, each with frequency 0 and a nil node-link
Construct FP-tree from a Transaction DB
Insert {f, c, a, m, p} (TID 100): creates the path {} → f:1 → c:1 → a:1 → m:1 → p:1; header counts become f:1, c:1, a:1, b:0, m:1, p:1
Construct FP-tree from a Transaction DB
Insert {f, c, a, b, m} (TID 200): shares the prefix f-c-a, so those counts become f:2, c:2, a:2, and a new branch a:2 → b:1 → m:1 is added; header counts become f:2, c:2, a:2, b:1, m:2, p:1
Construct FP-tree from a Transaction DB
Insert {f, b} (TID 300): f becomes f:3 and a new branch f:3 → b:1 is added; header counts become f:3, c:2, a:2, b:2, m:2, p:1
Construct FP-tree from a Transaction DB
Insert {c, b, p} (TID 400): shares no prefix with the f-branch, so a new branch {} → c:1 → b:1 → p:1 is added directly under the root; header counts become f:3, c:3, a:2, b:3, m:2, p:2
Construct FP-tree from a Transaction DB
Insert {f, c, a, m, p} (TID 500): reuses the first path, giving f:4 → c:3 → a:3 → m:2 → p:2; final header counts are f:4, c:4, a:3, b:3, m:3, p:3, with each header entry's node-link threading all tree nodes for that item
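A minimal Python sketch of the insertion procedure illustrated above; the class and function names are mine, not from the original FP-growth paper:

```python
class FPNode:
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}      # item -> child FPNode
        self.node_link = None   # next node in the tree carrying the same item

def insert(root, ordered_items, header):
    """Insert one transaction whose items are already filtered and in L-order."""
    node = root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, node)
            node.children[item] = child
            # thread the new node into the header table's node-link chain
            child.node_link = header.get(item)
            header[item] = child
        child.count += 1
        node = child

root, header = FPNode(None, None), {}
for t in [['f','c','a','m','p'], ['f','c','a','b','m'], ['f','b'],
          ['c','b','p'], ['f','c','a','m','p']]:
    insert(root, t, header)

f = root.children['f']
print(f.count, f.children['c'].count, f.children['c'].children['a'].count)   # 4 3 3
```

In this sketch header[item] points at the most recently inserted node for that item, and following the node_link pointers from it visits every node carrying that item, which is what the mining phase on the following slides relies on.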
Benefits of FP-tree Structure
Completeness:
Preserve complete DB information for frequent pattern mining (given a prior min_support)
Each transaction is mapped onto one FP-tree path; counts are stored at each node
Compactness
One FP-tree path may correspond to multiple transactions; the tree is never larger than the original database (not counting node-links and count fields)
Reduce irrelevant information: infrequent items are gone
Frequency-descending ordering: more frequent items are closer to the tree top and more likely to be shared
How Effective Is FP-tree?
[Chart: size (log scale, 1 to 100,000) vs. support threshold (0% to 100%) on the Connect-4 dataset (a dense dataset), comparing Tran. DB, Freq. Tran. DB, Alphabetical FP-tree, and Ordered FP-tree]
Mining Frequent Patterns Using FP-tree
General idea (divide-and-conquer)
Recursively grow frequent pattern path using FP-tree
Frequent patterns can be partitioned into subsets according to L-order
L-order=f-c-a-b-m-p
Patterns containing p
Patterns having m but not p
Patterns having b but neither m nor p
…
Patterns having c but none of a, b, m, p
Pattern f
Mining Frequent Patterns Using FP-tree
Step 1 : Construct conditional pattern base for each item in header table
Step 2: Construct conditional FP-tree from each conditional pattern-base
Step 3: Recursively mine conditional FP-trees and grow frequent patterns obtained so far
If a conditional FP-tree contains only a single path, simply enumerate all combinations of its items as patterns
Step 1: Construct Conditional Pattern Base
Start at the header table of the FP-tree
Traverse the FP-tree by following the node-links of each frequent item
Accumulate all transformed prefix paths of that item to form its conditional pattern base
[Figure: the FP-tree and header table built above]

Conditional pattern bases (item → conditional pattern base):
c: f:3
a: fc:3
b: fca:1, f:1, c:1
m: fca:2, fcab:1
p: fcam:2, cb:1
Step 2: Construct Conditional FP-tree
For each conditional pattern base:
Accumulate the count of each item in the base
Construct an FP-tree over the items of the base that are still frequent
Example: p's conditional pattern base is {fcam:2, cb:1}; item counts within it are f:2, c:3, a:2, m:2, b:1. With min_sup = 50% (# transactions = 5, so count ≥ 3), only c survives, and p's conditional FP-tree is the single path {} → c:3
Mining Frequent Patterns by Creating Conditional Pattern-Bases

Item   Conditional pattern-base       Conditional FP-tree
p      {(fcam:2), (cb:1)}             {(c:3)}|p
m      {(fca:2), (fcab:1)}            {(f:3, c:3, a:3)}|m
b      {(fca:1), (f:1), (c:1)}        Empty
a      {(fc:3)}                       {(f:3, c:3)}|a
c      {(f:3)}                        {(f:3)}|c
f      Empty                          Empty
Step 3: Recursively mine conditional FP-tree
Collect all patterns that end with p:
FP: p(3); conditional pattern base: fcam:2, cb:1; conditional FP-tree: the single node c(3)
Grow the suffix to cp(3): FP: cp(3); its conditional pattern base is empty, so the recursion stops
Step 3: Recursively mine conditional FP-tree
Collect all patterns that end with m:
FP: m(3); conditional pattern base: fca:2, fcab:1; conditional FP-tree: the single path f(3) → c(3) → a(3)
Grow the suffix with each item on that path: fm(3), cm(3), am(3)
FP: fm(3); conditional pattern base: nil
FP: cm(3); conditional pattern base: f:3; conditional FP-tree: f(3); grows to fcm(3), whose pattern base is nil
The suffix am(3) is continued on the next slide
Collect all patterns that end at m (cont’d)
FP: am(3); conditional pattern base: fc:3; conditional FP-tree: the path f(3) → c(3)
FP: fam(3); conditional pattern base: nil
FP: cam(3); conditional pattern base: f:3; conditional FP-tree: f(3); grows to fcam(3), whose pattern base is nil
Patterns ending with m: m, fm, cm, am, fcm, cam, fam, fcam (each with support 3)
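A compact Python sketch of this recursion. For brevity it recurses directly on conditional pattern bases (lists of (prefix, count) pairs) rather than building explicit conditional FP-trees, but it enumerates the same patterns; the ordered transactions are those from the construction slides:

```python
from collections import Counter

def mine(pattern_base, suffix, min_sup, results):
    """pattern_base: list of (prefix_items, count); suffix: tuple of items already fixed."""
    counts = Counter()
    for prefix, cnt in pattern_base:
        for item in prefix:
            counts[item] += cnt
    for item, cnt in counts.items():
        if cnt < min_sup:
            continue
        grown = (item,) + suffix          # grow the frequent pattern
        results[grown] = cnt
        # conditional pattern base for the grown suffix: whatever precedes `item` in each prefix
        new_base = [(prefix[:prefix.index(item)], c)
                    for prefix, c in pattern_base if item in prefix]
        mine(new_base, grown, min_sup, results)

db = [['f','c','a','m','p'], ['f','c','a','b','m'], ['f','b'],
      ['c','b','p'], ['f','c','a','m','p']]
results = {}
mine([(t, 1) for t in db], (), 3, results)
print(results[('p',)], results[('c', 'p')])       # 3 3
print(results[('f', 'c', 'a', 'm')])              # 3
print(len(results))                               # 18 frequent patterns in total
```

Building the conditional FP-tree explicitly, as the slides do, adds compression when many prefixes share items, but the recursion structure is the same.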
FP-growth vs. Apriori: Scalability With the Support Threshold
[Chart: run time (y-axis 0 to 100) vs. support threshold (0% to 3%) on data set T25I20D10K, comparing D1 FP-growth runtime and D1 Apriori runtime]
Why Is Frequent Pattern Growth Fast?
Performance study shows
FP-growth is an order of magnitude faster than Apriori
Reasoning
No candidate generation, no candidate test
Use compact data structure
Eliminate repeated database scan
Basic operations are counting and FP-tree building
Weaknesses of FP-growth
Support-dependent: cannot accommodate a dynamically changing support threshold
Cannot accommodate incremental DB updates
Mining requires recursive operations
Maximal patterns and Border
Maximal patterns: a frequent itemset X is maximal if none of its supersets is frequent
Maximal Patterns = {AD, ACE, BCDE}
20 frequent patterns, but only 3 maximal patterns!
The 3 maximal patterns can be used to represent all 20 patterns
[Figure: the itemset lattice over {A, B, C, D, E}, from the empty set (null) up to ABCDE, with the border separating the frequent itemsets from the infrequent ones]
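A minimal Python sketch of the maximality check; the list of 20 frequent itemsets below is the downward closure of {AD, ACE, BCDE} implied by the lattice on this slide:

```python
def maximal_patterns(frequent_itemsets):
    """Keep only itemsets that have no frequent proper superset."""
    fs = [frozenset(x) for x in frequent_itemsets]
    return [x for x in fs if not any(x < y for y in fs)]

# The 20 frequent itemsets whose maximal patterns are AD, ACE, BCDE
freq = ['A', 'B', 'C', 'D', 'E',
        'AC', 'AD', 'AE', 'BC', 'BD', 'BE', 'CD', 'CE', 'DE',
        'ACE', 'BCD', 'BCE', 'BDE', 'CDE', 'BCDE']
maximal = maximal_patterns(freq)
print(sorted(''.join(sorted(m)) for m in maximal))   # ['ACE', 'AD', 'BCDE']
```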
Closed Itemsets
Tid   Itemset
1     A, C, T, W
2     C, D, W
3     A, C, T, W
4     A, C, D, W
5     A, C, D, T, W
6     C, D, T

Example:
AC is not closed, since every transaction containing AC also contains W
CDW is closed, since transaction Tid=2 contains no other item
Frequent Closed Patterns
For frequent itemset X, if there exists no item y s.t. every transaction containing X also contains y, then X is a frequent closed pattern
“acdf” is a frequent closed pattern
Concise rep. of freq pats
Reduce # of patterns and rules
N. Pasquier et al. In ICDT’99
Min_sup=2
TID Items
10 a, c, d, e, f
20 a, b, e
30 c, e, f
40 a, c, d, f
50 c, e, f
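A minimal Python sketch of this closedness test on the table above (the function name is mine):

```python
def is_frequent_closed(itemset, db, min_sup):
    """X is frequent closed if sup(X) >= min_sup and no extra item appears
    in every transaction that contains X."""
    x = set(itemset)
    supporting = [t for t in db if x <= t]
    if len(supporting) < min_sup:
        return False
    # items common to all supporting transactions; X is closed iff nothing extra is common
    return set.intersection(*supporting) == x

db = [{'a','c','d','e','f'},   # TID 10
      {'a','b','e'},           # TID 20
      {'c','e','f'},           # TID 30
      {'a','c','d','f'},       # TID 40
      {'c','e','f'}]           # TID 50

print(is_frequent_closed('acdf', db, min_sup=2))   # True
print(is_frequent_closed('ac', db, min_sup=2))     # False: every transaction with {a,c} also has d and f
```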