AutoPlait at work

AutoPlait: Automatic Mining of
Co-evolving Time Sequences
Yasuko Matsubara (Kumamoto University)
Yasushi Sakurai (Kumamoto University)
Christos Faloutsos (CMU)
SIGMOD 2014
Y. Matsubara et al.
Motivation
Given: co-evolving time-series
– e.g., MoCap (leg/arm sensors), “Chicken dance”
Q. Any distinct patterns?
Q. If yes, how many?
Q. What kind?
Motivation
Challenges: co-evolving sequences
– Unknown # of patterns (e.g., beaks)
– Different durations
…
Q. Can we summarize it automatically?
Motivation
Goal: find patterns that agree with human intuition
AutoPlait: “fully-automatic” mining algorithm
Importance of “fully-automatic”
No magic numbers! ... because,
Manual
- sensitive to the parameter tuning
- long tuning steps (hours, days, …)
Automatic (no magic numbers)
- no expert tuning required
Big data mining:
-> we cannot afford human intervention!!
Outline
- Motivation
- Problem definition
- Compression & summarization
- Algorithms
- Experiments
- AutoPlait at work
- Conclusions
Problem definition
Key concepts
- Bundle: X (given)
- Segment: S (hidden)
- Regime: Θ (hidden)
- Segment-membership: F (hidden)
Problem definition
• Bundle (given): set of d co-evolving sequences
$X = \{x_1, \dots, x_n\}$ (size $d \times n$)
[Figure: bundle X with d = 4]
Problem definition
• Segment (hidden): convert X -> m segments, S
$S = \{s_1, \dots, s_m\}$
[Figure: bundle X cut into m = 8 segments, s_1 to s_8]
Problem definition
• Regime (hidden): segment groups
$\Theta = \{\theta_1, \theta_2, \dots, \theta_r, \Delta_{r \times r}\}$
$\theta_r$: model of regime r
[Figure: r = 4 regimes θ_1 to θ_4, with example labels “beaks” and “wings”]
Problem definition
• Segment-membership (hidden): assignment
$F = \{f_1, \dots, f_m\}$
e.g., F = {2, 4, 1, 3, 2, 4, 1, 3} (m = 8)
Problem definition
• Given: bundle X
$X = \{x_1, \dots, x_n\}$
• Find: compact description C of X
$C = \{m, r, S, \Theta, F\}$
(m segments, r regimes, segment-membership F)
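To make the notation above concrete, here is a minimal Python sketch of how a bundle X, segments S and segment-membership F might be held in memory. The `Segment` class, the half-open index convention and the `check_description` helper are assumptions made for illustration; they are not taken from the AutoPlait code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int  # first time tick of the segment (inclusive)
    end: int    # one past the last time tick (exclusive)


def check_description(n, segments, membership, r):
    """Check that (S, F) is a consistent description of a length-n bundle."""
    m = len(segments)
    if len(membership) != m:
        raise ValueError("need exactly one regime label per segment")
    pos = 0
    for seg in segments:
        if seg.start != pos or seg.end <= seg.start:
            raise ValueError("segments must tile the bundle without gaps or overlap")
        pos = seg.end
    if pos != n:
        raise ValueError("segments must cover the whole bundle")
    if set(membership) != set(range(1, r + 1)):
        raise ValueError("every regime 1..r must own at least one segment")
    return m


# Toy bundle: d = 4 sequences of length n = 16, stored as d rows (values are made up).
X = [[float(t % 4 + row) for t in range(16)] for row in range(4)]
S = [Segment(2 * i, 2 * i + 2) for i in range(8)]  # m = 8 equal-length segments
F = [2, 4, 1, 3, 2, 4, 1, 3]                       # one regime id per segment
m = check_description(len(X[0]), S, F, r=4)
```

Here m and r are derived from S and F, so the description C = {m, r, S, Θ, F} is fully determined once the regime models Θ are added.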
Main ideas
Goal: compact description of X
$C = \{m, r, S, \Theta, F\}$
without user intervention!!
Challenges:
Q1. How to generate ‘informative’ regimes?
Idea (1): Multi-level chain model
Q2. How to decide # of regimes/segments?
Idea (2): Model description cost
Idea (1): MLCM: multi-level chain model
Q1. How to generate ‘informative’ regimes?
[Figure: model linking the sequences to regimes “beaks”, “claps”, “wings”]
Idea (1): Multi-level chain model
– HMM-based probabilistic model
– with “across-regime” transitions
Idea (1): MLCM: multi-level chain model
Details
$\Theta = \{\theta_1, \theta_2, \dots, \theta_r, \Delta_{r \times r}\}$
– $\theta_1, \dots, \theta_r$: r regimes (HMMs)
– $\Delta_{r \times r}$: across-regime transition prob.
– $\theta_i = \{\pi, A, B\}$: single HMM parameters
[Figure: example with r = 2 regimes. Regime 1 “beaks” (θ_1, k = 3 states) with within-regime transitions a_{1;11}, a_{1;12}, a_{1;22}, a_{1;23}, a_{1;31}, a_{1;33}; Regime 2 “wings” (θ_2, k = 2 states) with a_{2;11}, a_{2;12}, a_{2;21}, a_{2;22}; regime switches governed by δ_{11}, δ_{12}, δ_{21}, δ_{22}]
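As a rough illustration of the structure just described, the following Python sketch stores r HMMs plus an r × r across-regime matrix Δ and checks that every probability row is a valid distribution. The class names, the Gaussian (mean, variance) emission parameters and all numeric values are assumptions chosen for the example; only the shapes (k = 3 for “beaks”, k = 2 for “wings”) follow the slide.

```python
from dataclasses import dataclass


@dataclass
class HMM:
    pi: list  # initial state probabilities, length k
    A: list   # k x k within-regime transition probabilities
    B: list   # per-state emission parameters, here (mean, variance) pairs

    @property
    def k(self):
        return len(self.pi)


@dataclass
class MLCM:
    regimes: list  # r HMMs: theta_1 .. theta_r
    delta: list    # r x r across-regime transition probabilities


def rows_are_distributions(rows, tol=1e-9):
    return all(min(row) >= 0 and abs(sum(row) - 1.0) <= tol for row in rows)


def validate(model):
    """Return r if the model is well-formed, otherwise raise ValueError."""
    r = len(model.regimes)
    if len(model.delta) != r or any(len(row) != r for row in model.delta):
        raise ValueError("delta must be r x r")
    if not rows_are_distributions(model.delta):
        raise ValueError("each row of delta must sum to 1")
    for hmm in model.regimes:
        if len(hmm.A) != hmm.k or any(len(row) != hmm.k for row in hmm.A):
            raise ValueError("A must be k x k")
        if len(hmm.B) != hmm.k:
            raise ValueError("need one emission parameter set per state")
        if not rows_are_distributions([hmm.pi] + hmm.A):
            raise ValueError("pi and each row of A must sum to 1")
    return r


# Illustrative numbers only: a 3-state "beaks" regime and a 2-state "wings" regime.
beaks = HMM(pi=[1.0, 0.0, 0.0],
            A=[[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]],
            B=[(0.0, 1.0), (2.0, 1.0), (4.0, 1.0)])
wings = HMM(pi=[0.5, 0.5],
            A=[[0.8, 0.2], [0.2, 0.8]],
            B=[(-3.0, 1.0), (3.0, 1.0)])
model = MLCM(regimes=[beaks, wings], delta=[[0.95, 0.05], [0.05, 0.95]])
r = validate(model)
```

Treating each row of Δ as a probability distribution over the next regime is an assumption of this sketch; the slides only say that Δ holds across-regime transition probabilities.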
Idea (2): model description cost
Q2. How to decide # of regimes/segments ?
Idea (2): Model description cost
• Minimize coding cost
• find “optimal” # of segments/regimes
Idea (2): model description cost
Idea: Minimize encoding cost!
$\min \big( \mathrm{Cost}_M(M) + \mathrm{Cost}_C(X \mid M) \big)$
– $\mathrm{Cost}_M$: model cost
– $\mathrm{Cost}_C$: coding cost
– $\mathrm{Cost}_T$: total cost
[Figure: Cost_M, Cost_C and Cost_T vs. # of r, m (1 to 10); labels “Good compression” and “Good description”]
Idea (2): model description cost
Details
Total cost of bundle X, given C
$C = \{m, r, S, \Theta, F\}$
Terms of the total cost:
– duration/dimensions
– # of segments/regimes
– segment lengths
– segment-membership F
– model description cost of Θ
– coding cost of X given Θ
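The cost terms listed above can be added up as in the following Python sketch. The slide names the terms but the formula itself did not survive extraction, so the concrete encodings here are assumptions: integers are charged with the universal code length log*, segment-membership is charged m·log2(r) bits, and the model and coding costs are passed in as precomputed numbers. See the paper for the exact definition.

```python
import math


def log_star(n):
    """Universal code length (in bits) for a positive integer, often written log*."""
    if n < 1:
        raise ValueError("log_star is defined for integers >= 1")
    total = math.log2(2.865064)  # normalising constant of the universal code
    term = math.log2(n)
    while term > 0:              # add log2(n) + log2(log2(n)) + ... while positive
        total += term
        term = math.log2(term)
    return total


def total_cost(n, d, segment_lengths, r, model_bits, coding_bits):
    """Sum the description-cost terms for one candidate C (all values in bits)."""
    m = len(segment_lengths)
    if sum(segment_lengths) != n:
        raise ValueError("segment lengths must add up to n")
    duration_and_dims = log_star(n) + log_star(d)
    counts = log_star(m) + log_star(r)
    # The last length is implied by n, so only m - 1 lengths are charged (assumption).
    lengths = sum(log_star(length) for length in segment_lengths[:-1])
    membership = m * math.log2(r)
    return duration_and_dims + counts + lengths + membership + model_bits + coding_bits


# Same model/coding cost, but a finer segmentation pays more for its description.
coarse = total_cost(n=100, d=4, segment_lengths=[100], r=1,
                    model_bits=50.0, coding_bits=400.0)
fine = total_cost(n=100, d=4, segment_lengths=[25, 25, 25, 25], r=2,
                  model_bits=50.0, coding_bits=400.0)
```

A finer segmentation is therefore only worthwhile when it lowers the coding cost of X by more than the extra description bits, which is the trade-off the cost curve on the previous slide depicts.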
AutoPlait
Overview
– Iteration 0: r=1, m=1 (start)
– Iteration 1: r=2, m=4 (split into θ_1, θ_2), with f_1 = 2, f_2 = 1, f_3 = 2, f_4 = 1
– Iteration 2: r=3, m=6 (θ_1, θ_2, θ_3), with f_1 = 2, f_2 = 3, f_3 = 1, f_4 = 2, f_5 = 3, f_6 = 1
– Iteration 4: r=4, m=8 (θ_1, θ_2, θ_3, θ_4), with f_1 = 2, f_2 = 4, f_3 = 3, f_4 = 1, f_5 = 2, f_6 = 4, f_7 = 3, f_8 = 1
[Figure: # of regimes r (1 to 4) vs. iteration (0 to 7)]
AutoPlait
Algorithms
1. CutPointSearch
Inner-most loop
Find good cut-points/segments
2. RegimeSplit
Inner loop
Estimate good regime parameters Θ
3. AutoPlait
Outer loop
Search for the best number of regimes (r=2,3,4…)
1. CutPointSearch
Inner-most loop
Given:
- bundle X
- parameters of two regimes $\Theta = \{\theta_1, \theta_2, \Delta\}$
Find: cut-points of segment sets S_1, S_2
$\{S_1, S_2\} = \arg\max_{S_1, S_2} P(X \mid S_1, S_2, \Theta)$
[Figure: X labelled with θ_1 and θ_2, giving e.g. S_1 = {s_2, s_4}, S_2 = {s_1, s_3}]
1. CutPointSearch
Inner-most loop
Details
DP algorithm to compute likelihood: $P(X \mid \Theta)$
[Figure: trellis over time ticks t = 1, ..., 6 for the states of θ_1 (states 1 to 3) and θ_2 (states 1 to 2), with possible regime switches weighted by δ_{12}. Each cell keeps a candidate cut-point set L_{i;j}(t), e.g. L_{1;1}(1) = ∅, L_{1;3}(2) = ∅, L_{1;1}(3) = ∅, L_{2;1}(3) = {3}, L_{2;1}(4) = {3}, L_{2;1}(5) = {4}, L_{2;2}(4) = {4}, L_{2;2}(5) = {3}, L_{2;2}(6) = {4}]
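The dynamic program sketched in the trellis above can be approximated with a Viterbi pass over the states of both regimes. This Python sketch makes several assumptions that are not in the slides: the data are one-dimensional, emissions are Gaussian, staying inside regime g multiplies by δ_gg and the within-regime transition, and switching from regime h to g multiplies by δ_hg and restarts from the initial probabilities π of regime g. It also recovers the cut points from back-pointers instead of carrying the cut-point sets L_{i;j}(t) forward, and it uses a plain search over all state pairs rather than the paper's single-scan bookkeeping.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF


def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def cut_point_search(x, regimes, delta):
    """Return (cut_points, regime_labels) of the most likely state path."""
    states = [(g, i) for g, reg in enumerate(regimes) for i in range(len(reg["pi"]))]

    def emit(state, value):
        g, i = state
        return log_gauss(value, regimes[g]["means"][i], regimes[g]["vars"][i])

    def step(src, dst):
        (h, j), (g, i) = src, dst
        if h == g:  # stay in the same regime
            return safe_log(delta[g][g]) + safe_log(regimes[g]["A"][j][i])
        # switch regime, then restart from the new regime's initial probabilities
        return safe_log(delta[h][g]) + safe_log(regimes[g]["pi"][i])

    score = {s: safe_log(regimes[s[0]]["pi"][s[1]]) + emit(s, x[0]) for s in states}
    back = []
    for value in x[1:]:
        pointers, new_score = {}, {}
        for dst in states:
            src = max(states, key=lambda s: score[s] + step(s, dst))
            pointers[dst] = src
            new_score[dst] = score[src] + step(src, dst) + emit(dst, value)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards, then keep only the regime index of each state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    labels = [g for g, _ in path]
    cuts = [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]
    return cuts, labels


# Two single-state regimes: one centred at 0, one centred at 5 (made-up values).
low = {"pi": [1.0], "A": [[1.0]], "means": [0.0], "vars": [1.0]}
high = {"pi": [1.0], "A": [[1.0]], "means": [5.0], "vars": [1.0]}
x = [0.0] * 4 + [5.0] * 4 + [0.0] * 4
cuts, labels = cut_point_search(x, [low, high], delta=[[0.9, 0.1], [0.1, 0.9]])
```

For this toy input the path switches regime at ticks 4 and 8, so the segment sets are S_1 = {[0, 4), [8, 12)} and S_2 = {[4, 8)}.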
1. CutPointSearch
Inner-most loop
Details
Theoretical analysis
Scalability
- It takes $O(ndk^2)$ time (only single scan)
- n: length of X
- d: dimension of X
- k: # of hidden states in regime
Accuracy
- It guarantees the optimal cut points
- (Details in paper)
2. RegimeSplit
Inner loop
Given:
- bundle X
Find: two regimes
1. find cut-points of segment sets: S_1, S_2
2. estimate parameters of two regimes: $\Theta = \{\theta_1, \theta_2, \Delta\}$
2. RegimeSplit
Inner loop
Details
Two-phase iterative approach
- Phase 1: (CutPointSearch)
  - Split segments into two groups: S_1, S_2
- Phase 2: (BaumWelch)
  - Update model parameters: $\Theta = \{\theta_1, \theta_2, \Delta\}$
[Figure: Phase 1 maps X and {θ_1, θ_2, Δ} to S_1 = {s_2, s_4}, S_2 = {s_1, s_3}; Phase 2 maps the segment sets back to {θ_1, θ_2, Δ}]
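The alternation between the two phases can be sketched in Python as follows. This is a deliberately simplified stand-in, not the authors' procedure: each regime is a single Gaussian (k = 1), Phase 1 is a two-state Viterbi labelling with a fixed stay probability, and Phase 2 replaces Baum-Welch with a closed-form mean/variance refit. The min/max initialisation and the `stay` and `min_var` parameters are likewise assumptions.

```python
import math


def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def assign(x, params, stay=0.95):
    """Phase 1 stand-in: label every tick with regime 0 or 1 (two-state Viterbi)."""
    log_stay, log_switch = math.log(stay), math.log(1 - stay)
    score = [log_gauss(x[0], *params[g]) for g in (0, 1)]
    back = []
    for value in x[1:]:
        pointers, new_score = [], []
        for g in (0, 1):
            keep = score[g] + log_stay
            jump = score[1 - g] + log_switch
            pointers.append(g if keep >= jump else 1 - g)
            new_score.append(max(keep, jump) + log_gauss(value, *params[g]))
        score = new_score
        back.append(pointers)
    g = 0 if score[0] >= score[1] else 1
    labels = [g]
    for pointers in reversed(back):
        g = pointers[g]
        labels.append(g)
    labels.reverse()
    return labels


def refit(x, labels, old_params, min_var=1e-3):
    """Phase 2 stand-in: closed-form Gaussian fit per regime (instead of Baum-Welch)."""
    params = []
    for g in (0, 1):
        points = [v for v, lab in zip(x, labels) if lab == g]
        if not points:  # keep the previous parameters for an empty regime
            params.append(old_params[g])
            continue
        mean = sum(points) / len(points)
        var = max(sum((v - mean) ** 2 for v in points) / len(points), min_var)
        params.append((mean, var))
    return params


def regime_split(x, max_iter=20):
    """Alternate Phase 1 and Phase 2 until the labelling stops changing."""
    lo, hi = min(x), max(x)
    spread = max((hi - lo) ** 2 / 16, 1e-3)
    params = [(lo, spread), (hi, spread)]  # crude initial guess
    labels = None
    for _ in range(max_iter):
        new_labels = assign(x, params)
        if new_labels == labels:
            break
        labels = new_labels
        params = refit(x, labels, params)
    return labels, params


x = [0.1, -0.2, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2, 0.0, -0.1, 0.1, 0.2]
labels, params = regime_split(x)
```

On this toy sequence the loop converges after two rounds, with the middle four ticks assigned to the second regime.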
3. AutoPlait
Outer loop
Given:
- bundle X
Find: r regimes (r=2, 3, 4, …)
- i.e., find full parameter set
$C = \{m, r, S, \Theta, F\}$
3. AutoPlait
Outer loop
Split regimes r=2,3,…, as long as cost keeps decreasing
- Find appropriate # of regimes
$r = \arg\min_{r} \mathrm{Cost}_T(X; m, r, S, \Theta, F)$
[Figure: successive splits of X
– r=2, m=4 (θ_1, θ_2): f_1 = 2, f_2 = 1, f_3 = 2, f_4 = 1
– r=3, m=6 (θ_1, θ_2, θ_3): f_1 = 2, f_2 = 3, f_3 = 1, f_4 = 2, f_5 = 3, f_6 = 1
– r=4, m=8 (θ_1, θ_2, θ_3, θ_4): f_1 = 2, f_2 = 4, f_3 = 3, f_4 = 1, f_5 = 2, f_6 = 4, f_7 = 3, f_8 = 1]
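The rule “keep splitting as long as the cost keeps decreasing” can be written as a small greedy loop over a stack of candidate regimes. In this Python sketch the `split` and `cost` callables are hypothetical placeholders for RegimeSplit and the total description cost; the toy versions below cluster a bag of numbers and exist only to exercise the control flow, not to model time series.

```python
def auto_plait(whole, split, cost):
    """Greedy top-down search: accept a split only if it lowers the total cost."""
    stack = [whole]
    regimes = []
    while stack:
        candidate = stack.pop()
        parts = split(candidate)
        if parts is not None and cost(parts[0]) + cost(parts[1]) < cost(candidate):
            stack.extend(parts)        # the split pays off: refine both halves
        else:
            regimes.append(candidate)  # no cheaper split: keep it as one regime
    return regimes


def toy_split(values):
    """Hypothetical stand-in for RegimeSplit: cut a sorted list at its largest gap."""
    if len(values) < 2:
        return None
    ordered = sorted(values)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


def toy_cost(values, model_bits=1.0):
    """Hypothetical stand-in for the cost: fixed model cost plus squared error."""
    mean = sum(values) / len(values)
    return model_bits + sum((v - mean) ** 2 for v in values)


data = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0, 9.2]
regimes = auto_plait(data, toy_split, toy_cost)
```

With these toy functions the loop stops at three groups, because splitting any of them further would add one more unit of model cost than it saves in coding cost. No regime count is supplied by the caller, which is the sense in which the search has no magic numbers.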
Experiments
We answer the following questions…
Q1. Sense-making
Can it help us understand the given bundles?
Q2. Accuracy
How well does it find cut-points and regimes?
Q3. Scalability
How does it scale in terms of computational time?
Q1. Sense-making
MoCap data
[Figure: segmentation results of AutoPlait (NO magic numbers), DynaMMo (Li et al., KDD’09) and pHMM (Wang et al., SIGMOD’11)]
Q1. Sense-making
MoCap data
AutoPlait (NO magic numbers)
Q2. Accuracy
(a) Segmentation
(b) Clustering
Target
AutoPlait needs “no magic numbers”
Q3. Scalability
Wall clock time vs. data size (length): n
AutoPlait scales linearly, i.e., O(n)
[Figure: wall clock time vs. n for pHMM, DynaMMo and AutoPlait]
AutoPlait at work
AutoPlait is capable of various applications,
e.g.,
App1. Model analysis
- Web-click sequences
App2. Event discovery
- Google Trend data
App1. Model analysis (WebClick)
Web-click sequences (1 month, 5 URLs)
– 5 URLs: blog, news, dictionary, Q&A, mail
– every 10 minutes
App1. Model analysis (WebClick)
Web-click sequences (1 month, 5 URLs)
AutoPlait finds 2 patterns: weekday/weekend!
– Monday! (but holiday)
App1. Model analysis (WebClick)
Pattern of weekday regime
Details
Observation: Working hard every weekday
(i.e., using dictionary, news sites)
App1. Model analysis (WebClick)
Pattern of weekend regime
Details
Observation: No more work on weekend
(i.e., blog, mail, Q&A for non-business purposes)
App2. Event discovery (GoogleTrend)
Anomaly detection (flu-related topics, 10 years)
AutoPlait detects 1 unusual spike in 2009
(i.e., swine flu)
App2. Event discovery (GoogleTrend)
Turning point detection (seasonal sweets topics)
Trend suddenly changed in 2010 (release of Android OS “Gingerbread”, “Ice Cream Sandwich”)
App2. Event discovery (GoogleTrend)
Trend discovery (game-related topics)
It discovers 3 phases of “game console war”
(Xbox&PlayStation/Wii/Mobile social games)
Conclusions
AutoPlait has the following properties
• Effective ✔
Find optimal segments/regimes
• Sense-making ✔
Reasonable regimes
• Fully-automatic ✔
No magic numbers
• Scalable ✔
It scales linearly
Thank you!
Code: http://www.cs.kumamoto-u.ac.jp/~yasuko/software.html
Mail: yasuko@cs.kumamoto-u.ac.jp