AutoPlait: Automatic Mining of Co-evolving Time Sequences Yasuko Matsubara (Kumamoto University) Yasushi Sakurai (Kumamoto University) Christos Faloutsos (CMU) SIGMOD 2014 Y. Matsubara et al. 1 Motivation Given: co-evolving time-series – e.g., MoCap (leg/arm sensors) “Chicken dance” Time SIGMOD 2014 Y. Matsubara et al. 2 Motivation Given: co-evolving time-series – e.g., MoCap (leg/arm sensors) Q. Any distinct “Chicken dance” patterns? Q. If yes, how many? Time SIGMOD 2014 Y. Matsubara et al. Q. What kind? 3 Motivation Challenges: co-evolving sequences – Unknown # of patterns (e.g., beaks) – Different durations … Time SIGMOD 2014 Y. Matsubara et al. 4 Motivation Challenges: co-evolving sequences – Unknown # of patterns (e.g., beaks) Q. Can we summarize it automatically ? – Different durations … Time SIGMOD 2014 Y. Matsubara et al. 5 Motivation Goal: find patterns that agree with human intuition Time SIGMOD 2014 Y. Matsubara et al. 6 Motivation Goal: find patterns that agree with human intuition AutoPlait: “fully-automatic” mining algorithm SIGMOD 2014 Y. Matsubara et al. 7 Importance of “fully-automatic” No magic numbers! ... because, Manual - sensitive to the parameter tuning - long tuning steps (hours, days, …) Automatic (no magic numbers) - no expert tuning required Big data mining: -> we cannot afford human intervention!! SIGMOD 2014 Y. Matsubara et al. 8 Outline - Motivation Problem definition Compression & summarization Algorithms Experiments AutoPlait at work Conclusions SIGMOD 2014 Y. Matsubara et al. 9 Problem definition Key concepts - Bundle: X Segment: S Regime: Q Segment-membership: F SIGMOD 2014 Y. Matsubara et al. given hidden hidden hidden 10 Problem definition • Bundle : set of d co-evolving sequences given X = {x1,..., xn } d´n Bundle X (d=4) SIGMOD 2014 Y. Matsubara et al. 11 Problem definition • Segment: convert X -> m segments, S hidden Segment (m=8) SIGMOD 2014 S = {s1,..., sm } s1 s2 s3 s4 Y. Matsubara et al. s5 s6 s7 s8 12 Problem definition • Regime: segment groups: Q = {q1, q2 ,...qr , Dr´r } hidden Regimes (r=4) beaks wings SIGMOD 2014 q r : model of regime r q1 q2 q3 q4 Y. Matsubara et al. 13 Problem definition • Segment-membership: assignment F = { f1,..., fm } hidden F={ 2, 4, 1, 3, 2, 4, 1, 3 } Segmentmembership (m=8) SIGMOD 2014 Y. Matsubara et al. 14 Problem definition • Given: bundle X X = {x1,..., xn } SIGMOD 2014 Y. Matsubara et al. 15 Problem definition • Given: bundle X X = {x1,..., xn } • Find: compact description C of X C = {m, r, S, Q, F} SIGMOD 2014 Y. Matsubara et al. 16 Problem definition • Given: bundle X X = {x1,..., xn } • Find: compact description C of X C = {m, r, S, Q, F} m segments r regimes SIGMOD 2014 Segment-membership Y. Matsubara et al. 17 Outline - Motivation Problem definition Compression & summarization Algorithms Experiments AutoPlait at work Conclusions SIGMOD 2014 Y. Matsubara et al. 18 Main ideas Goal: compact description of X C = {m, r, S, Q, F} without user intervention!! Challenges: Q1. How to generate ‘informative’ regimes ? Q2. How to decide # of regimes/segments ? SIGMOD 2014 Y. Matsubara et al. 19 Main ideas Goal: compact description of X C = {m, r, S, Q, F} without user intervention!! Challenges: Q1. How to generate ‘informative’ regimes ? Idea (1): Multi-level chain model Q2. How to decide # of regimes/segments ? Idea (2): Model description cost SIGMOD 2014 Y. Matsubara et al. 20 Idea (1): MLCM: multi-level chain model Q1. How to generate ‘informative’ regimes ? Model beaks claps wings Sequences SIGMOD 2014 Regimes Y. Matsubara et al. 21 Idea (1): MLCM: multi-level chain model Q1. How to generate ‘informative’ regimes ? Model beaks claps wings Sequences Regimes Idea (1): Multi-level chain model –HMM-based probabilistic model –with “across-regime” transitions SIGMOD 2014 Y. Matsubara et al. 22 Idea (1): MLCM: multi-level chain model Q = {q1, q2,...qr , Dr´r } (qi = {p , A, B}) r regimes across-regime (HMMs) transition prob. SIGMOD 2014 Y. Matsubara et al. Details Single HMM parameters 23 Idea (1): MLCM: multi-level chain model (qi = {p , A, B}) Q = {q1, q2,...qr , Dr´r } r regimes across-regime (HMMs) transition prob. Q d11 a1;11 a1;31 q1 a1;12 state 1 a1;33 state 2 state 3 a1;23 a1:22 Regime1 “beaks” SIGMOD 2014 d12 a2;11 d21 Single HMM parameters d22 Regime switch state 1 a2;12 a2;21 q2 a2;22 state 2 Regime2 “wings” Y. Matsubara et al. Details Regimes r=2 Regime 1 (k=3) Regime 2 (k=2) 24 Idea (2): model description cost Q2. How to decide # of regimes/segments ? Idea (2): Model description cost • Minimize coding cost • find “optimal” # of segments/regimes SIGMOD 2014 Y. Matsubara et al. 25 Idea (2): model description cost Idea: Minimize encoding cost! CostM CostC CostT min ( CostM(M) + Costc(X|M) ) Coding cost Model cost 1 2 3 4 5 6 7 8 9 10 (# of r, m) Good compression SIGMOD 2014 Y. Matsubara et al. Good description 26 Idea (2): model description cost Total cost of bundle X, given C Details C = {m, r, S,Q, F} SIGMOD 2014 Y. Matsubara et al. 27 Idea (2): model description cost Total cost of bundle X, given C duration/dimensi ons segment lengths SIGMOD 2014 Details C = {m, r, S,Q, F} # of segmentsegments/regim membership F es Model description cost of Θ et al. Y. Matsubara Coding cost of X given Θ 28 Outline - Motivation Problem definition Compression & summarization Algorithms Experiments AutoPlait at work Conclusions SIGMOD 2014 Y. Matsubara et al. 29 AutoPlait Overview Iteration 0 r=1, m=1 X Start! r=1 0 1 2 3 4 5 6 7 Iteration Iteration SIGMOD 2014 Y. Matsubara et al. 30 AutoPlait Overview f1 = 2 f2 = 1 f3 = 2 f4 = 1 X Iteration 1 r=2, m=4 q1 q2 r=1 Split r=2 0 1 2 3 4 5 6 7 Iteration SIGMOD 2014 Y. Matsubara et al. 31 AutoPlait Overview f1 = 2 f2 = 1 f3 = 2 f4 = 1 X Iteration 1 r=2, m=4 q1 q2 f1 = 2 f2 = 3 f3 = 1 f4 = 2 f5 = 3 f6 = 1 X r=1 Split Iteration 2 q r=3, m=6 q13 q2 r=2 r=3 0 1 2 3 4 5 6 7 Iteration SIGMOD 2014 Y. Matsubara et al. 32 AutoPlait f1 = 2 Overview f2 = 1 f3 = 2 f4 = 1 X Iteration 1 r=2, m=4 q1 q2 f1 = 2 f2 = 3 f3 = 1 f4 = 2 f5 = 3 f6 = 1 X Iteration 2 q r=3, m=6 q13 r=1 Split r=2 r=4 r=3 0 1 2 3 4 5 6 7 Iteration Iteration SIGMOD 2014 q2 f1 = 2 f2 = 4 f3 = 3 f4 =1 f5 = 2 f6 = 4 f7 = 3 f8 =1 X Iteration 4 q1 r=4, m=8 q 3 q4 q2 Y. Matsubara et al. 33 AutoPlait Algorithms 1. CutPointSearch Inner-most loop Find good cut-points/segments 2. RegimeSplit Inner loop Estimate good regime parameters Θ 3. AutoPlait Outer loop Search for the best number of regimes (r=2,3,4…) SIGMOD 2014 Y. Matsubara et al. 34 1.CutPointSearch Inner-most loop Given: - bundle X - parameters of two regimes Q = {q1, q2 , D} Find: cut-points of segment sets S1, S2, {S1, S2 } = argmax P(X | S1, S2, Q) S1,S2 X X {q1,q2 , D} q1 q2 SIGMOD 2014 S1 = {s2, s4 } S2 = {s1, s3 } Y. Matsubara et al. 35 1.CutPointSearch Inner-most loop Details q1 DP algorithm to compute likelihood: P(X | Q) L1;1 (3) = f state 2 ... state 3 ... switch?? L1;3 (2) = f switch?? d12 q2 state 1 d12 L2;1 (3) = {3} L (4) = {3} L2;1 (5) = {4} 2;1 ... ... state 2 X SIGMOD 2014 ... state 1 L1;1 (1) = f L2;2 (4) = {4} L2;2 (5) = {3} L2;2 (6) = {4} t =1 t=2 t =3 Y. Matsubara et al. t=4 t=5 t=6 ... 36 1.CutPointSearch Inner-most loop Details Theoretical analysis Scalability 2 - It takes O(ndk ) time (only single scan) - n: length of X - d: dimension of X - k: # of hidden states in regime Accuracy It guarantees the optimal cut points - (Details in paper) SIGMOD 2014 Y. Matsubara et al. 37 2.RegimeSplit Given: - bundle X Inner loop X Find: two regimes 1. find cut-points of segment sets: S1, S2 2. estimate parameters of two regimes: Q = {q1, q2 , D} X q1 q2 SIGMOD 2014 Y. Matsubara et al. 38 2.RegimeSplit Inner loop Details Two-phase iterative approach - Phase 1: (CutPointSearch) - Split segments into two groups : S1, S2 - Phase 2: (BaumWelch) - Update model parameters: Q = {q1, q2 , D} S1 = {s2, s4 } S2 = {s1, s3 } X q1 q2 SIGMOD 2014 {q1,q2 , D} Y. Matsubara et al. Phase 1 Phase 2 39 3.AutoPlait Given: - bundle X Outer loop X Find: r regimes (r=2, 3, 4, … ) - i.e., find full parameter set C = {m, r, S, Q, F} SIGMOD 2014 Y. Matsubara et al. 40 3.AutoPlait Outer loop Split regimes r=2,3,…, as long as cost keeps decreasing - Find appropriate # of regimes r = min CostT (X;m, r, S,Q, F) r SIGMOD 2014 Y. Matsubara et al. 41 3.AutoPlait Outer loop Split regimes r=2,3,…, as long as cost keeps decreasing - Find appropriate # of regimes r=2, m=4 r = min CostT (X;m, r, S,Q, F) r f1 = 2 f4 = 1 X f1 = 2 f2 = 4 f3 = 3 f4 =1 f5 = 2 f6 = 4 f7 = 3 f8 =1 r=3, m=6 X SIGMOD 2014 f3 = 2 q1 q2 r=4, m=8 q1 q3 q4 q2 f2 = 1 f1 = 2 f2 = 3 f3 = 1 f4 = 2 f5 = 3 f6 =1 X q1 q3 q2 Y. Matsubara et al. 42 Outline - Motivation Problem definition Compression & summarization Algorithms Experiments AutoPlait at work Conclusions SIGMOD 2014 Y. Matsubara et al. 43 Experiments We answer the following questions… Q1. Sense-making Can it help us understand the given bundles? Q2. Accuracy How well does it find cut-points and regimes? Q3. Scalability How does it scale in terms of computational time? SIGMOD 2014 Y. Matsubara et al. 44 Q1. Sense-making MoCap data AutoPlait (NO magic numbers) DynaMMo (Li et al., KDD’09) SIGMOD 2014 pHMM (Wang et al., SIGMOD’11) Y. Matsubara et al. 45 Q1. Sense-making MoCap data AutoPlait (NO magic numbers) SIGMOD 2014 Y. Matsubara et al. 46 Q2. Accuracy (a) Segmentation (b) Clustering Target AutoPlait needs “no magic numbers” SIGMOD 2014 Y. Matsubara et al. 47 Q3. Scalability Wall clock time vs. data size (length) : n AutoPlait scales linearly, i.e., O(n) pHMM DynaMMo AutoPlait SIGMOD 2014 Y. Matsubara et al. 48 Outline - Motivation Problem definition Compression & summarization Algorithms Experiments AutoPlait at work Conclusions SIGMOD 2014 Y. Matsubara et al. 49 AutoPlait at work AutoPlait is capable of various applications, e.g., App1. Model analysis - Web-click sequences App2. Event discovery - Google Trend data SIGMOD 2014 Y. Matsubara et al. 50 AutoPlait at work AutoPlait is capable of various applications, e.g., App1. Model analysis - Web-click sequences App2. Event discovery - Google Trend data SIGMOD 2014 Y. Matsubara et al. 51 App1. Model analysis (WebClick) Web-click sequences (1 month, 5urls) Time (per 10 minutes) – 5urls: blog, news, dictionary, Q&A, mail – every 10 minutes SIGMOD 2014 Y. Matsubara et al. 52 App1. Model analysis (WebClick) Web-click sequences (1 month, 5urls) AutoPlait finds 2 patterns: weekday/weekend ! SIGMOD 2014 Y. Matsubara et al. 53 App1. Model analysis (WebClick) Web-click sequences (1 month, 5urls) AutoPlait finds 2 patterns: weekday/weekend Monday! (but holiday) SIGMOD 2014 Y. Matsubara et al. 54 App1. Model analysis (WebClick) Pattern of weekday regime Details Observation: Working hard every weekday (i.e., using dictionary, news sites) SIGMOD 2014 Y. Matsubara et al. 55 App1. Model analysis (WebClick) Pattern of weekend regime Details Observation: No more work on weekend (i.e., blog, mail, Q&A for non-business purposes) SIGMOD 2014 Y. Matsubara et al. 56 AutoPlait at work AutoPlait is capable of various applications, e.g., App1. Model analysis - Web-click sequences App2. Event discovery - Google Trend data SIGMOD 2014 Y. Matsubara et al. 57 App2. Event discovery (GoogleTrend) Anomaly detection (flu-related topics,10 years) SIGMOD 2014 Y. Matsubara et al. 58 App2. Event discovery (GoogleTrend) Anomaly detection (flu-related topics,10 years) AutoPlait detects 1 unusual spike in 2009 (i.e., swine flu) SIGMOD 2014 Y. Matsubara et al. 59 App2. Event discovery (GoogleTrend) Turning point detection (seasonal sweets topics) SIGMOD 2014 Y. Matsubara et al. 60 App2. Event discovery (GoogleTrend) Turning point detection (seasonal sweets topics) Trend suddenly changed in 2010 (release of android OS “Ginger bread”, “Ice Cream Sandwich”) SIGMOD 2014 Y. Matsubara et al. 61 App2. Event discovery (GoogleTrend) Trend discovery (game-related topics) SIGMOD 2014 Y. Matsubara et al. 62 App2. Event discovery (GoogleTrend) Trend discovery (game-related topics) It discovers 3 phases of “game console war” (Xbox&PlayStation/Wii/Mobile social games) SIGMOD 2014 Y. Matsubara et al. 63 Outline - Motivation Problem definition Compression & summarization Algorithms Experiments AutoPlait at work Conclusions SIGMOD 2014 Y. Matsubara et al. 64 Conclusions AutoPlait has the following properties • Effective ✔ Find optimal segments/regimes • Sense-making ✔ Reasonable regimes • Fully-automatic ✔ No magic numbers • Scalable ✔ It scales linearly SIGMOD 2014 Y. Matsubara et al. 65 Thank you! Code: http://www.cs.kumamoto-u.ac.jp/~yasuko/software.html Mail: yasuko@cs.kumamoto-u.ac.jp SIGMOD 2014 Y. Matsubara et al. 66