Mining Recent Temporal Patterns for Event Detection in Multivariate Time Series Data

Iyad Batal, Dept. of Computer Science, University of Pittsburgh, iyad@cs.pitt.edu
Dmitriy Fradkin, Siemens Corporate Research, dmitriy.fradkin@siemens.com
James Harrison, Dept. of Public Health Sciences, University of Virginia, james.harrison@virginia.edu
Fabian Moerchen, Siemens Corporate Research, fabian.moerchen@siemens.com
Milos Hauskrecht, Dept. of Computer Science, University of Pittsburgh, milos@cs.pitt.edu

18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012

Introduction
• Supervised temporal detection.
  • Given a labeled dataset of temporal instances, each observed up to a time ti.
  • Find frequently occurring “temporal patterns” for each label.
  • Given a new instance, predict its label.
• Contributions of this paper:
  • Abstractions that define “recent temporal patterns” (motivated by medical EHR records)
  • Algorithms to find “frequent patterns” in a given database.

Example
• Database: EHR records of patients
• Each record:
  • Multiple temporal variables, each with multiple readings up to time ti
  • E.g. glucose, creatinine, cholesterol
  • Label: disease/symptom detected at time ti
• Supervised learning: given a database, learn the patterns associated with different diseases
• Prediction: given a new patient, find the “recent temporal patterns” in the record and the label associated with them.

Temporal Abstraction Patterns
• Issues:
  • Irregularly sampled
  • Sampling errors
  • Multivariate
• Temporal abstractions
  • Numeric values to finite abstraction alphabets
  • E.g.
    Very Low, Low, Normal, High, Very High
  • All contiguous values with the same abstraction form an interval
  • A time series can then be represented as
    {<v1,s1,e1>, <v2,s2,e2>, …}

Temporal Abstraction Patterns
• Multivariate state sequences
  • Temporal variable: F
  • State: V (the abstraction value taken by F)
  • State interval: E = (F, V, s, e), so a single-variable time series is an ordered set of Ei
  • Multivariate state sequence (MSS): Zi (basically a patient record)
    • An ordered combination of the state intervals of all variables it contains
    • Ordered by start time
    • If start times collide, sort by end time
    • If both collide, sort by lexical order

Temporal Abstraction Patterns
• What is a pattern?
  • A sequence of “temporal relations” between state intervals Ei, Ej
• What kind of “temporal relations”?
  • Ei occurs BEFORE Ej (or vice versa)
  • Ei CO-OCCURS with Ej
  • There are finer-grained relations such as starts together, ends together, equals, contains, overlaps, meets, etc., but the paper considers only the two relations above, which generalize all of them.

Temporal Abstraction Patterns
• Temporal pattern:
  • P = (<S1,S2,…,Sk>, R), where each Si is a state and R is the relation matrix, which assigns either b (before) or c (co-occurs) to each pair of a state and a subsequent state.
  • Hence R is an upper triangular matrix.
  • P is called a k-pattern, where k = |<S1,…,Sk>|

Temporal Abstraction Patterns
• Pattern containment:
  • Given a pattern P = (<S1,S2,…,Sk>, R)
  • and an MSS Z = <E1,…,El>
  • Z contains P iff
    • every state Si is matched by a state interval in Z
    • and, for i = 1..k-1 and j = i+1..k, Ri,j holds between the intervals matched to Si and Sj (denoted Ri,j(Ei, Ej))

Mining Recent Temporal Patterns
• “Recent state”
  • Given an MSS Z = <E1,…,El>
  • A state interval Ei is a “recent” interval, given a maximum gap g, if ANY of the following conditions is true
    • Ei is the last state interval for its temporal variable.
      • i.e. Ei.F ≠ Ej.F for all j > i
    • Z.end - Ei.e ≤ g

Mining Recent Temporal Patterns
• “Recent temporal pattern (RTP)”
  • Given an MSS Z = <E1,…,El>
  • A pattern P = (<S1,…,Sk>, R) is a “recent” pattern in Z, given a maximum gap g, if ALL of the following conditions are true
    • Z contains P
    • Sk is matched to a recent state interval in Z
    • No two consecutive matched state intervals in Z are more than g apart, i.e. E(i+1).s - E(i).e ≤ g, where E(i) is the interval matched to Si
• Suffix sub-pattern:
  • P is a suffix sub-pattern of P’ if
    • P consists of a suffix of the states in P’ (e.g. if P’ = <S1,S2,S3>, P can be <S3>, <S2,S3> or <S1,S2,S3>, but NOT <S1,S3>)
    • All the relations between the states in P are the same as in P’

Mining Recent Temporal Patterns
• Frequent recent temporal pattern
  • Given a database D of MSS, a gap parameter g and a support threshold sigma, a pattern P is called “frequent” if its “support”, the number of MSS in D in which P is an RTP, denoted RTP-sup-g(P, D), is greater than sigma.

Mining Algorithm
• Goal: for a given database and each given label, find the frequent RTPs associated with that label.
• In other words, for each class y with database Dy, output the set of patterns P that are frequent in Dy, i.e. whose support RTP-sup-g(P, Dy) exceeds the support threshold.

Mining Algorithm
• Approach:
  • Build patterns of increasing size: start with patterns of size 1 and build on top of them.
  • For the (k+1)th stage, i.e. to find the (k+1)-RTPs given the k-RTPs, the algorithm consists of two steps
    • Candidate generation
    • Counting (removing candidates that do not qualify)

Mining Algorithm
• Naïve candidate generation
  [Figure: each frequent k-RTP (shown as a b c) is combined with each of the L states 1, 2, …, L to form candidate (k+1)-patterns]

Mining Algorithm
• Improving efficiency
  • Remove “incoherent” patterns, i.e. patterns that are not allowed.
  • If S1.F = Si.F and R1,i = c, the pattern cannot be valid, since these two states would be combined into one.
  • If R1,j = c, then R1,i CANNOT be “b” for any i < j. If it is, the pattern is incoherent.
  • This leads to a useful corollary.
    There can only be consecutive “c”s followed by consecutive “b”s.
  • Hence the number of combinations of R to try is not 2^k but just k+1, which drastically reduces the candidate space.

Mining Algorithm
• Naïve counting algorithm
  • For each label y
    • For each candidate P
      • For each MSS Z in database Dy
        • Verify whether P is an RTP in Z and, if so, increment the count of P for label y.

Mining Algorithm
• Improving the efficiency of the counting algorithm
  • Filter D based only on states (instead of matching the entire pattern)
  • Proposition (based on the suffix sub-pattern definition):
    • The list of Z in D containing P is a subset of the list of Z containing P’ if P’ is a suffix sub-pattern of P.
  • Intersect the two lists above to get the candidate Zs that actually need to be searched and matched.
  • Note: because of the proposition, the list of candidate Zs keeps shrinking as the patterns grow.
• At the end of the algorithm
  • For each label y, we have a list of frequent m-RTPs associated with that label.

Learning the classifier for prediction
• For each instance in D, get its temporal abstraction (MSS) Zi.
• Mine the frequent m-RTPs for each label and combine all the RTPs into a set Omega.
• For each MSS Zi, create a binary feature vector f of size |Omega|: put 1 if the corresponding pattern is in Zi, and 0 otherwise.
• Use any existing classifier (ANN, SVM, etc.) to learn from the training set.

Experimental Evaluation
• Dataset
  • 13,558 records of diabetic patients
  • 19 time series variables per patient (glucose, creatinine, hemoglobin, cholesterol, etc.)
  • 602 ICD-9 diagnosis codes divided into 8 disease categories (8 class labels)
• Setup
  • Separate experiments for each of the 8 labels.
  • For each category, patients are divided into cases (positives) and controls (negatives)
  • Cases: patients with the target disease
    • All time series variables recorded up to the time the disease was FIRST diagnosed.
  • Controls: all other patients
    • All time series variables recorded up to a randomly selected point in time.

Experimental Evaluation
• Classification performance
• Methods compared
  1. Last Values: consider only the most recent value of each variable
  2. TP: consider all temporal patterns for each variable
  3. TP_Sparse: consider all temporal patterns, but select the top 50 for each variable
  4. RTP: consider all recent temporal patterns for each variable
  5. RTP_Sparse: the top 50 RTPs for each variable

Experimental Evaluation
• Classification performance
  • Sigma (support) for methods 2-5 is set to 15%
  • Gap for methods 4 and 5 is set to 6 months
  • Test: 10-fold cross-validation
  • Quality measurement: accuracy = (TP + TN) / (P + N), where TP and TN here denote true positives and true negatives
• Create features using each of the above methods.
• Build an SVM using these features.
• Evaluate the performance of the SVM using “classification accuracy” and “AUC”, i.e. the area under the ROC (receiver operating characteristic) curve.
• AUC is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.

Experimental Evaluation
• Knowledge discovery

Conclusions
• “Recent temporal patterns” are of special interest in the medical domain, and are expected to behave similarly in other domains.
• Time series abstractions provide a simple approximation as well as a compression of the data.
• The gap parameter used in detecting patterns is critical for scaling up the mining process (but it is domain dependent).
• RTPs provide more efficient mining as well as better prediction accuracy than detecting patterns over the entire series (validated here in the medical domain).
• How can we leverage/extend this?
  • Towards defining high-level abstractions for time series kernels
  • Extend from an “independent” multivariate model to an interdependent multivariate model, where vertices represent variables and edges define the dependencies.
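
To make the definitions from the "Temporal Abstraction Patterns" and "Mining Recent Temporal Patterns" slides concrete, here is a minimal Python sketch of the recent-interval test, a naive RTP check, and the k+1 coherent relation columns. It is an illustration under stated assumptions, not the authors' implementation: the names `StateInterval`, `is_recent`, `is_rtp` and `coherent_columns` are hypothetical, "before" is assumed to mean a.e < b.s, "co-occurs" is assumed to mean a.s <= b.s <= a.e, Z.end is assumed to be the largest end time in Z, and pattern states are matched to intervals in increasing index order.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StateInterval:
    F: str    # temporal variable, e.g. "glucose"
    V: str    # abstraction value, e.g. "H"
    s: float  # start time
    e: float  # end time


def holds(rel: str, a: StateInterval, b: StateInterval) -> bool:
    # ASSUMPTION: before = a ends strictly before b starts;
    # co-occurs = b starts while a is still active.
    if rel == "b":
        return a.e < b.s
    return a.s <= b.s <= a.e


def is_recent(Z: List[StateInterval], i: int, g: float) -> bool:
    """Z[i] is recent if it is the last interval of its variable,
    or if it ends within g of the end of Z (assumed: max end time)."""
    z_end = max(E.e for E in Z)
    last_for_variable = all(E.F != Z[i].F for E in Z[i + 1:])
    return last_for_variable or (z_end - Z[i].e <= g)


def is_rtp(Z: List[StateInterval],
           states: List[Tuple[str, str]],
           R: List[List[Optional[str]]],
           g: float) -> bool:
    """Naive backtracking check that pattern (states, R) is an RTP in Z.
    R[i][j] (i < j) is 'b' or 'c'; other entries are ignored."""
    k = len(states)

    def extend(matched: List[int]) -> bool:
        p = len(matched)
        if p == k:
            # The last matched interval must be a recent interval.
            return is_recent(Z, matched[-1], g)
        start = matched[-1] + 1 if matched else 0
        for idx in range(start, len(Z)):
            E = Z[idx]
            if (E.F, E.V) != states[p]:
                continue
            # Consecutive matched intervals may be at most g apart.
            if matched and E.s - Z[matched[-1]].e > g:
                continue
            if all(holds(R[i][p], Z[matched[i]], E) for i in range(p)):
                if extend(matched + [idx]):
                    return True
        return False

    return extend([])


def coherent_columns(k: int) -> List[Tuple[str, ...]]:
    """The k+1 coherent relation columns for a new first state:
    a run of 'c's followed by a run of 'b's."""
    return [("c",) * m + ("b",) * (k - m) for m in range(k + 1)]


if __name__ == "__main__":
    # Toy record, times in days; sorted by start time.
    Z = [
        StateInterval("glucose", "N", 0, 10),
        StateInterval("creatinine", "H", 5, 20),
        StateInterval("glucose", "H", 11, 30),
    ]
    P = [("creatinine", "H"), ("glucose", "H")]
    R = [[None, "c"], [None, None]]
    print(is_rtp(Z, P, R, g=7))                             # True
    print(is_rtp(Z, [("glucose", "N")], [[None]], g=7))     # False
    print(len(coherent_columns(3)))                         # 4
```

In the toy record, the single-state pattern on normal glucose is rejected because that interval is neither the last glucose interval nor within g = 7 days of the end of the record, which is the behavior the "recent state" definition is meant to capture. The backtracking search is exponential in the worst case; the list-intersection idea from the counting slide is what would keep the number of records to check small in practice.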