Mining Recent Temporal Patterns for Event
Detection in Multivariate Time Series Data
Iyad Batal, Dept. of Computer Science, University of Pittsburgh (iyad@cs.pitt.edu)
Dmitriy Fradkin, Siemens Corporate Research (dmitriy.fradkin@siemens.com)
James Harrison, Dept. of Public Health Sciences, University of Virginia (james.harrison@virginia.edu)
Fabian Moerchen, Siemens Corporate Research (fabian.moerchen@siemens.com)
Milos Hauskrecht, Dept. of Computer Science, University of Pittsburgh (milos@cs.pitt.edu)
18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2012)
Introduction
• Supervised temporal event detection.
• Given: a labeled dataset of temporal instances, each observed up to some time ti.
• Find frequently occurring "temporal patterns" for each label.
• Given a new instance, predict its label.
• Contributions of this paper:
• Abstractions defining "Recent Temporal Patterns" (RTPs), motivated by medical EHR data.
• Algorithms for finding frequent RTPs in a given database.
Example
• Database: EHR records of patients
• Each record:
• Multiple temporal variables (e.g. glucose, creatinine, cholesterol), each with multiple readings up to time ti
• Label: disease/symptom detected at time ti
• Supervised learning: given the database, learn patterns associated with different diseases
• Prediction: given a new patient, find the "recent temporal patterns" present and predict the label associated with them
Temporal Abstraction Patterns
• Issues with the raw clinical time series:
• Irregularly sampled
• Sampling errors/noise
• Multivariate
• Temporal abstraction:
• Map numeric values to a finite abstraction alphabet, e.g. very low, low, normal, high, very high
• All contiguous readings with the same abstraction are merged into one interval
• A time series can then be represented as {<v1,s1,e1>, <v2,s2,e2>, …}, where vi is the abstraction value and si, ei are the interval's start and end times
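A minimal sketch of this abstraction step in Python; the cut-off thresholds and level names are hypothetical placeholders, not the paper's:

```python
def abstract_value(x, cuts=(60, 80, 140, 200)):
    """Map a numeric reading to a value-abstraction symbol (placeholder cut-offs)."""
    labels = ["very_low", "low", "normal", "high", "very_high"]
    for i, c in enumerate(cuts):
        if x < c:
            return labels[i]
    return labels[-1]

def to_intervals(readings):
    """readings: list of (timestamp, value). Returns [(abstraction, start, end), ...]."""
    intervals = []
    for t, x in sorted(readings):
        v = abstract_value(x)
        if intervals and intervals[-1][0] == v:
            intervals[-1] = (v, intervals[-1][1], t)   # extend the current interval
        else:
            intervals.append((v, t, t))                # open a new interval
    return intervals
```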
Temporal Abstraction Patterns
• Multivariate State Sequences (MSS)
• Temporal variable: F
• State: an abstraction value V of a variable F
• State interval: E = (F, V, s, e); a single-variable time series is thus an ordered set of intervals Ei
• Multivariate State Sequence Zi (essentially one patient record):
• the ordered combination of the state intervals of all of the record's variables,
• ordered by start times;
• if start times collide, sort by end times;
• if both collide, sort by lexical ordering
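A sketch of how an MSS could be assembled under this ordering; StateInterval is an assumed helper type, not the authors' code:

```python
from collections import namedtuple

# variable, abstraction value, start time, end time
StateInterval = namedtuple("StateInterval", ["F", "V", "s", "e"])

def build_mss(intervals_per_variable):
    """intervals_per_variable: dict mapping variable name -> [(value, start, end), ...]."""
    all_intervals = [
        StateInterval(F, V, s, e)
        for F, ivs in intervals_per_variable.items()
        for (V, s, e) in ivs
    ]
    # primary key: start time; ties broken by end time, then lexically by (variable, value)
    return sorted(all_intervals, key=lambda E: (E.s, E.e, E.F, E.V))
```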
Temporal Abstraction Patterns
• What is a pattern?
• A sequence of states together with "temporal relations" between their intervals Ei, Ej
• What kinds of "temporal relations"?
• Ei occurs BEFORE Ej (or vice versa)
• Ei CO-OCCURS with Ej
• There are other fine-grained relations (starts together, ends together, equals, contains, overlaps, meets, etc.), but the authors use only the two relations above, which together generalize all of them.
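A sketch of the two relations, under one common reading of their definitions: Ei is before Ej when Ei ends strictly before Ej starts; otherwise, for intervals ordered by start time, they co-occur (overlap):

```python
def relation(Ei, Ej):
    """Return 'b' (before) or 'c' (co-occurs) for intervals with Ei.s <= Ej.s."""
    if Ei.e < Ej.s:
        return "b"
    return "c"
```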
Temporal Abstraction Patterns
• Temporal Pattern:
• P = (<S1,S2,…,Sk>, R), where the Si are states and R is the relation matrix: Ri,j is either b (before) or c (co-occurs) and relates Si to the later state Sj.
• Hence R is an upper-triangular matrix.
• P is called a k-pattern, where k = |<S1,…,Sk>|.
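One possible in-memory representation of such a pattern; a sketch, where the dict-of-index-pairs encoding of R is an implementation choice, not the paper's:

```python
from collections import namedtuple

State = namedtuple("State", ["F", "V"])          # (variable, abstraction value)

class TemporalPattern:
    def __init__(self, states, relations):
        self.states = list(states)               # <S1, ..., Sk>
        self.R = dict(relations)                 # {(i, j): 'b' or 'c'} for i < j (0-based)

    @property
    def k(self):
        return len(self.states)                  # pattern size
```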
Temporal Abstraction Patterns
• Pattern Containment:
• Given a pattern P = (<S1,S2,…,Sk>, R)
• and an MSS Z = <E1,…,El>,
• Z contains P iff there is a mapping of the pattern's states onto state intervals of Z such that
• every Si matches a state interval in Z, and
• for all 1 <= i < j <= k, the relation Ri,j holds between the matched intervals (denoted Ri,j(Ei', Ej')).
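A brute-force containment check for illustration, reusing relation() and the pattern sketch from above; the paper's mining algorithm avoids this exhaustive search:

```python
from itertools import combinations

def contains(Z, P):
    """Z: list of StateInterval (sorted by start time); P: TemporalPattern."""
    for idx in combinations(range(len(Z)), P.k):           # increasing positions in Z
        ok_states = all(Z[m].F == S.F and Z[m].V == S.V
                        for m, S in zip(idx, P.states))
        ok_rels = all(relation(Z[idx[i]], Z[idx[j]]) == P.R[(i, j)]
                      for i, j in P.R)
        if ok_states and ok_rels:
            return True
    return False
```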
Mining Recent Temporal Patterns
• "Recent State Interval"
• Given an MSS Z = <E1,…,El> and a maximum gap g,
• a state interval Ei is "recent" if either of the following conditions holds:
• Ei is the last interval of its variable in Z, i.e. Ei.F != Ej.F for all j > i; or
• Z.end - Ei.end <= g.
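A direct translation of this test into Python, assuming the StateInterval type from the earlier sketch:

```python
def is_recent_state(Z, i, g):
    """Ei is recent if it is its variable's last interval in Z, or ends within g of Z's end."""
    Ei = Z[i]
    last_of_variable = all(Z[j].F != Ei.F for j in range(i + 1, len(Z)))
    z_end = max(E.e for E in Z)
    return last_of_variable or (z_end - Ei.e <= g)
```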
Mining Recent Temporal Patterns
• "Recent Temporal Pattern (RTP)"
• Given an MSS Z = <E1,…,El> and a maximum gap g,
• a pattern P = (<S1,…,Sk>, R) is an RTP in Z if ALL of the following conditions are true:
• Z contains P;
• Sk matches a recent state interval in Z;
• no two consecutively matched intervals in Z are more than g apart, i.e. E(i+1).s - E(i).e <= g (sketch below).
• Suffix Sub-pattern:
• P is a suffix sub-pattern of P' if
• P consists of a suffix of the states of P' (e.g. if P' = <S1,S2,S3>, then P can be <S3>, <S2,S3>, or <S1,S2,S3>, but NOT <S1,S3>), and
• all the relations between those states in P are the same as in P'.
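A sketch combining the three RTP conditions, reusing relation() and is_recent_state() from the earlier sketches; again brute force, for illustration only:

```python
from itertools import combinations

def is_rtp(Z, P, g):
    """True if pattern P is a recent temporal pattern in MSS Z for maximum gap g."""
    for idx in combinations(range(len(Z)), P.k):
        ok_states = all(Z[m].F == S.F and Z[m].V == S.V
                        for m, S in zip(idx, P.states))
        ok_rels = all(relation(Z[idx[i]], Z[idx[j]]) == P.R[(i, j)] for i, j in P.R)
        ok_recent = is_recent_state(Z, idx[-1], g)
        ok_gaps = all(Z[idx[i + 1]].s - Z[idx[i]].e <= g for i in range(P.k - 1))
        if ok_states and ok_rels and ok_recent and ok_gaps:
            return True
    return False
```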
Mining Recent Temporal Patterns
• Frequent Recent Temporal Pattern
• Given a database D of MSS, a gap parameter g, and a support threshold sigma,
• a pattern P is called "frequent" if its RTP support, RTP-sup_g(P, D) = the number of MSS in D in which P is an RTP, is at least sigma.
Mining Algorithm
• Goal: for a given database and set of labels, find the frequent recent temporal patterns associated with each label.
• In other words, for each class y with database Dy, output the set of patterns P that satisfy RTP-sup_g(P, Dy) >= sigma_y.
Mining Algorithm
• Approach:
• Build patterns of increasing size: start with patterns of size 1 and extend them level by level.
• The (k+1)-th level, i.e. finding (k+1)-RTPs given the k-RTPs, consists of two phases:
• candidate generation, and
• counting (removing candidates whose RTP support is below the threshold).
Mining Algorithm
• Naïve Candidate Generation
• (Figure: each of the L frequent k-RTPs, 1 … L, is paired with every state of the abstraction alphabet, a, b, c, …, to generate candidate (k+1)-patterns.)
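A sketch of this naive step, assuming, per the paper's backward-extension scheme, that a k-RTP is grown by prepending one new state and choosing a relation ('b' or 'c') between it and each existing state, which yields |alphabet| * 2^k candidates per k-RTP:

```python
from itertools import product

def naive_candidates(k_rtp, alphabet):
    """Yield all (k+1)-candidates obtained by prepending a new state to a k-RTP."""
    k = k_rtp.k
    for S_new in alphabet:                                  # every abstraction state
        for rel_vector in product("cb", repeat=k):          # all 2^k relation choices
            R = {(0, j + 1): rel_vector[j] for j in range(k)}
            R.update({(i + 1, j + 1): r for (i, j), r in k_rtp.R.items()})
            yield TemporalPattern([S_new] + k_rtp.states, R)
```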
Mining Algorithm
• Improving Efficiency
• Remove "incoherent" candidates, i.e. patterns that cannot be contained in any MSS:
• If S1.F = Si.F and R1,i = c, the pattern cannot be valid, since co-occurring intervals of the same variable would have been merged into one.
• If R1,j = c, then R1,i CANNOT be b for any i < j; if it is, the pattern is incoherent.
• This leads to a nice corollary: a coherent relation vector for the new state is a run of consecutive c's followed by a run of consecutive b's.
• Hence only k+1 relation vectors need to be tried instead of 2^k, which drastically reduces the candidate space (sketch below).
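The same generation step restricted to coherent candidates: only the k+1 relation vectors of the form c…cb…b are enumerated, and extensions whose new state would co-occur with a same-variable state are skipped (a sketch, reusing TemporalPattern from above):

```python
def coherent_candidates(k_rtp, alphabet):
    """Yield only coherent (k+1)-candidates: k+1 relation vectors per new state."""
    k = k_rtp.k
    for S_new in alphabet:
        for n_c in range(k + 1):                             # number of leading 'c's
            rel_vector = ["c"] * n_c + ["b"] * (k - n_c)
            # incoherent: the new state cannot co-occur with a same-variable state
            if any(rel_vector[j] == "c" and k_rtp.states[j].F == S_new.F
                   for j in range(k)):
                continue
            R = {(0, j + 1): rel_vector[j] for j in range(k)}
            R.update({(i + 1, j + 1): r for (i, j), r in k_rtp.R.items()})
            yield TemporalPattern([S_new] + k_rtp.states, R)
```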
Mining Algorithm
• Naïve Counting Algorithm
• For each class label y:
• for each candidate pattern P,
• for each MSS Z in the database Dy,
• verify whether P is an RTP in Z and, if so, increment P's count for label y.
Mining Algorithm
• Improving the Efficiency of Counting
• First filter Dy by the states a candidate contains (cheaper than full pattern matching).
• Proposition (follows from the suffix sub-pattern definition):
• if P' is a suffix sub-pattern of P, then the list of MSS in D that contain P is a subset of the list of MSS that contain P'.
• Intersect the state-filter list with the suffix sub-pattern's list to obtain the actual candidate MSS to match against (sketch below).
• Note: because of this proposition, the candidate lists keep shrinking as the patterns grow.
• At the end of the algorithm:
• for each label y, we have the list of frequent RTPs of every size associated with that label.
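A sketch of the filtered counting step, reusing is_rtp() from the earlier sketch; the list and index names are assumptions for illustration:

```python
def count_candidate(candidate, D, suffix_list, state_index, g):
    """
    D           : dict mapping MSS id -> MSS (list of StateInterval)
    suffix_list : set of MSS ids containing the k-RTP suffix of `candidate`
    state_index : dict mapping a State to the set of MSS ids in which it occurs
    Returns the RTP support and the shrunken candidate list for further extensions.
    """
    S_new = candidate.states[0]
    to_check = suffix_list & state_index.get(S_new, set())   # intersect the two lists
    matches = {zid for zid in to_check if is_rtp(D[zid], candidate, g)}
    return len(matches), matches
```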
Learning the classifier for prediction
• For each instance in D, compute its temporal abstraction (MSS) Zi.
• Mine the frequent RTPs for each label and combine all of them into a set Omega.
• Create a binary feature vector f of size |Omega| for each MSS Zi: the j-th entry is 1 if the j-th pattern of Omega is present in Zi and 0 otherwise.
• Use any existing classifier (e.g. SVM, neural networks) on these feature vectors for learning on the training set.
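A sketch of this feature construction and classifier training; scikit-learn's LinearSVC is one possible SVM implementation, not necessarily what the authors used, and is_rtp() is the earlier sketch:

```python
from sklearn.svm import LinearSVC

def featurize(mss_list, omega, g):
    """One binary feature per mined RTP in omega."""
    return [[1 if is_rtp(Z, P, g) else 0 for P in omega] for Z in mss_list]

def train(mss_list, labels, omega, g):
    X = featurize(mss_list, omega, g)
    clf = LinearSVC()
    clf.fit(X, labels)
    return clf
```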
Experimental Evaluation
• Dataset
• 13,558 records of diabetic patients
• 19 time-series variables per patient (glucose, creatinine, hemoglobin, cholesterol, etc.)
• 602 ICD-9 diagnosis codes grouped into 8 disease categories (8 class labels)
• Setup
• A separate experiment for each of the 8 labels.
• Each category is divided into cases (positives) and controls (negatives):
• cases: patients with the target disease,
• with all time-series variables recorded up to the time the disease was FIRST diagnosed;
• controls: all other patients,
• with all time-series variables recorded up to a randomly selected point in time.
Experimental Evaluation
• Classification Performance
• Methods compared:
1. Last Values: only the most recent value of each variable is used.
2. TP: all frequent temporal patterns are used.
3. TP_Sparse: all frequent temporal patterns, but only the top 50 per variable are kept.
4. RTP: all frequent recent temporal patterns are used.
5. RTP_Sparse: only the top 50 RTPs per variable are kept.
Experimental Evaluation
• Classification Performance
• Sigma (the support threshold), for methods 2-5, is set to 15%.
• The maximum gap g, for methods 4 and 5, is set to 6 months.
• Testing: 10-fold cross-validation.
• Quality measures:
• accuracy = (TP + TN) / (P + N)
• Features are created with each of the above methods, an SVM is built on those features, and performance is evaluated with classification accuracy and AUC, the area under the ROC (Receiver Operating Characteristic) curve.
• AUC equals the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one.
Experimental Evaluation
• Knowledge Discovery
Conclusions
• "Recent Temporal Patterns" are of special interest in the medical domain, and similar recency behavior should appear in other domains.
• Time-series abstraction provides a simple approximation, as well as a compression, of the data.
• The gap parameter used in pattern detection is critical for scaling up the mining process (but is domain dependent).
• RTPs allow more efficient mining as well as better prediction accuracy than mining patterns over the entire series (validated here in the medical domain).
• How can this be leveraged/extended?
• Toward defining high-level abstractions for time-series kernels.
• Extend from an "independent" multivariate model to an interdependent one, where vertices represent variables and edges define the dependencies between them.