MAGIC: A Multi-Activity Graph Index for Activity Detection

Massimiliano Albanese (1), Andrea Pugliese (2), V.S. Subrahmanian (1), Octavian Udrea (1)
(1) University of Maryland Institute for Advanced Computer Studies, College Park, Maryland, USA
(2) University of Calabria, DEIS department, Rende, Italy

IRI 2007

Introduction
- Many applications need to monitor large volumes of observation data for the occurrence of certain activities
  - E.g., web servers maintain large server logs
  - Detecting early which action a user is trying to perform may make it possible to prefetch or customize data
- Activity detection is a nontrivial task
  - Real-world activities tend to be high level and can often be executed in many different ways
  - Observations may be the result of interleaved activities

Key contributions
The main contributions of this work are
- The definition of a Multi-Activity Graph Index (MAGIC), which can index very large numbers of observations from interleaved activities
- Algorithms to build such an index
- Algorithms to answer two types of queries
  - Evidence problem: find all sequences of observations that validate the occurrence of an activity with a minimum probability threshold
  - Identification problem: identify the most probable activity occurring in an observation sequence

Stochastic Activity
A Stochastic Activity is a labeled graph (V, E, δ) where
- V is a finite set of action symbols
- E is a subset of V × V
- ∃ v ∈ V s.t. ∄ v' ∈ V s.t. (v', v) ∈ E, i.e., there exists at least one start node in V
- ∃ v ∈ V s.t. ∄ v' ∈ V s.t. (v, v') ∈ E, i.e., there exists at least one end node in V
- δ : E → [0, 1] is a function that associates a probability distribution with the outgoing edges of each node, i.e., for each v ∈ V that has outgoing edges, Σ_{v' ∈ V | (v, v') ∈ E} δ(v, v') = 1

Example of Stochastic Activity
[Figure: the "online purchase" stochastic activity (V, E, δ), with its start node and end node marked]
V = {catalog, itemDetails, cart, shippingMethod, paymentMethod, review, confirm}

Activity Instance and Occurrence
Assumptions
- Each node in an activity is an observable event
- The probability of taking an action at any time depends only on the last action
- All observations are stored in a single relational database table
Definitions
- An instance of a stochastic activity (V, E, δ) is a path (sequence of nodes) from a start node to an end node
  - The probability of an activity instance is the product of the edge probabilities along the path
- An occurrence of a stochastic activity (V, E, δ) in an observation table O with probability p is a sequence of observations corresponding to the nodes of an activity instance
  - The probability of an occurrence is the probability of the instance
  - The span of an occurrence is the time interval including all the observations

Example of Activity Occurrence
- The "online purchase" stochastic activity occurs in the web server log shown in the table
- The sequence of observations with identifiers {1, 4, 7, 10, 13, 14} corresponds to the activity instance {catalog, cart, shippingMethod, paymentMethod, review, confirm}
- The span of this activity occurrence is [1, 10]

id  ts  action
 1   1  catalog
 2   2  catalog
 3   2  itemDetails
 4   3  cart
 5   4  itemDetails
 6   5  itemDetails
 7   5  shippingMethod
 8   6  cart
 9   7  shippingMethod
10   7  paymentMethod
11   8  itemDetails
12   9  cart
13   9  review
14  10  confirm
15  11  shippingMethod
16  11  paymentMethod
17  11  review
18  12  confirm

Complexity
- Given an observation table O and a stochastic activity A, finding all occurrences of A in O takes exponential time w.r.t. the size of O
  - It is not feasible to try to find all possible occurrences
  - We propose restrictions on what constitutes a valid occurrence in order to greatly reduce the number of possible occurrences
- Due to the size of the search space, it is important to have a data structure that enables very fast searches for activity occurrences
- We propose the MAGIC index structure, which makes it possible to
  - answer the Evidence and Identification problems efficiently
  - monitor activity occurrences as new observations are collected

Minimal Span (MS) restriction
- If two occurrences O1 and O2 are found in the observation sequence and the span of O2 is contained within the span of O1, O1 is discarded from the result set
- The two sequences of observations with identifiers {1, 4, 7, 10, 13, 14} and {1, 4, 7, 10, 17, 18} are both activity occurrences corresponding to the instance {catalog, cart, shippingMethod, paymentMethod, review, confirm}
- The second one, whose span [1, 12] contains the span [1, 10] of the first, is discarded under this restriction

Earliest Action (EA) restriction
- When looking for the next action symbol in an activity occurrence, the first possible successor in the sequence is chosen
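As a minimal sketch of the Earliest Action rule, the Python below scans the web server log from the example in table order and always takes the first observation that matches the next action of an instance. The function name `earliest_action_occurrence`, its `start` parameter, and the tuple layout of `LOG` are illustrative assumptions, not part of MAGIC itself.

```python
# Web server log from the example, as (id, ts, action) in table order.
LOG = [
    (1, 1, "catalog"), (2, 2, "catalog"), (3, 2, "itemDetails"),
    (4, 3, "cart"), (5, 4, "itemDetails"), (6, 5, "itemDetails"),
    (7, 5, "shippingMethod"), (8, 6, "cart"), (9, 7, "shippingMethod"),
    (10, 7, "paymentMethod"), (11, 8, "itemDetails"), (12, 9, "cart"),
    (13, 9, "review"), (14, 10, "confirm"), (15, 11, "shippingMethod"),
    (16, 11, "paymentMethod"), (17, 11, "review"), (18, 12, "confirm"),
]

# Activity instance (a start-to-end path) taken from the example.
INSTANCE = ["catalog", "cart", "shippingMethod",
            "paymentMethod", "review", "confirm"]


def earliest_action_occurrence(log, instance, start=0):
    """Greedily match `instance` against `log`, scanning from index `start`.

    For each action of the instance, the first later observation with
    that action is chosen (Earliest Action). Returns (ids, span), or
    None if the instance cannot be completed.
    """
    ids, times = [], []
    pos = start
    for action in instance:
        # Skip observations that do not match the next expected action.
        while pos < len(log) and log[pos][2] != action:
            pos += 1
        if pos == len(log):
            return None
        obs_id, ts, _ = log[pos]
        ids.append(obs_id)
        times.append(ts)
        pos += 1
    return ids, (times[0], times[-1])


ids, span = earliest_action_occurrence(LOG, INSTANCE)
print(ids, span)  # [1, 4, 7, 10, 13, 14] (1, 10)
```

Because the scan never skips a matching observation, an occurrence that uses a later shippingMethod while an earlier one is available is never produced.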
- The two sequences of observations with identifiers {1, 4, 7, 10, 13, 14} and {1, 4, 9, 10, 13, 14} are both activity occurrences corresponding to the instance {catalog, cart, shippingMethod, paymentMethod, review, confirm}
- The second one is discarded under this restriction, because observation 7 is the first shippingMethod that follows observation 4 (cart)

Multi-Activity Graph
- In order to efficiently monitor observations for occurrences of multiple activities, we first merge all activity definitions from A = {A1, …, Ak} into a single graph
- A Multi-Activity Graph is a triple G = (VG, IA, δG) where
  - VG = ∪_{i=1,…,k} Vi is a set of action symbols
  - IA = {id(A1), …, id(Ak)} is a set of unique identifiers for the activities in A
  - δG : VG × VG × IA → [0, 1] is a function that maps a triple (v, v', id(Ai)) to δi(v, v') if (v, v') ∈ Ei, and to 0 otherwise
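The merge described above can be sketched in a few lines of Python. The two activities `A1` and `A2`, their edge probabilities, and the names `merge_activities` and `delta_g` are illustrative assumptions; only the shape of the triple (VG, IA, δG) follows the definition.

```python
# Each activity is given as {(v, v'): probability}; hypothetical example data.
ACTIVITIES = {
    "A1": {("a", "b"): 0.4, ("a", "e"): 0.6, ("b", "d"): 1.0, ("e", "d"): 1.0},
    "A2": {("a", "c"): 1.0, ("c", "e"): 0.7, ("c", "d"): 0.3, ("e", "d"): 1.0},
}


def merge_activities(activities):
    """Build (V_G, I_A, table), where table maps (v, v', id) to a probability."""
    v_g = set()
    table = {}
    for act_id, edges in activities.items():
        for (v, w), p in edges.items():
            v_g.update((v, w))          # V_G is the union of all node sets
            table[(v, w, act_id)] = p   # keep one entry per activity edge
    return v_g, set(activities), table


def delta_g(table, v, w, act_id):
    """delta_G(v, v', id(A_i)): edge probability in A_i, or 0 if no such edge."""
    return table.get((v, w, act_id), 0.0)


V_G, I_A, TABLE = merge_activities(ACTIVITIES)
print(sorted(V_G), sorted(I_A))
print(delta_g(TABLE, "a", "b", "A1"), delta_g(TABLE, "a", "b", "A2"))
```

Keying the table by (v, v', id) keeps the per-activity probabilities separate even when two activities share an edge, such as (e, d) here.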

Example of Multi-Activity Graph
[Figure: activities A1 (nodes a, b, e, d; edge probabilities 0.4 and 0.6) and A2 (nodes a, c, e, d; edge probabilities 0.7 and 0.3), together with their merged graph, whose edges are labeled A1(0.4), A1(0.6), A2(0.7), A2(0.3), A1, A2 and A1,A2]

Multi-Activity Graph Index
Given a Multi-Activity Graph G = (VG, IA, δG) built over A = {A1, …, Ak}, a Multi-Activity Graph Index is a 6-tuple IG = (G, startG, endG, maxG, tablesG, completedG), where
- startG and endG are functions that associate each node v ∈ VG with the set of activities for which v is a start or end node, respectively
- maxG is a function that associates a pair (v, id(Ai)) with the maximum product of probabilities on any path in Ai between v and an end node
- for each v ∈ VG, tablesG(v) is a set of tuples of the form (current, activityID, t0, probability, previous, next), where current is a pointer to an observation, activityID ∈ IA, and previous and next are pointers to tuples in tablesG
- completedG is a function that associates an activity with a set of references to tuples in tablesG corresponding to completed instances of the activity

MAGIC insertion algorithm
- Check whether the newly observed action is the start node of any activity
- For intermediate nodes, explore entries in the index tables associated with predecessor nodes
- Complexity: algorithm insert runs in time O(|A| · max_{(V,E,δ) ∈ A} |V| · |O|), where O is the set of observations indexed so far
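The two insertion steps above can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the class names, the single `next` pointer per tuple, the way the Minimal Span and Earliest Action restrictions are applied, and the edge probabilities of `A1` and `A2` (one reading of the example figure's labels) are all assumptions. The slides do not define t0; here it holds the timestamp of the first start observation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class IndexTuple:
    obs_id: int                      # "current": pointer to an observation
    act: str                         # activityID
    t0: int                          # assumed: timestamp of the start observation
    prob: float
    prev: Optional["IndexTuple"] = None
    next: Optional["IndexTuple"] = None


class MagicSketch:
    def __init__(self, activities):
        self.activities = activities
        self.start, self.end = {}, {}
        for act, edges in activities.items():
            sources = {v for v, _ in edges}
            targets = {w for _, w in edges}
            self.start[act] = sources - targets   # no incoming edge
            self.end[act] = targets - sources     # no outgoing edge
        self.tables = defaultdict(list)           # node -> index tuples
        self.completed = defaultdict(list)        # activity -> finished tuples

    def insert(self, obs_id, ts, action):
        for act, edges in self.activities.items():
            if action in self.start[act]:
                pending = [t for t in self.tables[action]
                           if t.act == act and t.next is None]
                if pending:
                    # Minimal Span: repoint start tuples without a successor.
                    for t in pending:
                        t.obs_id = obs_id
                else:
                    self.tables[action].append(IndexTuple(obs_id, act, ts, 1.0))
                continue
            for (v, w), p in edges.items():
                if w != action:
                    continue
                for t in list(self.tables[v]):
                    # Earliest Action: skip tuples that already have a successor.
                    if t.act == act and t.next is None:
                        new = IndexTuple(obs_id, act, t.t0, t.prob * p, prev=t)
                        t.next = new
                        self.tables[action].append(new)
                        if action in self.end[act]:
                            self.completed[act].append(new)


# Hypothetical activities; probabilities are an assumed reading of the figure.
ACTIVITIES = {
    "A1": {("a", "b"): 0.4, ("a", "e"): 0.6, ("b", "d"): 1.0, ("e", "d"): 1.0},
    "A2": {("a", "c"): 1.0, ("c", "e"): 0.7, ("c", "d"): 0.3, ("e", "d"): 1.0},
}

index = MagicSketch(ACTIVITIES)
for obs_id, action in enumerate(["a", "a", "b", "b", "c", "d"], start=1):
    index.insert(obs_id, obs_id, action)   # timestamps equal ids in this toy log

for act, tuples in sorted(index.completed.items()):
    print(act, [t.prob for t in tuples])
```

Each new observation only touches the tables of its predecessor nodes, which is where the dependence on the largest |V| and on the number of indexed observations in the stated bound comes from.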

Evolution of a MAGIC index (1/6)
[Figure, repeated on each of the six slides: the merged multi-activity graph, the observation table O, and the index tables tablesG]
- Observation table O: (ts 1, a)
- tablesG(a): (actID A1, t0 1, prob 1), (actID A2, t0 1, prob 1)
- Both activities A1 and A2 have a as their start node

Evolution of a MAGIC index (2/6)
- Observation table O: (ts 1, a), (ts 2, a)
- tablesG(a): (actID A1, t0 1, prob 1), (actID A2, t0 1, prob 1)
- To apply the Minimal Span restriction, the tuples in tablesG(a) are updated to point to the new observation

Evolution of a MAGIC index (3/6)
- Observation table O: (ts 1, a), (ts 2, a), (ts 3, b)
- a is the only predecessor of b in the multi-activity graph
- tablesG(b): (actID A1, t0 1, prob 0.4)
- The probability is equal to the product of the probability of the tuple in tablesG(a) and the probability on the edge from a to b

Evolution of a MAGIC index (4/6)
- Observation table O: (ts 1, a), (ts 2, a), (ts 3, b), (ts 4, b)
- tablesG(b) is unchanged: (actID A1, t0 1, prob 0.4)
- To apply the Earliest Action restriction, the fourth observation is not linked to the first tuple in tablesG(a), which already has a successor

Evolution of a MAGIC index (5/6)
- Observation table O: (ts 1, a), (ts 2, a), (ts 3, b), (ts 4, b), (ts 5, c)
- a is the only predecessor of c in the multi-activity graph
- tablesG(c): (actID A2, t0 1, prob 1)

Evolution of a MAGIC index (6/6)
- Observation table O: (ts 1, a), (ts 2, a), (ts 3, b), (ts 4, b), (ts 5, c), (ts 6, d)
- b, c, and e are predecessors of d in the multi-activity graph
- tablesG(d): (actID A1, t0 1, prob 0.4), (actID A2, t0 1, prob 0.3)
- d is an end node for both activities A1 and A2: two completed occurrences are thus identified

MAGIC-evidence and MAGIC-id
- The MAGIC-evidence algorithm finds all minimal sets of observations that validate the occurrence of activities in A with a probability exceeding a given threshold
- The MAGIC-id algorithm identifies those tuples in completedG (and hence the set of associated activity IDs) that have maximum probability and are within the required time span

Experimental results
- Experiments were conducted on two data sets
  - A third-party depersonalized dataset consisting of travel information and containing approximately 7.5 million observations
    - 30 manually generated activity definitions were used in this experiment
  - A synthetic dataset of 5 million randomly generated observations
    - Randomly generated activity definitions were used in this experiment
- We measured
  - The time to build the index
  - The memory consumption
  - The time to answer queries

Experimental results
- All the experiments were run on a Pentium 4 3.2 GHz with 2 GB of RAM running SuSE 9.3; results were averaged over 10 independent runs
[Figure: three plots against the number of observations (1,000 to 5,000,000), each comparing the Unrestricted, Minimal span, and Earliest action settings: in-memory index build time (Time [s]), memory consumption (Memory [MB]), and query time (Time [s])]

Conclusions
- We showed that finding all the occurrences of multiple interleaved activities in observation data is a computationally complex problem
- We proposed an effective data structure to index large numbers of observations and concurrently monitor occurrences of multiple activities as new observations are collected
- A key point in our approach is the introduction of two reasonable restrictions (other restrictions can be defined as well) that reduce the overall complexity of the activity recognition problem to a manageable level
- The experiments on both a synthetic and a third-party dataset show that MAGIC is fast, has reasonable memory consumption, and solves the Evidence and Identification problems effectively
- Further efforts will be devoted to
  - The definition of an on-disk version of the index
  - The application of our approach to index video surveillance data