MAGIC: A Multi-Activity Graph Index for Activity Detection

advertisement
MAGIC: A Multi-Activity
Graph Index for Activity
Detection
Massimiliano Albanese1
Andrea Pugliese2
V.S. Subrahmanian1
Octavian Udrea1
1 University
of Maryland Institute for Advanced Computer Studies, College Park, Maryland, USA
2 University of Calabria - DEIS department, Rende, Italy
IRI 2007
1
Introduction

Many applications require to monitor large
volumes of observation data for the occurrence
of certain activities
 E.g. web servers maintain large server logs
 Early detecting what action a user is trying to perform may
allow to prefetch or customize data

Activity detection is a non trivial task
 Real
world activities tend to be high level and can
often be executed in many different ways
 Observations may be the result of interleaved
activities
IRI 2007
2
Key contributions

The main contributions of this work are
 The
definition of a Multi-Activity Graph Index
(MAGIC), which can index very large numbers of
observations from interleaved activities
 Algorithms to build such index
 Algorithms to answer two types of queries


Evidence problem: find all sequences of observations that
validate the occurrence of an activity with a minimum
probability threshold
Identification problem: identify the most probable activity
occurring in an observation sequence
IRI 2007
3
Stochastic Activity

A Stochastic Activity is a labeled graph (V,E,δ)
where
 V is a finite set of action
 E is a subset of (V×V)
symbols
 vV
s.t. ∄v'V s.t. (v',v)E, i.e., there exists at least
one start node in V
 vV s.t. ∄v'V s.t. (v,v')E, i.e., there exists at least
one end node in V
 δ :E[0,1] is a function that associates a probability
distribution with the outgoing edges of each node

vV Σ{v'  V | (v,v')  E} δ(v,v') = 1
IRI 2007
4
Example of Stochastic Activity
Online purchase stochastic activity
start node
(V, E, δ)
end node
V = {catalog, itemDetails, cart, shippingMethod, paymentMethod, review, confirm}
IRI 2007
5
Activity Instance and Occurrence

Assumptions

Each node in an activity is an observable event
 The probability of taking an action at any time only depends on the last action
 All observations are stored in a single relational database table

An instance of a stochastic activity (V,E,δ) is a path (sequence of nodes)
from a start node to an end node


The probability of an activity instance is the product of the edge probabilities
along the path
An occurrence of a stochastic activity (V,E,δ) in an observation table O with
probability p is a sequence of observations corresponding to the nodes of
an activity instance

The probability of an occurrence is the probability of the instance
 The span of an occurrence is the time interval including all the observations
IRI 2007
6
Example of Activity Occurrence

The “online purchase stochastic activity”
occurs in the web server log shown in the
table

The sequence of observations with identifiers
{1, 4, 7, 10, 13, 14} corresponds to the
activity instance {catalog, cart, shippingMethod,
paymentMethod, review, confirm}
 The span of this activity occurrence is [1,10]
IRI 2007
id
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
ts
1
2
2
3
4
5
5
6
7
7
8
9
9
10
11
11
11
action
catalog
catalog
itemDetails
cart
itemDetails
itemDetails
shippingMethod
cart
shippingMethod
paymentMethod
itemDetails
cart
review
confirm
shippingMethod
paymentMethod
review
18
12 confirm
7
Complexity

Given an observation table O and a stochastic activity A,
the problem of finding all occurrences of A in O takes
exponential time, w.r.t. the size of O

It is not feasible to try to find all possible occurrences


We propose restrictions on what constitutes a valid occurrence in
order to greatly reduce the number of possible occurrences
Due to the size of the search space, it is important to have a data
structure that enables very fast searches for activity occurrences

We propose the MAGIC index structure that allows to


answer the Evidence and Identification problems efficiently
monitor activity occurrences as new observations are collected
IRI 2007
8
Minimal Span (MS) restriction

If two occurrences O1 and O2 are
found in the observation sequence
and the span of O2 is contained
within the span of O1, O1 is
discarded from the result set

The two sequences of
observations with identifiers {1, 4,
7, 10, 13, 14} and {1, 4, 7, 10, 17,
18} respectively, are both activity
occurrences corresponding to the
instance {catalog, cart,
shippingMethod, paymentMethod,
review, confirm}

The second one is discarded under this
restriction
IRI 2007
id
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
ts
1
2
2
3
4
5
5
6
7
7
8
9
9
10
11
11
11
12
action
catalog
catalog
itemDetails
cart
itemDetails
itemDetails
shippingMethod
cart
shippingMethod
paymentMethod
itemDetails
cart
review
confirm
shippingMethod
paymentMethod
review
confirm
id
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
ts
1
2
2
3
4
5
5
6
7
7
8
9
9
10
11
11
11
action
catalog
catalog
itemDetails
cart
itemDetails
itemDetails
shippingMethod
cart
shippingMethod
paymentMethod
itemDetails
cart
review
confirm
shippingMethod
paymentMethod
review
18 12 confirm
9
Earliest Action (EA) restriction

When looking for the next action
symbol in an activity occurrence,
the first possible successor in the
sequence is chosen.

The two sequences of
observations with identifiers {1, 4,
7, 10, 13, 14} and {1, 4, 9, 10, 13,
14} respectively, are both activity
occurrences corresponding to the
instance {catalog, cart,
shippingMethod, paymentMethod,
review, confirm}

The second one is discarded under this
restriction
IRI 2007
id
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
ts
1
2
2
3
4
5
5
6
7
7
8
9
9
10
11
11
11
12
action
catalog
catalog
itemDetails
cart
itemDetails
itemDetails
shippingMethod
cart
shippingMethod
paymentMethod
itemDetails
cart
review
confirm
shippingMethod
paymentMethod
review
confirm
id
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
ts
1
2
2
3
4
5
5
6
7
7
8
9
9
10
11
11
11
action
catalog
catalog
itemDetails
cart
itemDetails
itemDetails
shippingMethod
cart
shippingMethod
paymentMethod
itemDetails
cart
review
confirm
shippingMethod
paymentMethod
review
18 12 confirm
10
Multi-Activity Graph

In order to efficiently monitor observations for
occurrences of multiple activities, we first merge all
activity definitions from A = {A1,…, Ak} into a single
graph

A Multi-Activity Graph is a triple G = (VG, IA, δG) where



VG=∪i=1,…,kVi is a set of action symbols
IA={id(A1),…,id(Ak)} is a set of unique identifiers for activities in A
δG: VG×VG×IA[0,1] is a function that associates a triple
(v,v',id(Ai)) with δi(v,v'), if (v,v')  Ei and 0 otherwise.
IRI 2007
11
Example of Multi-Activity Graph
b
0.4
a
e
0.6
b
d
A1
a
e
A1(0.6)
e
A2
0.7
a
A1
A1(0.4)
A2(0.7)
A1,A2
d
A2(0.3)
c
c
0.3
d
Merged graph
A2
IRI 2007
12
Multi-Activity Graph Index

Given a Multi-Activity Graph G = (VG, IA, δG) built over A = {A1,…,
Ak}, a Multi-Activity Graph Index is a 6-tuple
IG = (G,startG,endG,maxG,tablesG,completedG), where

startG and endG are functions that associate each node vVG with the set
of activities for which v is a start or end node respectively
 maxG is a function that associates a pair (v,id(Ai)) with the maximum
product of probabilities on any path in Ai between v and an end node
 for each vVG, tablesG(v) is a set of tuples of the form (current,
activityID, t0, probability, previous, next), where current is a pointer to an
observation, activityID  IA, previous and next are pointers to tuples in
tablesG
 completedG is a function that associates an activity with a set of
references to tuples in tablesG corresponding to completed instances of
the activity
IRI 2007
13
MAGIC insertion algorithm
Check whether the
newly observed action
is the start node for
any activity
For intermediate
nodes, explore entries
in the index tables
associated with
predecessor nodes
Complexity: algorithm insert runs in time O(|A|∙ max(V,E,δ)A(|V |) ∙ |O|), where O is the
set of observations indexed so far.
IRI 2007
14
Evolution of a MAGIC index (1/6)
b
A1(0.4)
a
A1(0.6)
Index tables tablesG
Observation table O
A1
e
A1,A2
d
A2(0.7)
A2
A2(0.3)
a
c
curr
…
ts
Action
…
1
a
actID
t0
prob
prev
next
A1
1
1


A2
1
1


Both activities A1 and A2 have a
as their start node
IRI 2007
15
Evolution of a MAGIC index (2/6)
b
A1(0.4)
a
A1(0.6)
Index tables tablesG
Observation table O
A1
e
A1,A2
d
A2(0.7)
A2
A2(0.3)
a
c
curr
…
ts
Action
…
1
a
…
2
a
actID
t0
prob
prev
next
A1
1
1


A2
1
1


To apply the Minimal Span restriction,
the tuples in tablesG(a) are updated to
point to the new observation
IRI 2007
16
Evolution of a MAGIC index (3/6)
b
A1(0.4)
a
A1(0.6)
Index tables tablesG
Observation table O
A1
e
A1,A2
d
A2(0.7)
A2
A2(0.3)
a
c
a is the only predecessor of
b in the multi-activity graph
curr
…
ts
Action
…
1
a
…
2
a
…
3
b
actID
t0
prob
prev
next
A1
1
1

A2
1
1


prev
next
b
curr
actID
t0
prob
A1
1
0.4

The probability is equal to the product of
the probability of the tuple in tablesG(a) and
the probability on the edge from a to b
IRI 2007
17
Evolution of a MAGIC index (4/6)
b
A1(0.4)
a
A1(0.6)
Index tables tablesG
Observation table O
A1
e
A1,A2
d
A2(0.7)
A2
A2(0.3)
a
c
curr
…
ts
Action
…
1
a
…
2
a
…
3
b
…
4
b
actID
t0
prob
prev
next
A1
1
1

A2
1
1


prev
next
b
curr
actID
t0
prob
A1
1
0.4

To apply the Earliest Action restriction the
fourth observation is not linked to the first tuple
in tablesG(a) that already has a successor
IRI 2007
18
Evolution of a MAGIC index (5/6)
b
A1(0.4)
a
A1(0.6)
Index tables tablesG
Observation table O
A1
e
A1,A2
a
curr
d
A2(0.7)
A2
A2(0.3)
actID
t0
prob
prev
A1
1
1

A2
1
1

b
c
a is the only predecessor of
c in the multi-activity graph
next
…
ts
Action
…
1
a
…
2
a
…
3
b
…
4
b
…
5
c
IRI 2007
curr
actID
t0
prob
A1
1
0.4
prev
next

c
curr
actID
t0
prob
A2
1
1
prev
next

19
Evolution of a MAGIC index (6/6)
b
A1(0.4)
a
A1(0.6)
Index tables tablesG
Observation table O
A1
e
A1,A2
a
curr
d
A2(0.7)
A2
A2(0.3)
actID
t0
prob
prev
A1
1
1

A2
1
1

b
c
b, c, and e are predecessors of
d in the multi-activity graph
next
…
ts
Action
…
1
a
…
2
a
…
3
b
…
4
b
…
5
c
…
6
d
curr
actID
t0
prob
A1
1
0.4
prev
next
prev
next
prev
next
c
curr
actID
t0
prob
A2
1
1
d
d is an end node for both activities A1 and A2:
two completed occurrences are thus identified
curr
IRI 2007
actID
t0
prob
A1
1
0.4

A2
1
0.3
20
MAGIC-evidence and MAGIC-id

The MAGIC-evidence
algorithm finds all minimal sets
of observations that validate
the occurrence of activities in A
with a probability exceeding a
given threshold

The MAGIC-id algorithm
identifies those tuples in
completedG (and hence the set
of associated activity IDs) that
have maximum probability and
are within the required time
span
IRI 2007
21
Experimental results

Experiments were conducted on two data sets

A third party depersonalized dataset consisting of travel
information and containing approximately 7.5 million
observations


A synthetic dataset of 5 million observations randomly generated


30 manually generated activity definitions were used in this
experiment
randomly generated activity definitions were used in this experiment
We measured



The time to build the index
The consumption of memory
The time to answer queries
IRI 2007
22
Experimental results
In memory index build time
700
All the experiments were
600
Unrestricted

500
Time [s]
run on a Pentium 4 3.2Ghz
with 2 GB of RAM running
SuSE 9.3
 averaged over 10 independent
runs
Minimal span
400
Earlieast action
300
200
100
0
1000
5000
10000
50000
100000
500000 1000000 5000000
Number of observations
Memory consumption
Query time
7
700
6
600
Unrestricted
Unrestricted
500
5
Minimal span
400
Earlieast action
Time [s]
Memory [MB]

300
Minimal span
4
Earlieast action
3
200
2
100
1
0
0
1000
5000
10000
50000
100000
1000
500000 1000000 5000000
5000
10000
50000
100000
500000 1000000 5000000
Number of observations
Number of observations
IRI 2007
23
Conclusions

We showed that finding all the occurrences of multiple interleaved
activities in observation data is a computationally complex problem

We proposed an effective data structure to index large numbers of
observations and concurrently monitor occurrences of multiple activities
as new observations are collected
 A key point in our approach is the introduction of two reasonable
restrictions – but other restrictions can be defined as well – that reduce
the overall complexity of the activity recognition problem to a
manageable level


The experiments on both a synthetic and a third-party dataset show that
MAGIC is fast and has reasonable memory consumption, and allows to
solve the Evidence and Identification problems effectively
Further efforts will be devoted to


The definition of an on-disk version of the index
The application of our approach to index video surveillance data
IRI 2007
24
Download