Processing Ubiquitous Personal Event Streams to Provide User-Controlled Support

Jeremy Debattista, Simon Scerri, Ismael Rivera, and Siegfried Handschuh
Digital Enterprise Research Institute, National University of Ireland, Galway
firstname.lastname@deri.org

* This work is supported in part by the European Commission under the Seventh Framework Program FP7/2007-2013 (digital.me – ICT-257787) and in part by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Líon-2).
Abstract. The increasing use of smart devices provides us with large amounts of personal data and context information. In this paper we describe an approach which allows users to define and register rules based on their personal data activities in an event processor, which continuously listens to perceived context data and triggers any satisfied rules. We describe the Rule Management Ontology (DRMO) as a means to define rules using a standard format, whilst providing a scalable solution in the form of a Rule Network Event Processor which detects and analyses events, triggering rules which are satisfied. Following an evaluation of the network vs. a simplistic sequential approach, we justify a trade-off between initialisation time and processing time.
1 Introduction
The increase in the use of smart mobile devices provides us with ample data regarding the users’ surroundings, activities and information. This data can be collected from
various applications, embedded physical sensors (such as GPS), and users’ online presence. As this information is heterogeneous in nature, it cannot be readily unified under
a common domain model. This limits the potential of having smart devices operating
on this combined user data, in order to provide added value. If this limitation is addressed, smart devices can become increasingly aware of a user’s activities, situations
and habits. As a result, daily repetitive tasks (e.g. changing the mobile mode to silent
when arriving at work) can be automated or suggested to the user.
The di.me project (http://www.dime-project.eu) addresses the above limitation by unifying the user's personal information across various heterogeneous sources, including social networks and personal devices, into one standardised data representation format. di.me restricts the working Knowledge Base (KB) to cover a personal closed-world environment, introducing and extending ontologies modelling the user's Personal Information Model (PIM). Thus we represent both information about me (such as current location, nearby persons, live
posts, etc.) and information for me (such as calendar, emails, contacts, etc.), terms coined by David Karger (http://groups.csail.mit.edu/haystack/blog/2012/02/17/personalinformation-management-is-not-personal-information-management/), rather than just the latter, as covered by the Social Semantic Desktop [8]. The availability of the di.me PIM enables us to tap this personal data and allow users to define declarative rules.
In di.me, we develop a scalable solution that assists the user with daily repetitive
tasks within a personal information sphere. This paper’s objectives are to:
1. Enable the declarative representation of event patterns and associated actions (i.e.
rules), so as to enable both human users and their machines to make sense of them;
2. Create an Event Processor that interprets declarative rules, detects events and triggers the desired actions;
3. Evaluate the performance and scalability of the Event Processor implemented.
The rest of this paper is organised as follows: Section 2 compares related work; Sections 3 and 4 focus on the DRMO ontology and the Event Processor respectively; Section 5 presents the process and results of our evaluation. Future work is discussed in the concluding remarks in Section 6.
2 Related Work
Every solution which implements event processing requires an underlying rule language. Systems such as [2] propose rule languages which are specific to their particular framework. One major issue in such approaches is that the modelled rules cannot easily be reused in other frameworks. RuleML (http://ruleml.org) is an XML markup language which allows rules to be defined using a formal notation. Since our user-defined rules are dependent on PIM data, the use of the Resource Description Framework (RDF) to model rules was a natural choice. In contrast to XML-based frameworks, RDF helps us achieve semantic interoperability more easily [4]. Since in di.me we strive to have one unified model to represent the user's KB, XML-based rules would need to be transformed into RDF prior to being stored in the user's PIM. Another advantage of RDF over XML is that known knowledge can be reused in rule instances having the same semantics; for example, saved rule conditions can be reused in other rule instances.
Complex Event Processing (CEP) and rule-based systems are commonly used to allow rules to define how perceived data (or events) is processed. Whilst CEP rules (following the Event-Condition-Action pattern) are only triggered by specific events and do not need advanced matching techniques such as Rete, production rules are constantly monitored for activation using such algorithms. The DRMO vocabulary allows users to express rules whose conditions can be expressed in a sequential fashion using the succeeded by and preceded by operators, though rules are not necessarily activated via specific events. Therefore DRMO rules can be classified as production rules with the added value of temporal constraints. Our approach in creating an event processor is to exploit CEP properties that enable the use of temporal constraints in rules and the perception of data from multiple sources, whilst having a rule processing algorithm to filter and trigger relevant rules. Traditionally, the Rete algorithm [5] is used in rule-based systems to match facts with rule patterns. This algorithm was extended by various efforts,
including [9], where the authors tackle the missing temporal aspect to support complex event detection. One main problem of the original Rete algorithm is that values in the network must be explicit: non-constants and variables requiring further querying in a KB are not allowed. In this paper we create a Rule Network based on the idea of the Rete network, addressing these two shortcomings by supporting the temporal aspect and allowing implicit values in the tree. Unlike Rete, the proposed Rule Network will be used to efficiently filter rules, rather than to match conditions with nodes.
3 di.me Rule Management Ontology (DRMO)
The concepts of the Rule Management Ontology (Figure 1) are inspired by the Event-Condition-Action (ECA) pattern. The ECA pattern is used in event-driven architectures, where the event specifies what triggers the rule, the conditions specify what must hold, and the actions specify what is executed. Unlike in such architectures, DRMO rules do not expect any transaction event signals (such as "on update" or "on delete") to invoke the rule. DRMO is based on the following rule pattern:
\[ \text{if } R \Longrightarrow \{a_1, \ldots, a_n\}; \quad R = \{c_1, \ldots, c_m\}; \quad c_m = \{c_{m_1}, \ldots, c_{m_n}\} \tag{1} \]
where $R$ represents a rule that consists of a combination of conditions $c_m$, triggering one or more resulting actions $a_n$. Each $c_m$ consists of a number of constraints $c_{m_n}$, possibly recursive. In theory, constraints can be applied recursively, thus allowing an infinite number of embedded conditions. For practical reasons, we expect applications of DRMO to set a limit on the number of embedded conditions (more on this later). We refer the reader to [3] or the schema online (http://www.semanticdesktop.org/ontologies/drmo/) for a full description of the ontology.
Fig. 1. An extract of the Rule Management Ontology (full visualisation and description: http://www.semanticdesktop.org/ontologies/drmo/)
3.1 Defining Rules
Figure 2 demonstrates how a user can define a rule using the intelligent di.me User Interface (UI) [7]. The defined rule consists of three adjoined conditions (a specific user-created Situation, a Contact and an Image) and an action (share image). The latter two conditions have been constrained. The full rule can be described as: "When in Situation 'going out' and in the vicinity of any Contact(s) belonging to the Group 'Friends' and a new Image is created, then: Share the Image with that Contact(s)". This functionality
is similar to that used in IFTTT (http://ifttt.com) services, where users can define rules in terms of condition blocks and actions. JSON (http://www.json.org) is used as a communication interface between the DRMO instances stored in the PIM and the UI.
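Mapped onto rule pattern (1) from Section 3, this example rule can be decomposed as follows (an illustrative reading, not the serialised DRMO instance):

\[ R \Longrightarrow \{a_1\}; \quad R = \{c_1, c_2, c_3\}; \quad c_2 = \{c_{2_1}\}; \quad c_3 = \{c_{3_1}\} \]

where $c_1$ is the Situation 'going out', $c_2$ a Contact with constraint $c_{2_1}$ (membership of the Group 'Friends'), $c_3$ an Image with constraint $c_{3_1}$ (the Image being newly created), and $a_1$ the action of sharing the Image with the matched Contact(s).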
Fig. 2. Creating rules using the Rule Manager in di.me userware
4 Rule Network Event Processor
For the Rule Network Event Processor, a network is generated based on the idea of
Forgy’s Rete network [5], where DRMO rule instances are registered in the event processor, forming a network structure. Rule instances are also transformed into SPARQL
queries, each of which is stored in a level above the network’s terminal node. An algorithm based on the K-Shortest Path [10] is proposed to filter and trigger candidate rules
from the network. The system architecture is illustrated in Figure 3. Different sources
send data about the user’s activities, surroundings and information as registered in the
PIM. A broadcaster broadcasts the new events to the registered processes (the rule network), where they are processed to trigger any rules relevant to the newly perceived event. A garbage collection (GC) service is attached to the rule network to discard any expired partial results stored in join nodes. Below, we first discuss how rules are transformed into a rule network carrying SPARQL queries. Then, we explain the event object and its lifetime in the rule network. Finally, we show how the network is processed to filter candidate rules and trigger those which are satisfied.
Fig. 3. General System Architecture
Fig. 4. A Rule Network Example
4.1 Building the Network
In our rule network we define six different types of node. The Root node is the one with no incoming edges and the Terminal node is the one with no outgoing edges. Figure 4 shows an example of a rule network.
Each rule instance registered to the event processor is fed through the root node of the network to start the process of adding rules to the network and transforming them into SPARQL queries. A rule instance is first broken down into a number of conditions defined by the property drmo:isComposedOf. These conditions and their constraints are attached to the network's root as nodes, starting with the Object-Type node (in Figure 4, nmo:Email) and finishing with the Rule node (represented as the circle nodes R1, R2, R3), which stores the transformed SPARQL query. Then, for each of the condition's constraints, we check the drmo:hasConstraintOnProperty property and add the corresponding node to the network (attached to the previously added Object-Type node), followed by the associated constraint value (subject/object) node. These last two nodes are Triple-type nodes (shown as rectangles in Figure 4) and store the triple patterns required to create the final query.
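As a rough illustration, the node hierarchy can be sketched with plain Java classes along the following lines; all class and field names here are ours, not the di.me implementation, and the Join node is sketched separately further below.

import java.util.ArrayList;
import java.util.List;

abstract class NetworkNode {
    final List<NetworkNode> children = new ArrayList<>();   // outgoing edges
    void addChild(NetworkNode child) { children.add(child); }
}

class RootNode extends NetworkNode { }       // no incoming edges
class TerminalNode extends NetworkNode { }   // no outgoing edges

class ObjectTypeNode extends NetworkNode {   // condition's resource type, e.g. nmo:Email
    final String resourceType;
    ObjectTypeNode(String resourceType) { this.resourceType = resourceType; }
}

class TripleNode extends NetworkNode {       // rectangles in Figure 4
    final String triplePattern;              // stored pattern for the final query
    TripleNode(String triplePattern) { this.triplePattern = triplePattern; }
}

class RuleNode extends NetworkNode {         // circles R1..R3 in Figure 4
    final String sparqlQuery;                // the transformed SPARQL query
    final List<String> actions;              // associated action instances
    RuleNode(String sparqlQuery, List<String> actions) {
        this.sparqlQuery = sparqlQuery;
        this.actions = actions;
    }
}
// The sixth node type, the Join node, is sketched in the next listing.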
Two conditions (in a rule) can be joined together using the Join node, apart from those linked together with a drmo:or operator. In the latter case, the event processor handles the drmo:or-joined conditions as two distinct rules triggering the same action, since these are independent from each other (see all supported operators in [3]). An advantage of this decision is that the response time for such queries is decreased, as we add fewer triple patterns to the queries. A Join node refers to two ordered inputs. For each input we store an intermediate query and an intermediate result. The advantage of storing intermediate results is that during processing they enable us to check if a rule can be triggered based on the intermediate result. Intermediate queries are used in order to avoid using SPARQL FILTERs to compare the event's occurrence time for rule conditions joined by the drmo:succeededBy and drmo:precededBy operators. Another advantage of storing intermediate results is that, unlike the sequential processor defined in [3], there is no need for an Event Log to keep track of the perceived events. Similar to [9], the ordered inputs are used to check the temporal constraints of a rule. If the join node is a succeededBy node, then the left input must be satisfied first in order to trigger the right input (and vice-versa for a precededBy node, where the right input is the first input to be satisfied). There is no particular order for and join nodes.
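Continuing the classes above, a hedged sketch of such a join node follows; the field and method names are illustrative, and the timestamp-based ordering check is one possible reading of how the ordered inputs enforce the temporal constraints.

class JoinNode extends NetworkNode {
    enum JoinType { AND, SUCCEEDED_BY, PRECEDED_BY }

    static class Input {
        String intermediateQuery;                   // partial SPARQL for this side
        java.util.List<String> intermediateResult;  // cached bindings, discarded by the GC
        long satisfiedAt;                           // event time, for temporal checks
        boolean isSatisfied() {
            return intermediateResult != null && !intermediateResult.isEmpty();
        }
    }

    final JoinType type;
    final Input left = new Input();
    final Input right = new Input();
    JoinNode(JoinType type) { this.type = type; }

    // For succeededBy the left input must be satisfied before the right one
    // may trigger the rule (and vice-versa for precededBy); AND is unordered.
    boolean canTrigger() {
        switch (type) {
            case SUCCEEDED_BY: return left.isSatisfied() && right.isSatisfied()
                                      && left.satisfiedAt <= right.satisfiedAt;
            case PRECEDED_BY:  return left.isSatisfied() && right.isSatisfied()
                                      && right.satisfiedAt <= left.satisfiedAt;
            default:           return left.isSatisfied() && right.isSatisfied();
        }
    }
}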
The rule transformation ends with the Rule node, in which the transformed SPARQL query [3] and the action instances are stored. SPARQL allows the fast querying of the perceived events stored in the user's KB. These queries also help in filling the "blank" in a rule which does not have an explicit value. For example, a rule "Receive an email from <urn:Person1>" would require a further query to the PIM to get all the email addresses linked to <urn:Person1>, since the email event metadata would contain a raw email address (e.g. "person1@email.com") rather than the URI defined in the user's PIM.
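Such a lookup could be issued against a Sesame-backed PIM as sketched below; the query follows the OSCAF/NEPOMUK NCO vocabulary, but the exact modelling of contacts in a given PIM may differ.

import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.RepositoryConnection;

public class ImplicitValueResolver {
    static final String EMAIL_QUERY =
        "PREFIX nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#> " +
        "SELECT ?address WHERE { " +
        "  <urn:Person1> nco:hasEmailAddress ?ea . " +
        "  ?ea nco:emailAddress ?address . }";

    // Returns the raw addresses (e.g. "person1@email.com") that the email
    // event metadata can then be matched against.
    public static TupleQueryResult resolve(RepositoryConnection pim) throws Exception {
        return pim.prepareTupleQuery(QueryLanguage.SPARQL, EMAIL_QUERY).evaluate();
    }
}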
4.2 Events and Validity
Events perceived by the sensors and stored in the user's KB (PIM) can trigger any user-defined rule. When an event is perceived (this could be a created, modified or deleted resource in the PIM), semantic lifting is performed on the relevant graphs in order to store the metadata for the perceived event. The event's resource type, together with a timestamp, a pointer to the graph this resource is stored in, and the event operation (e.g. Resource Modified), is then broadcast to the event processor.
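A minimal sketch of what the broadcast event object might carry (all field names are ours):

public class PerceivedEvent {
    public enum Operation { RESOURCE_CREATED, RESOURCE_MODIFIED, RESOURCE_DELETED }

    private final String resourceType;  // e.g. "nmo:Email"
    private final long timestamp;       // when the event was perceived
    private final String graphUri;      // pointer to the graph storing the resource
    private final Operation operation;  // e.g. Resource Modified

    public PerceivedEvent(String resourceType, long timestamp,
                          String graphUri, Operation operation) {
        this.resourceType = resourceType;
        this.timestamp = timestamp;
        this.graphUri = graphUri;
        this.operation = operation;
    }

    public String getResourceType() { return resourceType; }
    public long getTimestamp()      { return timestamp; }
    public String getGraphUri()     { return graphUri; }
    public Operation getOperation() { return operation; }
}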
The lifetime of an event is difficult to predict, since different event types might have different time-spans when correlated to other events in rules. In order for the rule "If I receive an email succeededBy a new document created" to trigger, the two events (receive an email, new document created) need to occur one after the other. This could lead the rule to trigger at an indefinite time if the first event does not expire after a certain time. On the other hand, premature event expiry might also lead to relevant rules being missed. In our system we employ the consumption mode technique, the maximum event lifetime technique, and the time-based windows technique, similar to how these are described in [9]. The first technique is used to keep only the most recent resource URI in a join node. The maximum event lifetime technique expires resource URIs in join nodes after a given amount of time X, irrespective of the resource type, whilst under the last technique these expire according to a time period assigned to the respective resource type. Events defining a context state [1] (e.g. current availability) are automatically invalidated and removed from the join nodes when the state changes.
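The expiry checks performed by the GC service could be sketched as follows, assuming a configured maximum lifetime X and per-type windows (the consumption mode simply overwrites the cached URI when a newer one arrives):

import java.util.Map;

public class JoinInputGarbageCollector {
    private final long maxEventLifetimeMs;          // the "X amount of time"
    private final Map<String, Long> typeWindowsMs;  // per-resource-type windows

    public JoinInputGarbageCollector(long maxEventLifetimeMs,
                                     Map<String, Long> typeWindowsMs) {
        this.maxEventLifetimeMs = maxEventLifetimeMs;
        this.typeWindowsMs = typeWindowsMs;
    }

    // True if the cached resource URI in a join input has expired and
    // should be discarded from the join node.
    public boolean isExpired(String resourceType, long perceivedAt, long now) {
        long age = now - perceivedAt;
        if (age > maxEventLifetimeMs) return true;   // maximum event lifetime
        Long window = typeWindowsMs.get(resourceType);
        return window != null && age > window;       // time-based window
    }
}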
4.3 Processing the Network Algorithm
Algorithm 1: Processing Rule Network to trigger rule(s) from detected events
Data: Perceived Event E; Rule Network N; ResultSet S

  get resource type T for event E;
  PL ← ksp(N, T);
  from PL find common subpaths Cp;
  while ∃ path P in PL do
      if path P contains join node J then
          inputPath ← get left or right input of P in J;
          if J = AndNode then
              inputPath.result ← execute(inputPath.query);
          checkEventJoinNode(N, inputPath, J);
          continue iteration;
      else
          if execute(P.rule) ≠ null then
              remove all paths in PL where P is not a subpath (Cp);
              S.add(P.rule);
          remove all paths in PL where P is a subpath (Cp);

Procedure checkEventJoinNode(N, P, J)
Input: Rule Network N; InputPath P; JoinNode J

  if J = SucceededByNode then
      if P is the left input then
          P.result ← execute(P.query);
      else if left input result is not empty then
          if execute(P.query) ≠ null then
              S.add(P.rule);
  if J = PrecededByNode then
      if P is the right input then
          P.result ← execute(P.query);
      else if right input result is not empty then
          if execute(P.query) ≠ null then
              S.add(P.rule);
In contrast to the Rete algorithm [5], the improved processor we introduce does not process the network by matching patterns, but aims to efficiently filter and find candidate rules. This is done by forming a subgraph, with the perceived resource type as the root node. Algorithm 1 shows how the processing of the network is done. Candidate rules are filtered (PL) using Yen's K-Shortest Path algorithm [10]. Common subpaths (Cp) are also discovered during the execution of the K-shortest path algorithm. This has the benefit of reducing the number of queries performed to check for potential rules to be triggered. Once the set of paths is ordered, they are iterated and the query in each rule node is executed. Rules whose queries return a result (indicating a triggered rule) are stored in a set which is then passed to the action executor. The rule matcher runs its matching process until all paths (in PL) are checked. At worst, the matcher iterates over all candidate paths found by the KSP algorithm. When a rule is matched, all other paths which do not have the same condition as the rule in question are removed and not checked by the matcher. This approach is possible due to the non-repetitive nature of the network, where rules with the same condition are represented by the same branch in the network.
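A condensed Java rendering of the outer loop of Algorithm 1 might look as follows; ksp, execute and checkEventJoinNode are stubs standing in for Yen's K-shortest-path search, SPARQL execution and the join procedure, and the common-subpath pruning is omitted for brevity.

import java.util.ArrayList;
import java.util.List;

public class RuleNetworkProcessor {

    /** One root-to-rule path returned by the KSP filter. */
    static class Path {
        final RuleNode rule;
        final JoinNode join;              // null when the rule has no join node
        Path(RuleNode rule, JoinNode join) { this.rule = rule; this.join = join; }
    }

    public List<RuleNode> process(PerceivedEvent event, RootNode network) {
        List<RuleNode> triggered = new ArrayList<>();
        // Candidate rules come back sorted shortest-path-first, so simple
        // rules are evaluated before more complex ones.
        for (Path p : ksp(network, event.getResourceType())) {
            if (p.join != null) {
                // Temporal joins are delegated to the checkEventJoinNode
                // procedure of Algorithm 1 (not repeated here).
                checkEventJoinNode(p, p.join, triggered);
            } else if (execute(p.rule.sparqlQuery) != null) {
                triggered.add(p.rule);    // non-null result set: rule fires
            }
        }
        return triggered;
    }

    // --- stubs standing in for the real implementations ------------------
    List<Path> ksp(RootNode network, String resourceType) { return new ArrayList<>(); }
    Object execute(String sparqlQuery) { return null; }
    void checkEventJoinNode(Path p, JoinNode j, List<RuleNode> out) { }
}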
5 Evaluation
Our evaluation serves two purposes: to compare the efficiency of the new network-based approach to the sequential approach proposed earlier [3], and to investigate its scalability. We base our investigation on a study on the performance of event processing systems [6], which focuses on the bottlenecks and the degradation of performance as the load is increased. The authors suggest benchmarking various factors, including selection and pattern detection. The key performance aspects to demonstrate the scalability of the proposed event processor are:
– The initialisation of the Rule Network;
– The selection of rules on event detection;
– The maximum load of events the event processor can handle.
For these tests, we require a knowledge base (KB), DRMO rule instances and event data. As a KB, we use a sample di.me PIM containing data based on various OSCAF (http://www.oscaf.org) ontologies. Test rules were manually created. In the context of di.me, we limit the number of recursive conditions (defined in Section 3, equation (1)) to 2. For these evaluation tests no repeated rules were used, so that the Rule Network event processor is not given an advantage over the Sequential one. We also propose the following limits: 50 different rules, each having 5 conditions, based on the sample PIM and following a small user study. For the generation of the event data we developed an event generator that generates realistic daily events based on the given PIM and scenarios (all data and evaluation results can be found at http://smile.deri.ie/Rule-Network-Evaluation). The evaluation tests have been carried out on a MacBook Pro, Intel Core i5 with 2.4 GHz processor speed and 4 GB of RAM. Data is stored in-memory using Sesame (http://www.openrdf.org).
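For reference, the in-memory store can be set up with the standard Sesame 2.x API along these lines (a minimal sketch, not the evaluation harness itself):

import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

public class InMemoryKb {
    public static void main(String[] args) throws Exception {
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();                          // in-memory, no persistence
        RepositoryConnection conn = repo.getConnection();
        try {
            // ... load the sample di.me PIM and run the event processor ...
        } finally {
            conn.close();
            repo.shutDown();
        }
    }
}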
5.1 Initialisation Test
The aim of this test is to measure the time taken to register rule instances in both event processors (sequential vs. rule network). We initialise our event processors ten times each, starting with five rules and progressively incrementing the number of loaded rules by five. Figure 5 shows the time taken (in milliseconds) against the number of rules loaded in the event processors with each initialisation. The graph shows that the time consumed grows linearly with the number of rules loaded in the system. The rule network event processor consistently took more time to initialise than the sequential event processor, due to the extra overhead needed to update the network.
Fig. 5. Time taken to initialise Event Processors
Fig. 6. No. of distinct events perceived over time

5.2 Filtering Test
In this test we measure the time taken (in ms) by the processors to select candidate rules with each event perceived. We use data generated over 24 hours, thus assuming that events in the event log (required only for the sequential event processor) remain valid throughout this period. After loading all 50 rules in both processors, the test was repeated four times, each time producing a number of events until the desired number of distinct types were available in the event log. For the experiment we require that the filtering algorithm returns candidate rules satisfying either of the following criteria:
1. A rule with one condition where the associated resource type (drmo:hasResourceType) is the same as the perceived event's type;
2. A multiple-condition rule such that one of its conditions is satisfied by the type of the new event.
The graph in Figure 6 shows how the sequential process grows in time as the number of distinct event types increases in the event log, whilst the network filtering process remains constant. This result was expected, since the sequential filtering process needs to go through all the rules to check their resource types against the distinct event types in the event log, in order to satisfy either of the mentioned criteria. On the other hand, the network is built such that the associated rules can be automatically found by forming a sub-graph, using the resource type as the start vertex. The network filtering process takes time to sort the possible rules by shortest path first, allowing simple rules to be evaluated before the more complex ones.
5.3 Load Testing
The aim of this experiment is to understand to what extent the event processors can work sufficiently well and in an acceptable manner. For this experiment we created a producer/consumer service, where the producer sends a stream of generated events to the consumer (the event processor), which filters and triggers rules simultaneously. After both processors were loaded with the 50 rules, we ran this test five times, progressively increasing the event load up to a maximum of 16000 events. We established that an acceptable time-frame for the processing of consumed events should be less than 20 seconds. Since the sequential process failed this test outright, with 100 events being processed in 92.48 seconds, it was subsequently eliminated from this experiment.
Figure 7 shows the time taken (in seconds) to process all consumed events for the rule network event processor. In particular, we observe that 10000 events were consumed and processed in around 2 seconds. In the di.me userware, we do not foresee a user having anything near 10000 events perceived at the same time. More realistically, we observe that the rule network event processor can process up to 2000 events in a reasonable time (≈ 0.4 sec), with 100 events being processed in less than 0.1 sec.
Fig. 7. Testing time taken to process events
From these results we conclude that although during the initialisation stage the Rule Network event processor takes longer than the Sequential event processor, the runtime process for filtering and triggering takes considerably less time, regardless of the load of events being consumed by the processor. The trade-off between the time taken to initialise the rules and the time taken to process events is justified for the following reasons:
1. The initialisation process is done only once;
2. The difference in milliseconds between both event processors is almost insignificant (to initialise 50 rules, the network takes < 100 ms more);
3. The filtering of events takes considerably less time in the network event processor;
4. Even with a load of 10000 events, the rule network performs within a reasonable time-frame.
6 Conclusion
In this paper we describe an ontology-driven event processor which operates on personal data, activities and context, as perceived by various devices and online sources. The event processor compares event streams to the antecedents of declarative rules, defined by a user through an intelligent UI. Prospectively, a JSON-LD (http://json-ld.org) serialisation would enable us to interchange DRMO instances, ensuring that the full semantics are retained even at the UI level. Rules consist of various conditions corresponding to items in a unified PIM, and one or more resultant actions. Due to the nature of the rule conditions and their constraints, we propose to represent them with a rule management ontology. This also means that, in theory, the personalised user-defined rules can be processed by various platforms. Defined rules are transformed and registered in a Rule Network, which is then operated upon by the event processor. After initialising the rule network, the system is ready to start perceiving events from multiple data sources. With each new event, the rule network filters candidate rules and triggers any which are satisfied. The proposed network-based event processor is compared to an earlier sequential approach. Evaluation results show that although the rule network takes more time at the initialisation stage, it performs considerably better than the alternative during both the filtering and triggering processes. In particular, our event processor can process 100 events in less than 0.1 seconds. In the future, we also intend to investigate the possibility of enabling automatic rule learning and discovery based on the availability of a user's context history.
References
1. J. Attard, S. Scerri, I. Rivera, and S. Handschuh. Ontology-based situation recognition for context-aware systems. In I-SEMANTICS 2013, 2013.
2. V. Beltran, K. Arabshian, and H. Schulzrinne. Ontology-based user-defined rules and context-aware service composition system. In Proceedings of the 8th International Conference on The Semantic Web, ESWC'11, Berlin, Heidelberg, 2012. Springer-Verlag.
3. J. Debattista, S. Scerri, I. Rivera, and S. Handschuh. Ontology-based rules for recommender systems. In Proceedings of the International Workshop on Semantic Technologies meet Recommender Systems & Big Data, 2012.
4. S. Decker, S. Melnik, F. van Harmelen, D. Fensel, M. Klein, J. Broekstra, M. Erdmann, and I. Horrocks. The Semantic Web: the roles of XML and RDF. IEEE Internet Computing, 4(5):63–73, 2000.
5. C. L. Forgy. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence, 19(1):17–37, 1982.
6. M. R. N. Mendes, P. Bizarro, and P. Marques. A performance study of event processing systems. In TPCTC, pages 221–236, 2009.
7. S. Scerri, A. Schuller, I. Rivera, J. Attard, J. Debattista, M. Valla, F. Hermann, and S. Handschuh. Interacting with a context-aware personal information sharing system. In Proceedings of the 15th International Conference on Human-Computer Interaction (HCI 2013), 2013.
8. M. Sintek, S. Handschuh, S. Scerri, and L. van Elst. Technologies for the social semantic desktop. In Reasoning Web. Semantic Technologies for Information Systems, volume 5689 of Lecture Notes in Computer Science, pages 222–254. Springer Berlin / Heidelberg, 2009.
9. K. Walzer, T. Breddin, and M. Groch. Relative temporal constraints in the Rete algorithm for complex event detection. In DEBS, pages 147–155, 2008.
10. J. Y. Yen. Finding the K shortest loopless paths in a network. Management Science, 1971.