Processing Ubiquitous Personal Event Streams to Provide User-Controlled Support

Jeremy Debattista, Simon Scerri, Ismael Rivera, and Siegfried Handschuh
Digital Enterprise Research Institute, National University of Ireland, Galway
firstname.lastname@deri.org

(This work is supported in part by the European Commission under the Seventh Framework Programme FP7/2007-2013 (digital.me, ICT-257787) and in part by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Líon-2).)

Abstract. The increasing use of smart devices provides us with a wealth of personal data and context information. In this paper we describe an approach which allows users to define and register rules based on their personal data activities in an event processor, which continuously listens to perceived context data and triggers any satisfied rules. We describe the Rule Management Ontology (DRMO) as a means to define rules using a standard format, whilst providing a scalable solution in the form of a Rule Network Event Processor which detects and analyses events, triggering rules which are satisfied. Following an evaluation of the network vs. a simplistic sequential approach, we justify a trade-off between initialisation time and processing time.

1 Introduction

The increasing use of smart mobile devices provides us with ample data regarding users' surroundings, activities and information. This data can be collected from various applications, embedded physical sensors (such as GPS), and users' online presence. As this information is heterogeneous in nature, it cannot be readily unified under a common domain model. This limits the potential of having smart devices operate on this combined user data in order to provide added value. If this limitation is addressed, smart devices can become increasingly aware of a user's activities, situations and habits. As a result, daily repetitive tasks (e.g. changing the mobile mode to silent when arriving at work) can be automated or suggested to the user.

The di.me project (http://www.dime-project.eu) addresses the above limitation by unifying the user's personal information across various heterogeneous sources, including social networks and personal devices, into one standardised data representation format. di.me restricts the working Knowledge Base (KB) to cover a personal closed-world environment, introducing and extending ontologies modelling the user's Personal Information Model (PIM). Thus we represent both information "about me" (such as current location, nearby persons, live posts, etc.) and "for me" (such as calendar, emails, contacts, etc.), terms coined by David Karger (http://groups.csail.mit.edu/haystack/blog/2012/02/17/personalinformation-management-is-not-personal-information-management/), rather than just the latter, as covered by the Social Semantic Desktop [8].

The availability of the di.me PIM enables us to tap this personal data and allow users to define declarative rules. In di.me, we develop a scalable solution that assists the user with daily repetitive tasks within a personal information sphere. This paper's objectives are to:

1. Enable the declarative representation of event patterns and associated actions (i.e. rules), so as to enable both human users and their machines to make sense of them;
2. Create an Event Processor that interprets declarative rules, detects events and triggers the desired actions;
3. Evaluate the performance and scalability of the implemented Event Processor.
The rest of this paper is organised as follows: Section 2 compares related work; Sections 3 and 4 focus on the DRMO ontology and the Event Processor respectively; Section 5 presents the process and results of our evaluation; and future work is discussed in the concluding remarks in Section 6.

2 Related Work

Every solution which implements event processing requires an underlying rule language. Systems such as [2] propose rule languages which are specific to their particular framework. One major issue with such approaches is that the modelled rules cannot easily be reused in other frameworks. RuleML (http://ruleml.org) is an XML markup language which allows rules to be defined using a formal notation. Since our user-defined rules are dependent on PIM data, the use of the Resource Description Framework (RDF) to model rules was a natural choice. In contrast to XML-based frameworks, RDF helps us achieve semantic interoperability more easily [4]. Since in di.me we strive to have one unified model to represent the user's KB, XML-based rules would need to be transformed into RDF prior to being stored in the user's PIM. Another advantage of RDF over XML is that known knowledge can be reused in rule instances having the same semantics; for example, saved rule conditions can be reused in other rule instances.

Complex Event Processing (CEP) and rule-based systems are commonly used to allow rules to define how perceived data (or events) is processed. Whilst CEP rules (which follow the Event-Condition-Action pattern) are only triggered by specific events and do not need advanced matching techniques such as Rete, production rules are constantly monitored for activation using such algorithms. The DRMO vocabulary allows users to express rules whose conditions can be expressed in a sequential fashion using the succeeded by and preceded by operators, though rules are not necessarily activated via specific events. Therefore DRMO rules can be classified as production rules with the added value of temporal constraints. Our approach in creating an event processor is to exploit the CEP properties that enable the use of temporal constraints in rules and the perception of data from multiple sources, whilst using a rule processing algorithm to filter and trigger relevant rules.

Traditionally, the Rete algorithm [5] is used in rule-based systems to match facts with rule patterns. This algorithm was extended by various efforts, including [9], where the authors tackle the missing temporal aspect to support complex event detection. One main problem of the original Rete algorithm is that values in the network must be explicit: non-constants and variables requiring further querying in a KB are not allowed. In this paper we create a Rule Network based on the idea of the Rete network, addressing these two shortcomings by supporting the temporal aspect and allowing implicit values in the tree. Unlike Rete, the proposed Rule Network is used to efficiently filter rules, rather than to match conditions with nodes.

3 di.me Rule Management Ontology (DRMO)

The concepts of the Rule Management Ontology (Figure 1) are inspired by the Event-Condition-Action (ECA) pattern, which is used in event-driven architectures: the event specifies what triggers the rule, the condition specifies what must hold, and the action specifies what is executed. Unlike in ECA systems, DRMO rules do not expect any transaction event signals (such as "on update" or "on delete") to invoke the rule.
DRMO is based on the following rule pattern:

\[ R \Rightarrow \{a_1, \dots, a_n\}; \qquad R = \{c_1, \dots, c_m\}; \qquad c_m = \{c_{m_1}, \dots, c_{m_n}\} \tag{1} \]

where R represents a rule that consists of a combination of conditions c_m, triggering one or more resulting actions a_n; each condition c_m consists of a number of constraints c_{mn}, possibly recursive. In theory, constraints can be applied recursively, thus allowing an infinite number of embedded conditions. For practical reasons, we expect applications of DRMO to set a limit on the number of embedded conditions (more on this later). We refer the reader to [3] or the schema online (http://www.semanticdesktop.org/ontologies/drmo/) for a full description of the ontology.

Fig. 1. An extract of the Rule Management Ontology (the full visualisation and description of DRMO can be found online: http://www.semanticdesktop.org/ontologies/drmo/)

3.1 Defining Rules

Figure 2 demonstrates how a user can define a rule using the intelligent di.me User Interface (UI) [7]. The defined rule consists of three adjoined conditions (a specific user-created Situation, a Contact and an Image) and an action (share image). The latter two conditions have been constrained. The full rule can be described as: "When in Situation 'going out' and in the vicinity of any Contact(s) belonging to a Group 'Friends' and a new Image is created, then: Share the Image with that Contact(s)". This functionality is similar to that of IFTTT (http://ifttt.com) services, where users can define rules in terms of condition blocks and actions. JSON (http://www.json.org) is used as a communication interface between the DRMO instances stored in the PIM and the UI.

Fig. 2. Creating rules using the Rule Manager in di.me userware
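As a concrete illustration, the following is a minimal sketch of how a rule instance along these lines could be serialised and loaded into the PIM store, assuming the Sesame 2.x API (the triple store mentioned in Section 5). Only drmo:isComposedOf, drmo:hasResourceType and drmo:hasConstraintOnProperty are DRMO terms described in this paper; the namespace URI, the drmo:Rule class and everything in the ex: namespace (including ex:triggersAction) are illustrative placeholders rather than actual ontology terms.

```java
import java.io.StringReader;

import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.rio.RDFFormat;
import org.openrdf.sail.memory.MemoryStore;

public class RegisterRuleSketch {

    // Hypothetical Turtle serialisation of the rule in Figure 2. Only the
    // drmo:isComposedOf, drmo:hasResourceType and drmo:hasConstraintOnProperty
    // properties are taken from the paper; all other names are placeholders.
    static final String RULE_TURTLE =
          "@prefix drmo: <http://www.semanticdesktop.org/ontologies/drmo#> .\n"
        + "@prefix ex:   <urn:example:> .\n"
        + "ex:shareWithFriends a drmo:Rule ;\n"
        + "    drmo:isComposedOf ex:condSituation , ex:condContact , ex:condImage ;\n"
        + "    ex:triggersAction ex:shareImage .\n"
        + "ex:condContact drmo:hasResourceType ex:Contact ;\n"
        + "    drmo:hasConstraintOnProperty ex:belongsToGroup .\n";

    public static void main(String[] args) throws Exception {
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        try {
            // Store the rule instance in the (in-memory) PIM.
            con.add(new StringReader(RULE_TURTLE), "http://example.org/", RDFFormat.TURTLE);
            System.out.println("Rule triples stored: " + con.size());
        } finally {
            con.close();
            repo.shutDown();
        }
    }
}
```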
4 Rule Network Event Processor

For the Rule Network Event Processor, a network is generated based on the idea of Forgy's Rete network [5]: DRMO rule instances are registered in the event processor, forming a network structure. Rule instances are also transformed into SPARQL queries, each of which is stored in a level above the network's terminal node. An algorithm based on the K-Shortest Paths algorithm [10] is proposed to filter and trigger candidate rules from the network.

The system architecture is illustrated in Figure 3. Different sources send data about the user's activities, surroundings and information, as registered in the PIM. A broadcaster broadcasts new events to the registered processes (the rule network), which are then processed to trigger any rules relevant to the newly perceived event. A garbage collection (GC) service is attached to the rule network to discard any expired partial results stored in join nodes. Below, we first discuss how rules are transformed into a rule network carrying SPARQL queries. Then, we explain the event object and its lifetime in the rule network. Finally, we show how the network is processed to filter and trigger those rules which are satisfied.

Fig. 3. General System Architecture

Fig. 4. A Rule Network Example

4.1 Building the Network

In our rule network we define six different node types. The Root node is the one with no incoming vertices and the Terminal node is the one with no outgoing vertices. Figure 4 is an example of a rule network. Each rule instance registered with the event processor is fed through the root node of the network to start the process of adding rules to the network and transforming them into SPARQL queries.

A rule instance is first broken down into a number of conditions defined by the property drmo:isComposedOf. These conditions and their constraints are attached to the network's root as nodes, starting with the Object-Type node (in Figure 4, nmo:Email) and finishing with the Rule node (represented as circle nodes R1, R2, R3), which stores the transformed SPARQL query. Then, for each of the condition's constraints, we check drmo:hasConstraintOnProperty and add the corresponding node to the network (attached to the previously added Object-Type node), followed by the associated constraint value (subject/object) node. These last two nodes are Triple-type nodes (shown as rectangles in Figure 4) and store the triple patterns required to create the final query.

Two conditions (in a rule) can be joined together using a Join node, apart from those linked together with a drmo:or operator. In the latter case, the event processor handles the drmo:or-joined conditions as two distinct rules triggering the same action, since these are independent from each other (see all supported operators in [3]). An advantage of this decision is that the response time for such queries is decreased, as fewer triple patterns are added to the queries. A Join node refers to two ordered inputs. For each input we store an intermediate query and an intermediate result. The advantage of storing intermediate results is that during processing they enable us to check whether a rule can be triggered based on the intermediate result. Intermediate queries are used in order to avoid using SPARQL FILTERs to compare the event's occurrence time for rule conditions joined by the drmo:succeededBy and drmo:precededBy operators. Another advantage of storing intermediate results is that, unlike the sequential processor defined in [3], there is no need for an Event Log to keep track of the perceived events. Similarly to [9], the ordered inputs are used to check the temporal constraints of a rule. If the join node is a succeededBy node, then the left input must be satisfied first in order to trigger the right input (and vice-versa for a precededBy node, where the right input is the first to be satisfied). There is no particular order for and join nodes.

The rule transformation ends with the Rule node, in which the transformed SPARQL query [3] and the action instances are stored. SPARQL allows the fast querying of the perceived events stored in the user's KB. These queries also help in filling the "blanks" in a rule which does not have an explicit value. For example, the rule "Receive an email from <urn:Person1>" requires a further query to the PIM to get all the email addresses linked to <urn:Person1>, since the email event metadata would contain a raw email address (e.g. "person1@email.com") rather than the URI defined in the user's PIM.
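To illustrate this "blank filling", the sketch below shows one possible shape for the SPARQL query such a Rule node could hold, evaluated against the PIM with Sesame. The nmo/nco terms are standard OSCAF vocabulary, but the exact query that di.me generates, as well as the surrounding class and method names, are assumptions.

```java
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.RepositoryConnection;

public class RuleNodeQuerySketch {

    // Possible SPARQL stored in the Rule node for the condition
    // "Receive an email from <urn:Person1>": the raw sender address of the
    // perceived nmo:Email is matched against the addresses linked to
    // <urn:Person1> in the PIM, so the implicit value never has to be
    // written into the rule itself.
    static final String CONDITION_QUERY =
          "PREFIX nmo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nmo#>\n"
        + "PREFIX nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>\n"
        + "SELECT ?email WHERE {\n"
        + "  ?email a nmo:Email ;\n"
        + "         nmo:from ?sender .\n"
        + "  ?sender nco:hasEmailAddress ?a1 .\n"
        + "  ?a1 nco:emailAddress ?raw .\n"
        + "  <urn:Person1> nco:hasEmailAddress ?a2 .\n"
        + "  ?a2 nco:emailAddress ?raw .\n"
        + "}";

    /** Returns true if the events currently in the KB satisfy the condition. */
    static boolean conditionSatisfied(RepositoryConnection con) throws Exception {
        TupleQueryResult result =
            con.prepareTupleQuery(QueryLanguage.SPARQL, CONDITION_QUERY).evaluate();
        try {
            return result.hasNext();   // any binding means the condition holds
        } finally {
            result.close();
        }
    }
}
```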
4.2 Events and Validity

Events perceived by the sensors and stored in the user's KB (PIM) can trigger any user-defined rule. When an event is perceived (this could be a created, modified or deleted resource in the PIM), semantic lifting is performed on the relevant graphs in order to store the metadata for the perceived event. The event's resource type, together with a timestamp, a pointer to the graph the resource is stored in, and the event operation (e.g. Resource Modified), is then broadcast to the event processor.

The lifetime of an event is difficult to predict, since different event types might have different time-spans when correlated to other events in rules. For the rule "If I receive an email succeeded by a new document created" to trigger, the two events (receive an email, new document created) need to occur one after the other. This could lead the rule to trigger at an indefinite time if the first event never expires; on the other hand, premature event expiry might lead to relevant rules being missed. In our system we employ the consumption mode technique, the maximum event lifetime technique, and the time-based windows technique, similar to how these are described in [9]. The first technique is used to keep only the most recent resource URI in a join node. The maximum event lifetime technique expires resource URIs in join nodes after a fixed amount of time, irrespective of the resource type, whilst with the time-based windows technique they expire according to a time period assigned to the respective resource type. Events defining a context state [1] (e.g. current availability) are automatically invalidated and removed from the join nodes when the state changes.

4.3 Processing the Network

Algorithm 1: Processing the Rule Network to trigger rule(s) from detected events

Data: Perceived Event E; Rule Network N; ResultSet S
  get resource type T for event E;
  PL ← ksp(N, T);
  from PL find common subpaths Cp;
  while ∃ path P in PL do
      if path P contains join node J then
          inputPath ← get left or right input of P in J;
          if J = AndNode then
              inputPath.result ← execute(inputPath.query);
          checkEventJoinNode(N, inputPath, J);
          continue iteration;
      else if execute(P.rule) ≠ null then
          remove all paths in PL where P is not a subpath (Cp);
          S.add(P.rule);
      remove all paths in PL where P is a subpath (Cp);

Procedure checkEventJoinNode(N, P, J)
Input: Rule Network N; InputPath P; JoinNode J
  if J = SucceededByNode then
      if P is the left input then
          P.result ← execute(P.query);
      else if the left input's result is not empty then
          if execute(P.query) ≠ null then
              S.add(P.rule);
  if J = PrecededByNode then
      if P is the right input then
          P.result ← execute(P.query);
      else if the right input's result is not empty then
          if execute(P.query) ≠ null then
              S.add(P.rule);

In contrast to the Rete algorithm [5], the improved processor we introduce does not process the network by matching patterns, but aims to efficiently filter and find candidate rules. This is done by forming a subgraph with the perceived resource type as the root node. Algorithm 1 shows how the network is processed. Candidate rules are filtered (PL) using Yen's K-Shortest Paths algorithm [10]. Common subpaths (Cp) are also discovered during the execution of the k-shortest paths algorithm; this has the benefit of reducing the number of queries performed to check for potential rules to be triggered. Once the set of paths is ordered, the paths are iterated over and the query in each rule node is executed. Rules whose queries return a result (indicating a triggered rule) are stored in a set which is then passed to the action executor.

The rule matcher runs its matching process until all paths (in PL) are checked. At worst, the matcher iterates over all candidate paths found by the KSP algorithm. When a rule is matched, all other paths which do not have the same condition as the rule in question are removed and not checked by the matcher. This approach is possible due to the non-repetitive nature of the network, where rules with the same condition are represented by the same branch in the network.
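The sketch below illustrates the filtering idea in a simplified, self-contained form: starting from the Object-Type node that matches the perceived event's resource type, paths down to Rule nodes are enumerated and ordered shortest-first. Join-node handling, common-subpath detection and the actual KSP computation are omitted, and all class and field names are illustrative rather than taken from the di.me implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class RuleNetworkFilterSketch {

    /** A node in the rule network; Rule nodes carry the transformed SPARQL query. */
    static class Node {
        final String label;
        final boolean isRuleNode;
        final List<Node> children = new ArrayList<>();
        Node(String label, boolean isRuleNode) {
            this.label = label;
            this.isRuleNode = isRuleNode;
        }
    }

    /**
     * Enumerates the paths from the Object-Type node matching the perceived
     * event's resource type down to Rule nodes, ordered shortest-first so that
     * simple rules are evaluated before more complex ones. The network is
     * assumed to be acyclic, as it is built top-down from rule conditions.
     */
    static List<List<Node>> candidatePaths(Map<String, Node> typeIndex, String resourceType) {
        List<List<Node>> paths = new ArrayList<>();
        Node start = typeIndex.get(resourceType);
        if (start == null) {
            return paths;                              // no rule refers to this type
        }
        Deque<List<Node>> queue = new ArrayDeque<>();
        queue.add(Collections.singletonList(start));
        while (!queue.isEmpty()) {
            List<Node> path = queue.poll();
            Node last = path.get(path.size() - 1);
            if (last.isRuleNode) {
                paths.add(path);                       // candidate rule found
                continue;
            }
            for (Node child : last.children) {
                List<Node> extended = new ArrayList<>(path);
                extended.add(child);
                queue.add(extended);
            }
        }
        // Shortest paths first, mirroring the ordering produced by the KSP step.
        paths.sort((a, b) -> Integer.compare(a.size(), b.size()));
        return paths;
    }
}
```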
5 Evaluation

Our evaluation serves two purposes: to compare the efficiency of the new network-based approach against the sequential approach proposed earlier [3], and to investigate its scalability. We base our investigation on a study of the performance of event processing systems [6], which focuses on bottlenecks and the degradation of performance as the load is increased. The authors suggest benchmarking various factors, including selection and pattern detection. The key performance aspects used to assess the scalability of the proposed event processor are:

– The initialisation of the Rule Network;
– The selection of rules on event detection;
– The maximum load of events the event processor can handle.

For these tests, we require a knowledge base (KB), DRMO rule instances and event data. As a KB, we use a sample di.me PIM containing data based on various OSCAF (http://www.oscaf.org) ontologies. Test rules were created manually. In the context of di.me, we limit the number of recursive conditions (defined in Section 3, equation (1)) to 2. For these evaluation tests no repeated rules were used, so that the Rule Network event processor is not given an advantage over the Sequential one. We also propose the following limits, based on the sample PIM and a small user study: 50 different rules, each having 5 conditions. For the generation of the event data we developed an event generator that produces realistic daily events based on the given PIM and scenarios (all data and evaluation results can be found at http://smile.deri.ie/Rule-Network-Evaluation). The evaluation tests were carried out on a MacBook Pro with a 2.4 GHz Intel Core i5 processor and 4 GB of RAM. Data is stored in-memory using Sesame (http://www.openrdf.org).

5.1 Initialisation Test

The aim of this test is to measure the time taken to register rule instances in both event processors (sequential vs. rule network). We initialise each event processor ten times, starting with five rules and progressively incrementing the number of loaded rules by five. Figure 5 shows the time taken (in milliseconds) against the number of rules loaded in the event processors with each initialisation. The graph shows that the time consumed grows linearly with the number of rules loaded in the system. The rule network event processor consistently took more time to initialise than the sequential event processor, due to the extra overhead needed to update the network.

Fig. 5. Time taken to initialise Event Processors

Fig. 6. No. of distinct events perceived over time

5.2 Filtering Test

In this test we measure the time taken (in ms) by the processors to select candidate rules with each event perceived. We use data generated over 24 hours, thus assuming that events in the event log (required only for the sequential event processor) remain valid throughout this period. After loading all 50 rules in both processors, the test was repeated four times, each time producing a number of events until the desired number of distinct types were available in the event log. For the experiment we require that the filtering algorithm returns candidate rules satisfying either of the following criteria:

1. A rule with one condition where the associated resource type (drmo:hasResourceType) is the same as the perceived event's type;
2. A multiple-condition rule such that one of its conditions is satisfied by the type of the new event.
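Both criteria reduce to the check that some condition's drmo:hasResourceType matches the perceived event's type. For the sequential baseline this check has to be applied by scanning every rule, as in the following minimal sketch (class and field names are placeholders, not the actual di.me code):

```java
import java.util.ArrayList;
import java.util.List;

class SequentialFilterSketch {

    static class Condition { String resourceType; }
    static class Rule { List<Condition> conditions = new ArrayList<>(); }

    /** Scans every rule and keeps those with a condition matching the event type. */
    static List<Rule> candidates(List<Rule> allRules, String perceivedType) {
        List<Rule> matches = new ArrayList<>();
        for (Rule rule : allRules) {
            for (Condition c : rule.conditions) {
                if (perceivedType.equals(c.resourceType)) {
                    matches.add(rule);
                    break;                 // criteria 1 and 2 are both covered here
                }
            }
        }
        return matches;
    }
}
```

With r rules and d distinct event types in the event log, the sequential processor repeats this scan for every distinct type, roughly an r × d check per filtering pass, which explains the growth visible in Figure 6.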
The graph in Figure 6 shows how the sequential process takes increasingly more time as the number of distinct event types in the event log grows, whilst the network filtering process remains constant. This result was expected, since the sequential filtering process needs to go through all the rules to check their resource types against the distinct event types in the event log, in order to satisfy either of the mentioned criteria. The network, on the other hand, is built such that the associated rules can be found automatically by forming a sub-graph, using the resource type as the start vertex. The network filtering process also takes time to sort the possible rules by shortest path first, allowing simple rules to be evaluated before the more complex ones.

5.3 Load Testing

The aim of this experiment is to understand to what extent the event processors work sufficiently well and in an acceptable manner. For this experiment we created a producer/consumer service, where the producer sends a stream of generated events to the consumer (the event processor), which filters and triggers rules simultaneously. After both processors were loaded with the 50 rules, we ran this test five times, progressively increasing the event load up to a maximum of 16000 events. We established that an acceptable time-frame for the processing of consumed events should be less than 20 seconds. Since the sequential process failed this test outright, with 100 events being processed in 92.48 seconds, it was subsequently eliminated from this experiment. Figure 7 shows the time taken (in seconds) to process all consumed events for the rule network event processor. In particular, we observe that 10000 events were consumed and processed in around 2 seconds. In the di.me userware, we do not foresee a user having anything near 10000 events perceived at the same time. More realistically, we observe that the rule network event processor can process up to 2000 events in a reasonable time (≈ 0.4 sec), with 100 events being processed in less than 0.1 sec.

Fig. 7. Testing time taken to process events
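For reference, the following is a minimal sketch of the kind of producer/consumer harness used for such a load test; the EventProcessor interface, the event representation and the queue capacity are placeholders rather than the actual di.me test code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class LoadTestSketch {

    /** Placeholder for the rule network event processor under test. */
    interface EventProcessor {
        void consume(String event);
    }

    /** Streams eventCount generated events to the processor; returns elapsed time in ms. */
    static long run(final EventProcessor processor, final int eventCount) throws InterruptedException {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

        // Producer: pushes generated events onto the queue.
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < eventCount; i++) {
                        queue.put("event-" + i);        // stand-in for a perceived PIM event
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        long start = System.nanoTime();
        producer.start();
        // Consumer: the event processor filters and triggers rules for each event.
        for (int i = 0; i < eventCount; i++) {
            processor.consume(queue.take());
        }
        producer.join();
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```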
From these results we conclude that although the Rule Network event processor takes longer than the Sequential event processor during the initialisation stage, the runtime process for filtering and triggering takes considerably less time, regardless of the load of events being consumed by the processor. The trade-off between the time taken to initialise the rules and the time taken to process events is justified for the following reasons:

1. The initialisation process is done only once;
2. The difference in milliseconds between both event processors is almost insignificant (to initialise 50 rules, the network takes less than 100 ms more);
3. The filtering of events takes considerably less time in the network event processor;
4. Even with a load of 10000 events, the rule network performs within a reasonable time-frame.

6 Conclusion

In this paper we describe an ontology-driven event processor which operates on personal data, activities and context, as perceived by various devices and online sources. The event processor compares event streams to the antecedents of declarative rules, defined by a user through an intelligent UI. Rules consist of various conditions corresponding to items in a unified PIM, and one or more resultant actions. Due to the nature of the rule conditions and their constraints, we propose to represent them using a rule management ontology. This also means that, in theory, the personalised user-defined rules can be processed by various platforms. Defined rules are transformed and registered in a Rule Network, which is then operated upon by the event processor. After initialising the rule network, the system is ready to start perceiving events from multiple data sources. With each new event, the rule network filters candidate rules and triggers any which are satisfied.

The proposed network-based event processor is compared to an earlier sequential approach. Evaluation results show that although the rule network takes more time at the initialisation stage, it performs considerably better than the alternative during both the filtering and triggering processes. In particular, our event processor can process 100 events in less than 0.1 seconds. Prospectively, a JSON-LD (http://json-ld.org) serialisation would enable us to interchange DRMO instances, ensuring that the full semantics are retained even at the UI level. In the future, we also intend to investigate the possibility of enabling automatic rule learning and discovery based on the availability of a user's context history.

References

1. J. Attard, S. Scerri, I. Rivera, and S. Handschuh. Ontology-based situation recognition for context-aware systems. In I-SEMANTICS 2013, 2013.
2. V. Beltran, K. Arabshian, and H. Schulzrinne. Ontology-based user-defined rules and context-aware service composition system. In Proceedings of the 8th International Conference on The Semantic Web, ESWC'11, Berlin, Heidelberg, 2012. Springer-Verlag.
3. J. Debattista, S. Scerri, I. Rivera, and S. Handschuh. Ontology-based rules for recommender systems. In Proceedings of the International Workshop on Semantic Technologies meet Recommender Systems & Big Data, 2012.
4. S. Decker, S. Melnik, F. van Harmelen, D. Fensel, M. Klein, J. Broekstra, M. Erdmann, and I. Horrocks. The Semantic Web: the roles of XML and RDF. IEEE Internet Computing, 4(5):63–73, 2000.
5. C. L. Forgy. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence, 19(1):17–37, 1982.
6. M. R. N. Mendes, P. Bizarro, and P. Marques. A performance study of event processing systems. In TPCTC, pages 221–236, 2009.
7. S. Scerri, A. Schuller, I. Rivera, J. Attard, J. Debattista, M. Valla, F. Hermann, and S. Handschuh. Interacting with a context-aware personal information sharing system. In Proceedings of the 15th International Conference on Human-Computer Interaction (HCI 2013), 2013.
8. M. Sintek, S. Handschuh, S. Scerri, and L. van Elst. Technologies for the social semantic desktop. In Reasoning Web. Semantic Technologies for Information Systems, volume 5689 of Lecture Notes in Computer Science, pages 222–254. Springer Berlin / Heidelberg, 2009.
9. K. Walzer, T. Breddin, and M. Groch. Relative temporal constraints in the Rete algorithm for complex event detection. In DEBS, pages 147–155, 2008.
10. J. Y. Yen. Finding the K shortest loopless paths in a network. Management Science, 1971.