Event-Condition-Action Rule Languages on Semistructured Data

George Papamarkos
Outline
- What Event-Condition-Action (ECA) rules are and what we can do with them
- ECA Rules for XML
  - ECA Language
  - System Architecture
  - Performance
- ECA Rules for RDF
  - ECA Language
  - System Architecture
  - Performance
13/10/2006
What is an ECA Rule?
- An Event-Condition-Action rule performs actions in response to events, provided that a stated condition holds
- In a database system, for example:
  - an event can be the insertion of a new tuple
  - the condition can be a query
  - the action can be an update to a relational table
- This behaviour is called reactive functionality
What is an ECA Rule?
- An ECA rule has the general syntax:
  on event if condition do action
- The event part specifies when the rule is triggered
- The condition part determines whether the data are in a particular state, in which case the rule fires
- The action part describes the actions to be performed if the rule fires
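The following is a minimal Python sketch of the on/if/do split, using the database example from the previous slide. The in-memory "database" of lists, the `insert` helper and the audit rule are all hypothetical. Actions here do not raise further events, so cascading rules are not modelled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    kind: str    # e.g. "INSERT"
    table: str   # relation that was modified
    row: dict    # the inserted tuple


@dataclass
class ECARule:
    name: str
    event: Callable[[Event], bool]            # ON: does this event trigger the rule?
    condition: Callable[[dict, Event], bool]  # IF: is the data in the required state?
    action: Callable[[dict, Event], None]     # DO: the reaction, e.g. a table update


def insert(db, rules, table, row):
    """Insert a tuple, then fire every triggered rule whose condition holds."""
    db[table].append(row)
    ev = Event("INSERT", table, row)
    fired = []
    for rule in rules:
        if rule.event(ev) and rule.condition(db, ev):
            rule.action(db, ev)
            fired.append(rule.name)
    return fired


# Hypothetical rule: on insert into orders, if qty > 100, do record the order in audit.
audit_big_orders = ECARule(
    name="audit_big_orders",
    event=lambda ev: ev.kind == "INSERT" and ev.table == "orders",
    condition=lambda db, ev: ev.row["qty"] > 100,
    action=lambda db, ev: db["audit"].append({"order": ev.row["id"]}),
)

db = {"orders": [], "audit": []}
first = insert(db, [audit_big_orders], "orders", {"id": 1, "qty": 5})
second = insert(db, [audit_big_orders], "orders", {"id": 2, "qty": 500})
```

Both inserts trigger the rule. Only the second satisfies the condition, so only that one causes the action.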
Advantages of using ECA Rules
- ECA rules allow an application's reactive functionality to be defined and managed within a single rule base, rather than being encoded in application programs
- They use a high-level declarative syntax and are thus amenable to analysis and optimisation techniques that cannot be applied if the functionality is encoded in program code
ECA Rules for XML - Outline
- Design issues of an ECA language for XML
- The XTL language
- Implementing an XTL rule processing system
- Performance study
Design issues of an ECA language for XML
- Compared with relational triggers, the most important XML-specific issues in designing an ECA language for XML are:
  - Event granularity: specifying where data have been modified is more complex and requires path expressions
  - Action granularity: an action may affect an entire subdocument, which means that:
    - a single action can trigger a set of different events (one for each node it affects)
    - the analysis of which events are triggered by an action cannot be based on syntax alone
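The action-granularity issue can be illustrated with a short Python sketch. It uses the standard `xml.etree.ElementTree` module, which is not part of the original system. Inserting one subdocument adds every node in it, so the events caused depend on the data inserted, not only on what the action's syntax names. The `<dayinfo>` fragment and the representation of events as label-path strings are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def inserted_paths(parent_path, fragment):
    """Label paths of every node added when `fragment` is inserted under parent_path.

    One INSERT action adds the whole subtree, so each of these paths is a
    separate insertion that an event expression may match.
    """
    paths = []

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.append(path)
        for child in node:
            walk(child, path)

    walk(fragment, parent_path)
    return paths


# Hypothetical action: INSERT a whole <dayinfo> subdocument below a share.
fragment = ET.fromstring(
    "<dayinfo><prices><price>10</price></prices><high>10</high></dayinfo>")
events = inserted_paths("/shares/share", fragment)

# A rule ON INSERT .../prices/price is triggered, although the action only names <dayinfo>.
price_rule_triggered = "/shares/share/dayinfo/prices/price" in events
```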
The XTL Language
- The general syntax of XTL rules is:
  on event if condition do action
- Fragments of XPath and XQuery are used to specify the event, condition and action parts of XTL rules:
  - XPath is used for selecting and matching fragments of XML
  - XQuery is used within actions where a new XML fragment needs to be constructed
The XTL Language
- Event part
  - Syntax:
    (INSERT | DELETE) e
    where e is an XPath expression evaluating to a set of nodes
  - A rule is triggered if this set of nodes includes any node in the inserted or deleted XML fragment
  - The system-defined variable $delta contains the nodes of this set that were inserted or deleted, and is available for use in the condition and action parts of the rule
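The triggering test and the binding of $delta can be sketched in Python with `xml.etree.ElementTree`, whose limited XPath subset stands in for full XPath. The function name and the document are hypothetical. Paths are written relative to the document element, and only the INSERT case is shown.

```python
import xml.etree.ElementTree as ET


def delta_for_insert(doc_root, event_path, fragment_root):
    """Nodes selected by event_path that belong to the inserted fragment.

    The rule is triggered iff this list is non-empty; it plays the role of $delta.
    event_path uses ElementTree's XPath subset, relative to the document element.
    """
    inserted = {id(n) for n in fragment_root.iter()}  # every node of the fragment
    selected = doc_root.findall(event_path)           # evaluate e on the updated document
    return [n for n in selected if id(n) in inserted]


# Hypothetical document: one share with one recorded price.
doc = ET.fromstring(
    "<shares><share><dayinfo><prices><price>10</price></prices>"
    "</dayinfo></share></shares>")
new_price = ET.SubElement(doc.find("share/dayinfo/prices"), "price")
new_price.text = "12"

delta = delta_for_insert(doc, "share/dayinfo/prices/price", new_price)
```

The event expression selects both prices, but only the newly inserted one ends up in $delta.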
The XTL Language
- Condition part
  - The condition part is either the constant TRUE or one or more XPath expressions connected by the boolean connectives and, or, not
  - Each of these expressions is evaluated on the data to determine whether the condition is TRUE or FALSE
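The following minimal Python sketch shows how such a condition can be evaluated. The XPath subset of `xml.etree.ElementTree` stands in for full XPath. The tuple encoding of and/or/not is an assumption of this sketch. A path counts as TRUE when it selects at least one node, which is XPath's rule for converting a node-set to a boolean.

```python
import xml.etree.ElementTree as ET


def holds(root, cond):
    """Evaluate an XTL-style condition.

    cond is True, a path expression (true iff it selects at least one node),
    or a tuple ("and" | "or", c1, c2, ...) / ("not", c).
    """
    if cond is True:
        return True
    if isinstance(cond, str):
        return len(root.findall(cond)) > 0
    op, *args = cond
    if op == "and":
        return all(holds(root, a) for a in args)
    if op == "or":
        return any(holds(root, a) for a in args)
    if op == "not":
        return not holds(root, args[0])
    raise ValueError(f"unknown connective: {op}")


# Hypothetical data and condition: "ACME has a recorded high and is not suspended".
doc = ET.fromstring(
    "<shares><share><name>ACME</name><dayinfo><high>10</high></dayinfo>"
    "</share></shares>")
cond = ("and", "share[name='ACME']/dayinfo/high", ("not", "share/suspended"))
```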
The XTL Language
- Action part
  - The action part is a sequence of one or more actions
  - Syntax:
    INSERT r BELOW e (BEFORE | AFTER) q
    r is an XQuery expression specifying the XML fragment to be inserted, e is an XPath expression specifying the set of nodes under which the new fragment will be inserted, and q is either a constant or an XPath qualifier specifying the set of nodes BEFORE or AFTER which the new nodes will be placed
    DELETE e
    e is an XPath expression specifying the set of nodes to be deleted
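The two action forms can be sketched in Python on top of `xml.etree.ElementTree`. This is an illustration under simplifying assumptions, not XTL's implementation:

- the fragment r is given as a ready-made element rather than as an XQuery expression
- q is reduced to a child tag name
- the helper names are hypothetical

```python
import copy
import xml.etree.ElementTree as ET


def insert_below(root, fragment, target_path, position, anchor_tag):
    """INSERT fragment BELOW target_path (BEFORE | AFTER) anchor_tag.

    Simplification: q is reduced to a child tag name. A copy of fragment goes
    under every selected node, next to its first child with that tag, or at
    the end if there is no such child.
    """
    for target in root.findall(target_path):
        tags = [child.tag for child in target]
        if anchor_tag in tags:
            idx = tags.index(anchor_tag)
            pos = idx if position == "BEFORE" else idx + 1
        else:
            pos = len(tags)
        target.insert(pos, copy.deepcopy(fragment))


def delete(root, path):
    """DELETE path: detach every node the path selects."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for node in root.findall(path):
        parent_of[node].remove(node)


# Hypothetical data: replace the day's high, placing the new one after <prices>.
doc = ET.fromstring(
    "<dayinfo><prices><price>12</price></prices><high>10</high><low>9</low></dayinfo>")
delete(doc, "high")
insert_below(doc, ET.fromstring("<high>12</high>"), ".", "AFTER", "prices")
```

ElementTree elements do not know their parents, so `delete` builds a child-to-parent map first.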
XTL Language
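Example rule:
ON INSERT doc('s.xml')/shares/share/dayinfo/prices/price
IF $delta > $delta/../../high
DO DELETE $delta/../../high;
   INSERT <high>{$delta/text()}</high>
   BELOW $delta/../.. AFTER prices
- When a newly inserted price exceeds the high recorded for that day, the rule deletes the old high element and inserts a new one, holding the new price, after the prices element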
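For readers without an XTL engine, the effect of the example rule on a single inserted price can be traced in plain Python with `xml.etree.ElementTree`. Two readings are assumptions of this sketch rather than statements about XTL's semantics:

- `high` is a child of `dayinfo`, as the condition and the INSERT part imply
- prices are compared as numbers

```python
import xml.etree.ElementTree as ET


def on_insert_price(doc_root, price):
    """The example rule applied to one newly inserted <price> node ($delta).

    Returns True if the rule's condition held and its actions were executed.
    """
    parent_of = {child: parent for parent in doc_root.iter() for child in parent}
    dayinfo = parent_of[parent_of[price]]            # $delta/../..
    high = dayinfo.find("high")                      # $delta/../../high
    # IF $delta > $delta/../../high (compared as numbers; false if there is no high)
    if high is None or float(price.text) <= float(high.text):
        return False
    dayinfo.remove(high)                             # DELETE the old high
    new_high = ET.Element("high")                    # <high>{$delta/text()}</high>
    new_high.text = price.text
    after = list(dayinfo).index(dayinfo.find("prices")) + 1
    dayinfo.insert(after, new_high)                  # BELOW $delta/../.. AFTER prices
    return True


# Hypothetical document with a recorded high of 10.
doc = ET.fromstring(
    "<shares><share><dayinfo><prices><price>10</price></prices>"
    "<high>10</high></dayinfo></share></shares>")
prices = doc.find("share/dayinfo/prices")

p12 = ET.SubElement(prices, "price")
p12.text = "12"
fired_12 = on_insert_price(doc, p12)   # 12 > 10: the high becomes 12

p11 = ET.SubElement(prices, "price")
p11.text = "11"
fired_11 = on_insert_price(doc, p11)   # 11 > 12 is false: nothing happens
```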
[Figure: XTL rule processing system]
XTL rule processing system architecture
- ECA Rules Management: validates a rule and registers it in the Rule Base
- ECA Rule Processing Engine: evaluates the event and condition parts of the rules and schedules their actions for execution in the Action Schedule
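The division of labour between the two components can be sketched in Python. Everything here is an assumption for illustration:

- rules are triples of Python callables rather than XTL text
- validation is reduced to two checks
- actions report the updates they perform, so those updates can trigger further rules
- the Action Schedule is a FIFO queue with a cascade limit

```python
from collections import deque


class RuleBase:
    """ECA Rules Management: validates rules and registers them in the rule base."""

    def __init__(self):
        self.rules = {}

    def register(self, name, event, condition, action):
        if name in self.rules:
            raise ValueError(f"rule {name!r} is already registered")
        if not all(callable(part) for part in (event, condition, action)):
            raise TypeError("event, condition and action must be callable")
        self.rules[name] = (event, condition, action)


class Engine:
    """ECA Rule Processing Engine with a FIFO action schedule."""

    def __init__(self, rule_base, data):
        self.rule_base = rule_base
        self.data = data
        self.schedule = deque()

    def on_update(self, update):
        # Evaluate event and condition parts; schedule the actions of rules that fire.
        for name, (event, condition, action) in self.rule_base.rules.items():
            if event(update) and condition(self.data, update):
                self.schedule.append((name, action, update))

    def run(self, limit=1000):
        # Execute scheduled actions; the updates they make may trigger further rules.
        executed = []
        while self.schedule:
            if len(executed) >= limit:
                raise RuntimeError("possible non-terminating rule cascade")
            name, action, update = self.schedule.popleft()
            for new_update in action(self.data, update):
                self.on_update(new_update)
            executed.append(name)
        return executed


# Hypothetical rules: logging an order is itself an update that triggers a second rule.
def log_order(data, update):
    data["log"].append(update["item"])
    return [{"table": "log", "item": update["item"]}]


def count_log(data, update):
    data["logged"] += 1
    return []


rule_base = RuleBase()
rule_base.register("log_order", lambda u: u["table"] == "orders",
                   lambda d, u: True, log_order)
rule_base.register("count_log", lambda u: u["table"] == "log",
                   lambda d, u: True, count_log)
data = {"log": [], "logged": 0}
engine = Engine(rule_base, data)
engine.on_update({"table": "orders", "item": 7})
executed = engine.run()
```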
System Performance
- The system performance was studied by:
  - developing an analytical model of the system
  - performing experiments on the actual system
- We studied the effect of rule base indexes on system performance
- Performance criterion:
  - Update response time: the mean time taken to complete all rule executions resulting from a single update submitted by a top-level update transaction
System Performance
- Varying quantity: number of rules in the rule base
- Experiments on the actual system were performed with three different rule sets
- XML data set: a fragment of the DBLP database
System Performance - Analytical Model
- The analytical model is a mathematical description of the system behaviour
- It uses queueing theory to model the transaction queues and database processing
- It uses a set of simplifying assumptions to approximate the behaviour of some system parameters (e.g. triggering probability, transaction arrival rate)
[Figure: System Performance - Analytical Model Results]
System Performance - Analytical Model
- Response time increases non-linearly as long as the system is stable (i.e. the arrival rate in the transaction queue is less than the service rate)
- Beyond the stability point the transaction queue grows uncontrollably large, flooding memory and slowing the system down
- Reasons:
  - everything is served by a single queue
  - a high number of event query evaluations is needed to find which rules are triggered
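The slides do not give the model's equations. As a hedged illustration of the two effects described above, consider the simplest queueing model: a single M/M/1 queue with arrival rate lambda and service rate mu. It is stable only while lambda < mu, and its mean response time is R = 1 / (mu - lambda). This is not necessarily the model used in the study. The service rate below is a hypothetical value.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: R = 1 / (mu - lambda).

    Only defined while the queue is stable, i.e. arrival_rate < service_rate.
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable: the queue grows without bound")
    return 1.0 / (service_rate - arrival_rate)


service_rate = 10.0  # hypothetical: 10 updates served per second
curve = [(lam, mm1_response_time(lam, service_rate)) for lam in (1.0, 5.0, 9.0, 9.9)]
```

Going from 50% to 90% utilisation multiplies the mean response time by five (0.2 s to 1 s). At 99% utilisation it is about 10 s, and at 100% the queue is unstable.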
[Figure: System Performance - Experimental Results]
System Performance - Experimental Results
- Differences from the analytical model are due to:
  - implementation choices (use of DOM, etc.)
  - the simplifying assumptions made in the analytical model
[Figure: System Performance]
[Figure: System Performance - Indexing the Rule Base]
System Performance - Indexing the Rule Base
- Better overall behaviour and scalability, because fewer rules need to be checked for triggering
- Fewer rules checked --> fewer event queries need to be evaluated
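The slides do not describe the index structure. One hypothetical way to obtain this effect is to key each rule by the last step of its event path. An update then only needs the event queries of rules keyed by the tags it inserted, plus rules ending in a `*` wildcard, to be evaluated. The Python sketch below illustrates that idea. The index only prunes: a candidate rule may still turn out not to be triggered once its event query is evaluated.

```python
from collections import defaultdict


class IndexedRuleBase:
    """Rule base indexed by the last step of each rule's event path (hypothetical scheme).

    A rule can only be triggered by the insertion of a node whose tag equals
    that last step, or of any node if the last step is '*'.
    """

    def __init__(self):
        self.by_tag = defaultdict(set)

    def add(self, rule_name, event_path):
        last_step = event_path.rstrip("/").split("/")[-1]
        self.by_tag[last_step].add(rule_name)

    def candidates(self, inserted_tags):
        """Rules whose event query still has to be evaluated for this update."""
        names = set()
        for tag in set(inserted_tags) | {"*"}:
            names |= self.by_tag.get(tag, set())
        return names


# Hypothetical rules and event paths.
rules = IndexedRuleBase()
rules.add("new_high", "shares/share/dayinfo/prices/price")
rules.add("new_share", "shares/share")
rules.add("any_share_child", "shares/share/*")
```

Inserting a `price` node yields only `new_high` and `any_share_child` as candidates, so `new_share` is never evaluated.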
ECA Rules for RDF
- The RDFTL ECA language
- Implementing an RDFTL processing system in P2P environments
- System performance
The RDFTL Language
- We designed the language from scratch, specifically for RDF
- General syntax:
  ON event IF condition DO action
The RDFTL Language
- Event part:
  - May contain let expressions of the form: LET $var := e
  - (INSERT | DELETE) e
    e is a path expression that evaluates to a set of RDF nodes; catches the insertion or deletion of a node
  - (INSERT | DELETE) triple
    triple is an expression of the form (source, arc, target) specifying an RDF triple; catches the insertion or deletion of a property in an RDF triple
  - UPDATE upd_triple
    upd_triple is an expression of the form (source, arc, old_target->new_target); catches the update of a property's target from one RDF node to another
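The triple-level event kinds can be illustrated with a small in-memory triple store in Python. This is a hypothetical sketch with three simplifications:

- events are plain tuples
- `None` in a pattern acts as a wildcard in place of RDFTL's variables
- node-level (path expression) events and LET bindings are not modelled

```python
from typing import NamedTuple


class Triple(NamedTuple):
    source: str
    arc: str
    target: str


def insert_triple(store, triple):
    """INSERT a triple; return the event it raises (none if it was already present)."""
    if triple in store:
        return []
    store.add(triple)
    return [("INSERT", *triple)]


def delete_triple(store, triple):
    """DELETE a triple; return the event it raises (none if it was absent)."""
    if triple not in store:
        return []
    store.remove(triple)
    return [("DELETE", *triple)]


def update_target(store, source, arc, new_target):
    """Redirect (source, arc, old) to new_target; one UPDATE event per changed triple."""
    events = []
    for old in [t for t in store if t.source == source and t.arc == arc]:
        store.remove(old)
        store.add(Triple(source, arc, new_target))
        events.append(("UPDATE", source, arc, old.target, new_target))
    return events


def matches(pattern, event):
    """None in a pattern is a wildcard, standing in for an RDFTL variable."""
    return len(pattern) == len(event) and all(
        p is None or p == e for p, e in zip(pattern, event))


# Hypothetical data: a course whose lecturer changes.
store = {Triple("course1", "taughtBy", "alice")}
inserted = insert_triple(store, Triple("course1", "hasTitle", "Databases"))
updated = update_target(store, "course1", "taughtBy", "bob")
```

A rule listening for `UPDATE (?, taughtBy, ?->?)` would correspond to the pattern `("UPDATE", None, "taughtBy", None, None)`.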
The RDFTL Language
- Condition part:
  - A boolean-valued expression
  - May consist of conjunctions, disjunctions and negations
  - May also contain let expressions
  - The $delta variable is bound to the set of nodes or arcs modified and caught by the event part
- Action part:
  - A sequence of actions
  - Each action has a syntax similar to that of the event part
[Figure: RDFTL Rules in P2P Environments - System Architecture]
RDFTL Rules in P2P Environments
- Each peer (P) is supervised by a superpeer (SP)
- The set of Ps supervised by an SP forms a peergroup
- An RDFTL processing engine is installed at each SP
- Each P or SP hosts a fragment of the RDF schema, which may change due to updates
- Hybrid fragmentation, with possible replication
RDFTL Rules in P2P Environments
- Ps notify their SP of any updates to their local data
- An ECA rule generated at one P or SP may be replicated, triggered, evaluated or executed at different sites in the network
Distributed Rule Registration
- A newly generated rule is sent from its P to the P's SP for validation and storage
- From there it is sent to all other SPs
- A replica of the rule is also stored at those SPs that are e-relevant to it, i.e. SPs at which the rule's event part queries can be evaluated
- At each SP, each rule is annotated with the IDs of the local peers that are e-, c- and a-relevant to the rule
- c- and a-relevance have a meaning similar to e-relevance, for the condition and action parts respectively
Distributed Rule Execution
- Each SP manages its own rule execution schedule
- Each execution schedule is a sequence of updates to be executed on the local peergroup
- Once an update u occurs at a P, its SP is notified
- The SP determines whether u may trigger any rule r whose event part is annotated with P's ID
  - If so, the event query is sent to P for evaluation
  - If the rule is triggered, its condition is evaluated
  - If the condition is true, the SP sends each instance of r's action part to the local peers that are a-relevant to it
System Performance
- The system performance was studied by:
  - developing an analytical model of the system
  - developing a system simulator and performing experiments with it
- Performance criterion:
  - Update response time: the mean time taken to complete all rule executions resulting from a single update submitted by a top-level update transaction
System Performance
- Cases studied with both the analytical model and the simulator:
  - random network topology between SPs, with various degrees of data replication
  - HyperCup network topology between SPs, with various degrees of data replication
- Varying quantities:
  - number of peergroups
  - number of rules
System Performance
[Figure: Random topology, replication 10%. Panels: Analytical Model, Simulation]
System Performance
- With the random topology the system does not scale well, even with low replication and small numbers of rules and peergroups
- Update response time grows exponentially
- The system becomes unusable due to high load
System Performance
- HyperCup organises the SPs into a hypercube
- The HyperCup topology guarantees that:
  - each peer receives a broadcast message only once
  - a total of N-1 messages (hops) is necessary to broadcast a message to N peers
  - the most distant peers are reached after log2 N hops
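The slides state HyperCup's broadcast guarantees without the forwarding rule behind them. The Python sketch below shows the usual hypercube broadcast scheme. It assumes a complete hypercube whose super-peers are numbered 0..N-1, so that neighbours differ in exactly one bit. It ignores how HyperCup builds and maintains the topology.

```python
from collections import deque


def hypercube_broadcast(dim, origin=0):
    """Broadcast over a complete hypercube of 2**dim super-peers.

    A node that received the message along dimension k forwards it only
    along dimensions k+1 .. dim-1; the originator uses all dimensions.
    Returns (hops at which each node was reached, number of messages sent).
    """
    hops = {origin: 0}
    messages = 0
    queue = deque([(origin, -1)])
    while queue:
        node, via = queue.popleft()
        for k in range(via + 1, dim):
            neighbour = node ^ (1 << k)  # flip bit k: the neighbour in dimension k
            messages += 1
            if neighbour in hops:
                raise RuntimeError("a peer received the message twice")
            hops[neighbour] = hops[node] + 1
            queue.append((neighbour, k))
    return hops, messages


hops, messages = hypercube_broadcast(dim=4)  # N = 16 super-peers
```

For N = 16 the sketch reaches every peer exactly once, sends 15 messages, and the farthest peers are 4 hops away. These match the three properties listed above.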
System Performance
[Figure: HyperCup, replication 10%. Panels: Analytical Model, Simulation]
System Performance
[Figure: HyperCup, replication 90%. Panels: Analytical Model, Simulation]
System Performance
- With HyperCup we achieve higher performance across various replication levels and numbers of peergroups
- The system scales better
- The system remains stable and the update response time stays within acceptable values
- The analytical and simulation approaches show good agreement
Conclusions
- We have described two ECA languages, one for XML and one for RDF
- We have studied and defined the architectural characteristics of an ECA rule processing system in centralised and distributed environments
- We have conducted a study to determine the system performance in both the centralised and the distributed case
Conclusions
The whole study shows that ECA rules are a usable technology for a variety of application environments over semistructured data
Thank you!