Siddhi-CEP-report

CS 4200 - FINAL YEAR PROJECT REPORT
SIDDHI-CEP
By
Project Group – 04
Suhothayan S. (070474R)
Gajasinghe K.C.B. (070137M)
Loku Narangoda I.U. (070329E)
Chaturanga H.W.G.S (070062D)
Project Supervisors
Dr. Srinath Perera
Ms. Vishaka Nanayakkara
Coordinated By
Mr. Shantha Fernando
THIS REPORT IS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
AWARD OF THE DEGREE OF BACHELOR OF SCIENCE OF ENGINEERING AT UNIVERSITY OF
MORATUWA, SRI LANKA.
3rd of September 2011
Abstract
Project Title : Siddhi-CEP - High Performance Complex Event Processing Engine
Authors
: Suhothayan Sriskandarajah - 070474R
Kasun Gajasinghe - 070137M
Isuru Udana Loku Narangoda - 070329E
Subash Chaturanga - 070062D
Coordinator : Dr. Shantha Fernando
Supervisors : Dr. Srinath Perera
Mrs. Vishaka Nanayakkara
During the last half a decade or so, Complex Event Processing (CEP) has been one of the
most rapidly emerging fields. Due to the massive volume of business transactions and
numerous new technologies such as RFID (Radio Frequency Identification), it has become a
real challenge to provide real-time, event-driven systems that can process data and handle
high input data rates with near-zero latency.
The basic functionality of a complex event processor is to match queries with events and
trigger a response. These queries describe the events that the system needs to search for
within the input data streams. Unlike traditional systems such as relational database
systems, which operate with hundreds of queries running for short durations against stored
static data, event-driven systems operate with stored queries running constantly against
extremely dynamic data streams. In effect, an event processing system is an upside-down
view of a database. The task of a CEP is to identify meaningful patterns, relationships, and
data abstractions among otherwise unrelated events and fire an immediate response.
Siddhi is an Apache-2.0-licensed Complex Event Processing engine. It addresses some of
the main concerns of the event processing world, where there is a clear need for an
open-source variant capable of processing a flood of events that may well exceed one
hundred thousand events per second with near-zero latency. This required careful design of
the generic concepts of a CEP. Siddhi was designed after an in-depth literature review
focusing on each concept separately. The current Siddhi implementation provides an
extensible, scalable framework that the open-source community can extend to match
specific business needs.
Table of Figures
Figure 1 Maven Structure
Figure 2 Some CEP engines and their brief distributions with their time line
Figure 3 S4 architecture
Figure 4 Esper architecture
Figure 5 PADRES broker network
Figure 6 PADRES router architecture
Figure 7 Aurora Pipeline Architecture
Figure 8 Siddhi high-level architecture
Figure 9 Siddhi Class Diagram
Figure 10 Sequence Diagram
Figure 11 Siddhi Sequence Diagram
Figure 12 Siddhi Implementation View
Figure 13 Siddhi Process View
Figure 14 Siddhi Deployment View
Figure 15 Siddhi Use Case Diagram
Figure 16 Siddhi Event Tuple
Figure 17 Siddhi Pipeline Architecture
Figure 18 Map holding Executor Listeners
Figure 19 Simple Siddhi Query
Figure 20 Siddhi Query Form UI Implementation Used in OpenMRS
Figure 21 Siddhi Query with Simple Condition
Figure 22 Siddhi Time Window
Figure 23 Siddhi Time Window Query
Figure 24 Siddhi Batch Window
Figure 25 Siddhi Unique Query
Figure 26 Scrum Development Process
Figure 27 Test-Driven Development (TDD)
Figure 28 Call Tree for the Time Window in JProfiler
Figure 29 The Memory Usage of Siddhi for a Time Window Query
Figure 30 Siddhi Benchmark
Figure 31 DocBook Documentation Configuration
Figure 32 Siddhi Web Site
Table of Graphs
Graph 1 Siddhi vs. Esper Simple Filter Comparison
Graph 2 Siddhi vs. Esper Average over Time Window Comparison
Graph 3 Siddhi vs. Esper State Machine Comparison
Table of Tables
Table 1 Comparison between Database Applications and Event-driven Applications
Table 2 Different Every Operator Cases
Table of Contents
Abstract
Table of Figures
Table of Graphs
Table of Tables
1. INTRODUCTION
   1.1. Complex Event Processing
   1.2. Aims and Objectives
2. LITERATURE SURVEY
   2.1. Background
      2.1.1. What is Complex Event Processing?
      2.1.2. Why Complex Event Processing?
      2.1.3. CEP General Use Cases
   2.2. Terminology
   2.3. Tools & Technology Studies
      2.3.1. Compiler Generators
         2.3.1.1. ANTLR
      2.3.2. Building and Project Management Tools
         2.3.2.1. Apache Maven
         2.3.2.2. Apache ANT
      2.3.3. Version Control Systems
         2.3.3.1. Subversion
   2.4. CEP Implementation Related Study
      2.4.1. Some Well Known CEP Implementations
         2.4.1.1. S4 [3] [4]
         2.4.1.2. Esper/NEsper [7] [6]
         2.4.1.3. PADRES [6] [8]
         2.4.1.4. Intelligent Event Processor (IEP) [6] [10]
         2.4.1.5. Sopera [6] [11]
         2.4.1.6. Stream-based And Shared Event Processing (SASE) [12]
         2.4.1.7. Cayuga [6] [14] [15]
         2.4.1.8. Aurora and Borealis [6] [20] [21] [15]
         2.4.1.9. TelegraphCQ [6] [15] [26]
         2.4.1.10. STREAM [6] [33]
         2.4.1.11. PIPES
         2.4.1.12. BEA WebLogic [15]
         2.4.1.13. Coral8 [15]
         2.4.1.14. Progress Apama [15]
         2.4.1.15. StreamBase [15]
         2.4.1.16. Truviso [15] [41]
   2.5. Some Interesting Research Papers
      2.5.1. Event Stream Processing with Out-of-Order Data Arrival [42]
      2.5.2. Efficient Pattern Matching over Event Streams [43]
   2.6. What We Have Gained from the Literature Survey
3. SIDDHI DESIGN
   3.1. Siddhi Architecture
      Input Adapters
      Siddhi-core
      Output Adapters
      Compiler
      Pluggable UI
   3.2. 4+1 Model
      3.2.1. Logical View
         Class Diagram
         Sequence Diagram
      Implementation View
      Process View
      Deployment View
      Use Case View
         Use Case 1
         Use Case 2
         Use Case 3
   3.3. Major Design Components
      3.3.1. Event Tuples
      3.3.2. Pipeline Architecture
      3.3.3. State Machine
         Sequence Queries
         Every Operator
         Design Decisions in Sequence Processor
         Pattern Queries
         Kleene Star Operator
         Design Decisions in Pattern Processor
      3.3.4. Processor Architecture
         Executors
         3.3.4.1. Event Generators
      3.3.5. Query Object Model
      3.3.6. Query Parser
      3.3.7. Window
         Time Window
         Batch Window
         Time Batch Window
         Length Batch Window
      3.3.8. "UNIQUE" Support
4. Implementation
   4.1. Process Models
      4.1.1. Scrum Development Process
      4.1.2. Test-driven Development (TDD)
   4.2. Version Control
   4.3. Project Management
   4.4. Coding Standards & Best Practices Guidelines for Siddhi
      4.4.1. General
      4.4.2. Java Specific
   4.5. Profiling
   4.6. Benchmark
   4.7. Documentation
   4.8. Web Site
   4.9. Bug Tracker
   4.10. Distribution
5. Results
   5.1. Performance Testing
6. Discussion and Conclusion
   6.1. Known Issues
   6.2. Future Work
      6.2.1. Incubating Siddhi at Apache
      6.2.2. Find out a Query Language for Siddhi
      6.2.3. Out of Order Event Handling
   6.3. Siddhi Success Story
   6.4. Conclusion
Abbreviations
Bibliography
Appendix A
1. INTRODUCTION
1.1. Complex Event Processing
Data processing is one of the key functionalities in computing. It refers to the steps a
computer program takes to collect data and analyze it, converting the data into usable
information. Data by itself is nothing but unorganized facts, which can be converted into
useful information. Analyzing, sorting, and storing data are a few of the major tasks
involved in data processing.

Data processing has a tight connection with Event-Driven Architecture (EDA). EDA can be
viewed as a subset of data processing in which a stream of data items, called events, is
processed. One might think an event is just another piece of data with a time-stamp, but an
event has a broader meaning. One useful way to describe the relationship between data and
events is that "data is derived from events": events are distinct from data, yet an event
representation contains data.
During the last half a decade, Complex Event Processing (CEP) has been one of the most
rapidly emerging fields in data processing. Due to the massive amount of business
transactions and numerous new technologies like RFID (Radio Frequency Identification), it
has now become a real challenge to provide real-time, event-driven systems that can process
data and handle high input data rates with near-zero latency (nearly real time).
The basic functionality of a complex event processor is to match queries with events and
trigger a response immediately. These queries describe the events that the system needs to
search for within the input data streams. Unlike traditional systems such as relational
database management systems (RDBMS), which operate with hundreds of queries running
for short durations against stored static data, event-driven systems operate with stored
queries running constantly against extremely dynamic data streams. In effect, an event
processing system is an inverted database: the search queries are stored in the system and
matched against incoming data. Hence, Complex Event Processing is used in systems such
as data monitoring centers, financial services, and Web analysis, where extremely dynamic
data is generated.
In the abstract, the task of a CEP is to identify meaningful patterns, relationships, and data
abstractions among otherwise unrelated events and fire an immediate response, such as an
alert message.
Examples:
- Searching a document for a specific keyword
- Radio Frequency Identification (RFID)
- Financial market transaction pattern analysis
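The inverted-database idea described above can be sketched in a few lines of Java. The following is purely illustrative, not Siddhi's actual API: the "stored queries" are registered up front as predicates, and each arriving event is tested against all of them as it arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Minimal sketch of the "inverted database" idea: queries are registered
// first, and every arriving event is run through them. Names are
// illustrative only, not Siddhi's real API.
public class InvertedDatabaseSketch {

    // A stored "query" is just a named predicate over an event's attributes.
    record StoredQuery(String name, Predicate<double[]> condition) {}

    private final List<StoredQuery> queries = new ArrayList<>();
    private final List<String> firedResponses = new ArrayList<>();

    public void registerQuery(String name, Predicate<double[]> condition) {
        queries.add(new StoredQuery(name, condition));
    }

    // Each event is matched against every stored query on arrival;
    // nothing is persisted, which keeps latency near zero.
    public void onEvent(double[] event) {
        for (StoredQuery q : queries) {
            if (q.condition().test(event)) {
                firedResponses.add(q.name());
            }
        }
    }

    public List<String> firedResponses() {
        return firedResponses;
    }
}
```

The contrast with a database is the direction of flow: here the data moves through the stored queries rather than queries moving over stored data.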
1.2. Aims and Objectives
The main aim of our project is to implement a 100% open-source, high-performance
complex event processing engine. There are several commercial and very few open-source
CEP engines currently available. Most of them were implemented early in the last decade
and have since become stable. Because they were implemented some time ago, there are
improvements that could be made to those CEP implementations even at the architectural
level; yet, since they have become stable, there is little tendency to improve their
foundations further. Our main aim is therefore to identify those weaknesses and implement
a better open-source CEP on a recent JDK.

- Carry out a literature survey, compare and contrast different implementations of event
  processing engines, and come up with an effective, computationally efficient architecture
  that can process any type of external event stream. The key factors are support for
  high-speed processing with low memory consumption.
- Implement the basic Complex Event Processing engine framework.
- Build the planned features for the engine on top of the written framework.
- Profile the Java code base over several iterations and improve its efficiency, ensuring
  that the written code introduces little or no overhead.
2. LITERATURE SURVEY
2.1. Background
The following sections provide the background of this project. They describe what
Complex Event Processing systems are, why there is a need for CEP, and its general use
cases.
2.1.1. What is Complex Event Processing?
Event processing can be defined as a methodology that performs predefined operations on
event objects, including analyzing, creating, reading, transforming, or deleting them.
Generally, Complex Event Processing (CEP) can be defined as an emerging technology that
creates actionable, situational knowledge from distributed message-based systems,
databases, and applications in real time or near real time. In other words, a CEP software
implementation aggregates information from distributed systems in real time and applies
rules to discern patterns and trends that would otherwise go unnoticed. From another
viewpoint, a CEP can be seen as a database turned upside down: instead of storing the data
and running queries against it, the CEP engine stores queries and runs the data through
them as a stream. Complex Event Processing is thus primarily an event processing concept
that deals with processing events from several event streams, identifying the meaningful
events within the event domain, and firing a response when a match is found based on the
provided rules. CEP uses techniques such as detection of complex patterns of events, event
hierarchies, and relationships between events such as causality and timing.
Table 1 Comparison between Database Applications and Event-driven Applications

                  Database Applications         Event-driven Applications
Query Paradigm    Ad-hoc queries or requests    Continuous standing queries
Latency           Seconds, hours, days          Milliseconds or less
Data Rate         Hundreds of events/sec        Tens of thousands of events/sec or more
2.1.2. Why Complex Event Processing?
The IT industry is growing more complex day by day, and today's industry can be
characterized as an event-driven era. In modern enterprise software systems, events are a
very frequent commodity, so extracting what is relevant and what is not can be a nightmare,
especially when thousands of changes take place per second. Complex Event Processing is
a new way to deal with applications in which many agents produce huge amounts of data
per second and that data must be transformed into reliable information in a short period of
time. With such massive amounts of data produced every second, traditional computing
approaches (hardware and software) will not have enough capacity. When a massive
volume of incoming events must be processed in real time, classical methodologies fail,
and this is where CEP comes into play.
For example, say you have to trace the increase of stock prices for a particular set of
companies. The traditional process was to put all of the transactions in a database under a
particular schema and, at the end of the day, go through the whole database checking
whether there was an increase in stock price. Generally, there will be hundreds of such
companies to track and thousands of buy and sell transactions per second on the stock
market, so the database will be enormous. This is where the real need for a CEP engine
arises: it gives you real-time feedback based on the specified business rules while saving
both time and space.
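The stock-tracking example can be sketched as follows, assuming a hypothetical detector class (this is not Siddhi code): instead of storing every transaction in a database, only the last observed price per symbol is kept, and the detector fires the moment a price rises.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the stock-tracking example: rather than scanning a
// database at the end of the day, we keep only the last seen price per
// symbol and react as soon as a price increase occurs. Class and method
// names are illustrative only.
public class PriceRiseDetector {

    private final Map<String, Double> lastPrice = new HashMap<>();

    // Returns true (i.e. "fires") when the symbol's price exceeds the
    // previously observed price; only O(number of symbols) state is kept.
    public boolean onTrade(String symbol, double price) {
        Double previous = lastPrice.put(symbol, price);
        return previous != null && price > previous;
    }
}
```

The space saving is exactly the point made above: state grows with the number of tracked companies, not with the number of transactions.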
As explained, unlike traditional static data analysis methodologies, CEP engines are
event-driven: the logic of the analysis is applied in advance, and each new event is
processed as soon as it arrives, immediately updating all high-level information and
triggering any rules that have been defined. With CEP, businesses can map discrete events
to expected outcomes and relate a series of events to key performance indicators (KPIs).
Through this, CEP gives businesses more insight into which events will have the greatest
operational impact, helping them seize opportunities and mitigate risks.
2.1.3. CEP General Use Cases
In the early days of Complex Event Processing systems, they were used for monitoring
stock trading systems, and many still believe that is the major use case of CEP. But today
there are many other interesting applications of CEP, especially across the IT industry,
financial markets, and manufacturing organizations:
Cleansing and validation of data: CEPs can recognize patterns in the data and filter out
anomalous events which fall outside recognized parameters.
Alerts and notifications: CEP engines can monitor event streams, detect patterns, and send
notifications by hooking into email servers, posting messages to web services, and so on
(e.g., a real-time business system should be able to send notifications when problems
occur).
Decision-making systems: CEPs are used in automated business decision-making systems
that take current conditions into their knowledge base.
Feed handling: Most CEP platforms come with many in-built feed handlers for common
market data formats.
Data standardization: CEP engines are capable of standardizing data about the same entity
from different sources within a common reference schema.
2.2. Terminology
This section covers a small set of basic terms related to event processing. However much
event processing technologies have progressed, there is still no standardized terminology
for even the basic terms such as 'event', 'event stream', and 'event processing'. The
definitions vary slightly depending on the implementation and the product, and there is an
ongoing effort to standardize them [1]. The definitions here give a generic idea of the
terms and show distinctions between different implementations.
The basic term that is used most heavily in Complex Event Processing is 'event', and it is
also the term that is misused most often. Basically, an 'event' can be defined as anything
that happens, or is contemplated as happening. However, the term is mostly used for the
representation of an event. The authors of the Event Processing Glossary [1] have
generalized this by defining two separate terms, 'event' and 'event object'. The term 'event
object' refers to the representation of a particular event; this can be a tuple, vector, or row
implementation.
Events are of two kinds: simple events and composite (complex) events. A simple event
refers only to the occurrence of a single event, while a complex event is an aggregation of
several events. To see this simply, take the example of stock-market transactions. There,
each buying and selling of stock is a simple event; the simple event object may consist of
the stock symbol, the buying/selling price, and the volume. However, if we record all the
buying and selling of the stocks of a specific company and return them as one event, that
event can be considered a complex event. In this case, it consists of a number of simple
events [2].
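The distinction can be made concrete with a small Java sketch (the types and names are illustrative, not taken from Siddhi): each trade is a simple event, and the aggregate over one company's trades is returned as a single composite event.

```java
import java.util.List;

// Sketch of the simple-vs-composite event distinction from the text.
public class TradeAggregator {

    // A simple event: one buy/sell of a stock.
    public record Trade(String symbol, double price, long volume) {}

    // A complex (composite) event: an aggregation over several simple events.
    public record CompositeTradeEvent(String symbol, long totalVolume, double avgPrice) {}

    // Folds all simple events for one company into a single composite event
    // carrying the total volume and the volume-weighted average price.
    public static CompositeTradeEvent aggregate(String symbol, List<Trade> trades) {
        long volume = 0;
        double weighted = 0;
        for (Trade t : trades) {
            if (!t.symbol().equals(symbol)) continue; // only this company's trades
            volume += t.volume();
            weighted += t.price() * t.volume();
        }
        return new CompositeTradeEvent(symbol, volume, volume == 0 ? 0 : weighted / volume);
    }
}
```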
The event processing engine receives events through an event stream. An event stream is a
linearly ordered sequence of events, ordered by time of arrival. The usage of event streams
varies depending on the implementation. Some implementations allow a stream to carry
events of different types, while others restrict a stream to a predefined type. Siddhi, for
example, restricts the event type of a given stream; users can create different streams to
send events of different types. This makes the implementation clear and less confusing.
The processing of events goes by a variety of names, most commonly ‘complex event
processing’ (CEP) or ‘event stream processing’ (ESP). CEP is the term we use for Siddhi
throughout this report.
A ‘query’ is the basic means of specifying the rules/patterns to match in the incoming
stream of events. It specifies the event streams needed, the operations to be performed on
those events, and how to output the outcome of the operation. The outcome is generally
itself an event, and can be regarded as a composite event when the query contains
aggregate functions. The query language follows an SQL-like structure: although the
processing of the query differs, the languages share many similarities. For example, the
SELECT, FROM, WHERE, HAVING, and GROUP BY clauses express the same intent in both, even
though the processing behind them is different. Different CEP implementations use different
query languages, and there is no standard; these languages extend SQL with the ability to
process real-time data streams. In SQL, a query is sent to be performed on the data rows
stored in a database table. Here, the queries are fed to the system beforehand, and the
real-time streams of events are passed through these queries, which perform the specified
operations. A query fires a new event when a match for its rule/pattern occurs.
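As a hypothetical illustration of such an SQL-like continuous query, consider a query watching a stock stream. The syntax below is invented for this example and does not match the exact grammar of any particular engine:

```
-- Illustrative only; not the exact syntax of any engine discussed here.
SELECT   symbol, avg(price) AS avgPrice
FROM     StockStream [window: last 60 seconds]
WHERE    volume > 1000
GROUP BY symbol
HAVING   avgPrice > 50;
```

Unlike a one-off SQL query, this statement stays registered in the engine, and an output event fires whenever the condition is satisfied by the arriving events.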
2.3. Tools & Technology studies
The following sections describe the tools and technologies we have used to provide the
basic infrastructure for developing Siddhi. This covers build management tools, the version
control system, the compiler generator, etc.
2.3.1. Compiler Generators
As a part of our literature survey we looked at compiler generators, since we had to
construct a compiler that generates the query object model from a query. One of the popular
tools for constructing compilers is ANTLR.
2.3.1.1. ANTLR
ANTLR, which stands for “ANother Tool for Language Recognition”, is a tool that can be used
to create compilers, recognizers, and interpreters. ANTLR provides a framework with strong
support for tree construction, translation, error reporting, and recovery. It provides a
single syntax notation for specifying both the lexer and the parser, which makes it easy
for users to specify their rules. It also has a graphical grammar editor and debugger
called ANTLRWorks, which further enhances ease of use. ANTLR uses Extended Backus–Naur
Form (EBNF) grammars and supports many target programming languages, including Java, C,
C++, C#, Ruby, and Python. The parser generated by ANTLR is an LL(*) parser, which provides
arbitrary lookahead. Since ANTLR generates top-down parsers, it uses syntactic predicates
to resolve ambiguities that ordinary top-down parsers cannot handle.
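The single-notation style can be sketched with a hypothetical grammar fragment in ANTLR v3-era syntax; the grammar and rule names below are our own illustration, not the grammar used by Siddhi:

```
grammar SimpleQuery;            // lexer and parser rules share one notation

query     : 'from' ID ('where' condition)? ';' ;
condition : ID OP NUMBER ;

ID      : ('a'..'z'|'A'..'Z')+ ;
OP      : '>' | '<' | '=' ;
NUMBER  : '0'..'9'+ ;
WS      : (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
```

From such a grammar, ANTLR generates both the lexer and the parser classes in the chosen target language.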
2.3.2. Building and Project Management tools
2.3.2.1. Apache Maven
Apache Maven is a project management tool, developed to make the build process much easier.
Maven was initially created to manage the complex build process of the Jakarta Turbine
project. Maven is evolving rapidly; although its newest version is 3, version 2 is still
widely used.
Features of Maven
I. Maven understands how a project is typically built.
II. Maven makes use of its built-in project knowledge to simplify and facilitate project
builds.
III. Maven prescribes and enforces a proven dependency management system that is in tune
with today's globalized and connected project teams.
IV. Maven is completely flexible for power users; the built-in models can be overridden
and adapted declaratively for specific application scenarios.
V. Maven is fully extensible for scenario details not yet covered by existing behaviors.
VI. Maven is continuously improved by capturing newly found best practices and identified
commonality between user communities, and making them part of Maven's built-in project
knowledge.
VII. Maven can be used to create IDE project files from build files, using simple commands:
a. Creating an Eclipse artifact for any source containing a build script:
mvn eclipse:eclipse
b. Creating an IntelliJ IDEA artifact:
mvn idea:idea
Figure 1 Maven Structure
Project Object Model (POM): the POM is the model of Maven 2, partially built into Maven's
main engine. pom.xml, an XML-based metadata file, is the build file that holds the
declarations of the components.
Dependency management model: dependency management is a key part of Maven. Maven's
dependency management can be adapted to most requirements, and its model is built into
Maven 2. This model has proven workable and productive, and is currently deployed by major
open source projects.
Build life cycle and phases: these are the interfaces between Maven's built-in model and
the plug-ins. The default lifecycle has the following build phases:
 validate - validate that the project is correct and all necessary information is
available
 compile - compile the source code of the project
 test - test the compiled source code using a suitable unit testing framework; these
tests should not require the code to be packaged or deployed
 package - take the compiled code and package it in its distributable format, such as
a JAR
 integration-test - process and deploy the package, if necessary, into an environment
where integration tests can be run
 verify - run any checks to verify that the package is valid and meets quality criteria
 install - install the package into the local repository, for use as a dependency in
other projects locally
 deploy - done in an integration or release environment; copies the final package to
the remote repository for sharing with other developers and projects.
Running a phase executes every preceding phase in the lifecycle as well; for example, the
command mvn package runs validate, compile, and test before packaging.
Plug-ins: most of the effective work of Maven is performed by Maven plug-ins. The following
is part of a pom.xml file in which the carbon-p2-plugin is used to generate a p2 feature.
<plugins>
  <plugin>
    <groupId>org.wso2.maven</groupId>
    <artifactId>carbon-p2-plugin</artifactId>
    <version>1.1</version>
    <executions>
      <execution>
        <id>p2-feature-generation</id>
        <phase>package</phase>
        <goals>
          <goal>p2-feature-gen</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
</plugins>
2.3.2.2. Apache ANT
Apache ANT is another popular build tool. The acronym ANT stands for “Another Neat
Tool”. It is similar to other build tools like make and nmake. The major difference when
compared to other build tools is that ANT is written in Java. So ANT is very much suitable
for Java projects. ANT uses XML to specify the build structure. Apache ANT provides a rich
set of operations that we can use to write build scripts. ANT is widely used in the industry as
the universal build tool for Java projects. ANT targets can be invoked by simple commands.
To run an ANT target called foo one may just type ‘ant foo’ on the command prompt.
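A minimal, illustrative build.xml might look like the following; the project name and directory paths here are examples only:

```xml
<!-- Illustrative build script: compile Java sources, then package them. -->
<project name="sample" default="jar" basedir=".">
    <target name="compile">
        <mkdir dir="build/classes"/>
        <javac srcdir="src" destdir="build/classes"/>
    </target>
    <target name="jar" depends="compile">
        <jar destfile="build/sample.jar" basedir="build/classes"/>
    </target>
    <target name="clean">
        <delete dir="build"/>
    </target>
</project>
```

Running ‘ant jar’ at the command prompt executes the compile target first, because of the depends attribute.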
2.3.3. Version Control Systems
Most open source projects are not developed by a single developer; they are usually a team
effort. Therefore there must be a way to manage the source code, and that is the task of a
version control system. A version control system manages files and directories over time.
2.3.3.1. Subversion
Subversion is a version control system distributed under an Apache/BSD-style open source
license. It is a replacement for the CVS version control system. Subversion can be used by
people on different computers, and everyone can modify the source code of a project at the
same time. If someone has made an incorrect change, we can simply undo it by looking into
the project history.
Subversion has an official API (which CVS lacks). Subversion is written as a set of
libraries in the C language, but it has language bindings for many programming languages,
which makes it a very extensible version control system. JavaHL is the Java language
binding of Subversion. Though the default UI of Subversion is a command-line interface,
there are many third-party tools that provide better user interfaces for different
environments. For Windows there is a client called TortoiseSVN, and Subclipse and
Subversive are two plug-ins for the Eclipse IDE.
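A typical command-line session with Subversion looks roughly like the following transcript; the repository URL is a placeholder, not a real repository:

```
# Illustrative workflow; the URL below is a placeholder.
svn checkout https://example.org/svn/project/trunk project
cd project
# ...edit files...
svn update                      # merge in changes committed by others first
svn commit -m "Fix event ordering bug"
svn log --limit 5               # inspect recent project history
```

The update-before-commit step is what lets everyone modify the same project concurrently, and svn log is how earlier revisions are inspected and, if needed, reverted.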
2.4. CEP Implementation Related Study
The following are some of the projects that have made significant efforts in the same
research area. Some of these have not been completed, and some other projects have not yet
released their CEP engines. This shows that even though CEP is a relatively old concept,
there is still much significant work going on, and no single CEP engine has taken over the
market.
The following are some CEP engines, with brief descriptions and their timeline.
Figure 2 Some CEP engines, with brief descriptions and their timeline
2.4.1. Some Well Known CEP Implementations
Let’s look at some of the well-known Complex Event Processing engines in the market, with
their features, advantages, and disadvantages.
2.4.1.1. S4 [3] [4]
S4 was created and released by Yahoo! It is a framework for "processing continuous,
unbounded streams of data." The framework allows for massively distributed computation
over data that is constantly changing. It was initially developed to personalize search
advertising products at Yahoo!, which has now released it under the Apache License v2.
The architecture of S4 resembles the Actors model [5], providing semantics of encapsulation
and location transparency, thus allowing applications to be massively concurrent while
exposing a simple programming interface to application developers. This design choice also
makes it relatively easy to reason about correctness due to the general absence of side-effects.
S4 was designed for big data, with the ability to mine information from continuous data
streams using user-defined operators. Though the design of S4 shares many attributes with
IBM’s Stream Processing Core (SPC) middleware [6], S4 has some architectural differences.
S4 is believed to achieve a greater level of simplicity due to the symmetry of its design,
where all nodes in the cluster are identical and there is no centralized control. Further,
S4 leverages ZooKeeper [3], a simple and elegant cluster management service that can be
shared by many systems in a data center.
Figure 3 S4 architecture
The disadvantage of S4 is that it allows lossy failover. Upon a server failure, processes
are automatically moved to a standby server, but the state of the processes at the time of
failure is stored in local memory and may be lost during the handoff; the state is
regenerated from the input streams, so downstream systems must degrade gracefully. Further,
nodes cannot be added to or removed from a running cluster.
2.4.1.2. Esper/NEsper [7] [6]
EsperTech [7] brings Complex Event Processing (CEP) to the mainstream with an open source
approach, ensuring rapid innovation with quality productization, support, and services for
mission-critical environments, from SOA to eXtreme Transaction Processing deployments.
Esper runs on a Java 5 or Java 6 JVM and is fully embeddable.
A tailored Event Processing Language (EPL) allows registering queries in the engine, using
Java objects (POJOs, JavaBeans) to represent events. A listener class, which is basically
also a POJO, will then be called by the engine when the EPL condition is matched as events
come in. The EPL allows expressing complex matching conditions that include temporal
windows and joins across different event streams, as well as filtering and sorting
them. [7]
Figure 4 Esper architecture
The internals of Esper are made up of fairly complex algorithms primarily relying on state
machines and delta networks in which only changes to data are communicated across object
boundaries when required.
Esper is available under the GNU GPL license (GPL v2). Esper and NEsper are embeddable
components written in Java and C# respectively; they are not servers by themselves but are
designed to hook into any sort of server, and are therefore suitable for integration into
any Java or .NET-based process, including J2EE application servers and standalone Java
applications.
Esper has a pull query API. Events in Esper allow object representation and dynamic typing.
Esper features a Statement Object Model API, which is a set of classes to directly construct,
manipulate or interrogate EPL statements.
Esper also has a commercial version; the disadvantage of Esper is that its free version
does not contain a GUI management console, editor, or portal application. Esper also does
not currently have a server. It provides only a small number of key input and output
adapters through EsperIO, along with an adapter framework.
2.4.1.3. PADRES [6] [8]
PADRES (Publish/Subscribe Applied to Distributed Resource Scheduling) is developed by
Middleware Systems Research Group (MSRG) and University of Toronto. This is an
enterprise-grade event management infrastructure that is designed for large-scale event
management applications. Ongoing research seeks to add and improve enterprise-grade
qualities of the middleware.
A publish/subscribe middleware [9] provides many benefits to enterprise applications.
Content-based interaction simplifies IT development and maintenance by decoupling
enterprise components. The expressive PADRES subscription language supports sophisticated
interactions among components, and allows fine-grained queries and event management
functions. Furthermore, scalability is achieved through in-network filtering and
processing capabilities.
Figure 5 PADRES broker network
Figure 6 PADRES router architecture
2.4.1.4. Intelligent Event Processor (IEP) [6] [10]
Intelligent Event Processor (IEP) is a product of CollabNet, Inc. It is an open source
Complex Event Processing (CEP) engine. IEP is a JBI Service Engine and is part of the
OpenESB community. OpenESB is an open source project with the goal of building a
world-class Enterprise Service Bus (ESB). An ESB provides a flexible and extensible
platform on which to build SOA and application integration solutions.
2.4.1.5. Sopera [6] [11]
SOPERA is a complete and proven SOA platform, which is rigorously oriented to practical
requirements. Companies and organizations benefit from the SOA know-how integrated in
SOPERA during implementation of sophisticated SOA strategies.
SOPERA can predict failure of a business process by monitoring event patterns. The patterns
SOPERA detects are schema based: when it discovers a certain schema of events that leads to
a failure of the business process, and all the events of the pattern occur within the time
window, it fires a new complex event that alerts the staff in advance about a future
process failure. This provides the ability to react proactively.
2.4.1.6. Stream-based And Shared Event Processing (SASE) [12]
The goal of the SASE research project, conducted by UC Berkeley and the University of
Massachusetts Amherst, is to design and develop an efficient, robust RFID stream processing
system that addresses the challenges in emerging RFID deployments, including the
data-information mismatch, incomplete and noisy data, and high data volume, while enabling
real-time tracking and monitoring.
The paper [13] presented on SASE gives insight into the different algorithms used in its
efficient state machine implementation. SASE extends existing event languages to meet the
needs of a range of RFID-enabled monitoring applications. SASE supports high-volume
streams, extracts events from large windows spanning up to 12 hours, and includes flexible
use of negation in sequences, parameterized predicates, and sliding windows. This approach
is based on a new abstraction of CEP: a dataflow paradigm with native sequence operators at
the bottom, pipelining query-defined sequences to subsequent relational-style operators.
SASE language supports not only basic constructs such as sequence and negation that
existing event languages have, but also offers flexible use of negation in event sequences,
adds parameterized predicates for correlating events via value based constraints, includes
sliding windows for imposing additional temporal constraints, and resolves the semantic
subtlety of negation when used together with sliding windows. Unlike previous work that
focuses on complex event “detection” (i.e., only reporting that an event query is
satisfied, but not how), SASE explicitly reports which events were used to match the query.
This significantly increases the complexity of query processing.
SASE approach employs an abstraction of complex event processing that is a dataflow
(query-defined event sequence) paradigm with pipelined operators as in relational query
processing. As such, it provides flexibility in query execution, ample opportunities for
optimization, and extensibility as the event language evolves.
The paper [13] provides a comparison between SASE and relational stream processor,
TelegraphCQ (TCQ) [12], developed at the University of California, Berkeley. TCQ uses an
n-way join to handle an equivalence test over an event sequence. This certainly incurs high
overhead when the sequence length is high. Moreover, TCQ only considers equality
comparisons in joins. Therefore, temporal constraints for sequencing, e.g., “s.time > r.time”,
are evaluated only after the join. In contrast, SASE uses the NFA to naturally capture
sequencing of events, and the PAIS algorithm to handle the equivalence test during NFA
execution, yielding much better scalability.
SASE also has some limitations. It does not handle hierarchies of complex event types: the
output of one query cannot be used as input to another. It also assumes a total ordering of
events. A known issue with this assumption arises because a composite event usually obtains
its timestamp from one of its primitive events; when such composite events are mixed with
primitive events to detect more complex events, the assumption of a total order on all
events no longer holds. Further, the SASE language could be extended to support aggregates
such as count() and avg(), but these have not yet been implemented.
2.4.1.7. Cayuga [6] [14] [15]
This project is part of a 2007 AFRL/IF/AFOSR Minigrant titled “User-Centric Personalized
Extensibility for Data-Driven Web Applications,” by James Nagy (AFRL/IFED) [16]. The
minigrant focuses on Cayuga as a stateful publish/subscribe system for use in a graphical
programming model (also being developed at Cornell) known as Hilda. An overview of both
systems can be found in the minigrant proposal.
Researchers at Cornell describe Cayuga as a general-purpose complex event processing
system [17]. The system can be used to detect event patterns in event streams. The Cayuga
system is designed to leverage traditional publication/subscription techniques to allow for
high scalability [18]. This leads to comparisons not only with other data stream management
systems, but also to publish/subscribe systems to demonstrate the applications and
capabilities of Cayuga. The Cayuga system architecture is designed to efficiently support a
large number of concurrent subscriptions. Its core components include a query processing
engine, an index component, a metadata manager, and a memory manager.
One of the most novel components of Cayuga is the implementation of the processing engine,
which utilizes a variation of nondeterministic finite automata [18]. However, the automata in
Cayuga are a generalization of the standard nondeterministic finite automata model. These
automata read relational streams, instead of a finite input alphabet. Also, the state transitions
are performed using predicates. The use of automata allows for the storing of input data and
new inputs can be compared against previously encountered events.
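The idea of an automaton whose transitions are predicates over incoming events, rather than symbols from a fixed alphabet, can be sketched as follows. This is a deliberately simplified, deterministic Java illustration of the general technique, not Cayuga's implementation; the class name and the pattern are our own:

```java
// Simplified sketch of a predicate-transition automaton: each state is left
// when its guarding predicate matches the incoming event.
import java.util.List;
import java.util.function.Predicate;

class PredicateNfa {
    private final List<Predicate<Double>> steps; // predicate guarding each state
    private int state = 0;

    PredicateNfa(List<Predicate<Double>> steps) { this.steps = steps; }

    // Feed one event; returns true once the final state is reached.
    boolean accept(double event) {
        if (state < steps.size() && steps.get(state).test(event)) state++;
        return state == steps.size();
    }
}
```

Here a two-step pattern such as "a price above 100 followed later by a price below 90" becomes two predicates; each matched event advances the automaton by one state, so partial matches are naturally stored as the current state.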
Cayuga requires users to specify their interests in the structured Cayuga Event Language
(CEL). Not every Cayuga query can be implemented by a single automaton. In order to process
arbitrary queries, Cayuga supports re-subscription. This is similar to pipelining: the
output stream of one query is used as the input stream of another. Because of
re-subscription, query output must be produced in real time. Since each tuple output by a
query has the same detection time as the last input event that contributed to it, its
processing (by re-subscription) must take place in the same epoch in which that event
arrived. This motivates the Cayuga priority queue: the only form of query optimization
performed by the engine is to merge manifestly equivalent states, so that events having the
same timestamps are processed together and then updated in the automaton’s internal string
table (a weakly referenced hash table).
There is also research regarding a distributed implementation of Cayuga, known as
FingerLakes [19].
2.4.1.8. Aurora and Borealis [6] [20] [21] [15]
The primary goal of the Aurora project [20] is to build a single infrastructure that can
efficiently and seamlessly meet the requirements of demanding real-time streaming
applications. This project has been superseded by the Borealis [21] project.
Both Aurora and Borealis are described as general-purpose data stream management systems
[22] in the papers published by their creators at Brandeis University, Brown University,
and the Massachusetts Institute of Technology. The goal of the systems is to support
various real-time monitoring applications. The overall system architecture of Aurora and
Borealis is based on the “boxes-and-arrows” process- and work-flow model [23]. Data flows
through the system as tuples along pathways, which are the arrows in the model; the data is
processed at operators, the boxes. After the last processing component, tuples are
delivered to an application for processing [22].
Figure 7 Aurora Pipeline Architecture
There are three types of graphs used to monitor the Aurora and Borealis systems: latency
graphs, value-based graphs, and loss-tolerance graphs. By monitoring these graphs, the
systems can carry out several optimizations to decrease system stress. The primary
optimizations are the insertion of processing boxes, moving processing boxes, combining two
boxes into a single larger box, reordering boxes, and load shedding [22]. Load shedding,
one of the most important optimizations introduced in these systems, reduces the number of
tuples presented for processing in order to end overloaded states. In Aurora and Borealis,
load shedding is done by opting to drop the tuples belonging to applications that are more
tolerant of lost and missing data.
Borealis, the second-generation system developed by Brandeis, Brown, and MIT [24], improves
and integrates the stream processing functionality of the Aurora system together with
distribution techniques from a project known as Medusa [25]. It should also be noted that
the Aurora team has now commercialized the Aurora project through StreamBase [23].
2.4.1.9. TelegraphCQ [6] [15] [26]
TelegraphCQ was developed by the University of California at Berkeley, and was designed to
provide event processing capabilities alongside relational database management capabilities
by utilizing PostgreSQL [27]. Since PostgreSQL is an open source database, its existing
architecture was modified to allow continuous queries over streaming data [28]. TelegraphCQ
focuses on issues such as scheduling and resource management for groups of queries, support
for out-of-core data, variable adaptivity, dynamic QoS support, and parallel cluster-based
processing and distribution. It also allows multiple simultaneous notions of time, such as
logical sequence numbers or physical time.
TelegraphCQ uses different types of windows, which impose different requirements on the
query processor and its underlying storage manager [27]. One fundamental issue concerns the
use of logical (i.e., tuple sequence number) versus physical (i.e., wall clock) timestamps.
If the former is used, the memory requirements of a window can be known a priori, while in
the latter case memory requirements depend on fluctuations in the data arrival rate.
Another issue related to memory requirements concerns the type of window used in the query.
Consider the execution of a MAX aggregate over a stream. For a landmark window, it is
possible to compute the answer incrementally, by simply comparing the current maximum to
the newest element as the window expands. For a sliding window, on the other hand,
computing the maximum requires maintaining the entire window. Further, the direction of
movement and the “hop” size of the windows (the distance between consecutive windows,
defined by a for loop) also have a significant impact on query execution. For instance, if
the hop size exceeds the size of the window itself, then some portions of the stream are
never involved in the processing of the query.
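The contrast between the two window types for the MAX aggregate can be sketched in Java. This is our own illustration of the general point, not TelegraphCQ code:

```java
// Landmark vs. sliding window MAX: the landmark case needs one running value,
// while the sliding case must retain the whole window, because the current
// maximum may expire when it slides out.
import java.util.ArrayDeque;
import java.util.Deque;

class WindowMax {
    private double landmarkMax = Double.NEGATIVE_INFINITY;
    private final Deque<Double> window = new ArrayDeque<>();
    private final int size;

    WindowMax(int slidingSize) { this.size = slidingSize; }

    // Landmark window: compare the new element against the running maximum.
    double onEventLandmark(double v) {
        if (v > landmarkMax) landmarkMax = v;
        return landmarkMax;
    }

    // Sliding window: keep the last `size` elements and rescan them.
    double onEventSliding(double v) {
        window.addLast(v);
        if (window.size() > size) window.removeFirst();
        double max = Double.NEGATIVE_INFINITY;
        for (double x : window) if (x > max) max = x;
        return max;
    }
}
```

The landmark version does O(1) work and storage per event; the sliding version stores the entire window, which is exactly the memory cost discussed above.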
There are several significant open problems in TelegraphCQ with respect to the complexity
and quality of routing policies: understanding how ticket-based schemes perform under a
variety of workloads and how they compare to (NP-hard) optimal schedule computations,
modifying such schemes to adjust the priority of individual queries, and evaluating the
feasibility (in terms of computational complexity and quality) of more sophisticated
schemes. Routing decisions can consume a significant portion of overall execution time. For
this reason, two techniques play a key role in TelegraphCQ: batching tuples, by dynamically
adjusting the frequency of routing decisions in order to reduce per-tuple costs, and fixing
operators, by adapting the number and order of operators scheduled with each decision to
reduce per-operator costs. Since TelegraphCQ was designed with a storage subsystem that
exploits a sequential-write workload and broadcast-disk style read behaviour, queries
accessing data that spans memory and disk also raise significant Quality of Service issues,
in terms of deciding what work to drop when the system is in danger of falling behind the
incoming data stream.
Currently the developers of TelegraphCQ are extending its Flux module to serve as the basis
of a cluster-based implementation. There has also been a spate of work on sharing work
across queries, related to the problem of multi-query optimization originally posed by
Sellis et al. [29], by the group at IIT-Bombay [30] [31] [32].
It should also be noted that although TelegraphCQ is licensed under the BSD license, there
is also a commercialized version of TelegraphCQ, named the Truviso event processing system.
2.4.1.10. STREAM [6] [33]
STREAM is a Distributed Data Stream Management System, produced at Stanford
University [33]. The goal of STREAM is to be able to consider both structured data streams
and stored data together. The queries over data streams are issued declaratively, but are then
translated into flexible physical query plans. STREAM system includes adaptive approaches
in processing and load shedding, provides approximate answers, and also manipulates query
plans during execution.
In STREAM, queries are independent units that logically generate separate plans, but those
plans are then combined by the system and ultimately result in an Aurora-like mega plan.
One of the notable features of the STREAM system is its subscription language, known as the
Continuous Query Language (CQL). CQL features two layers – an abstract semantics layer
and an implementation of the abstract semantics. Here the implementation of the abstract
semantics uses SQL to express relational operations and adds extensions for stream-related
operations.
Currently STREAM has several limitations, such as the inability to merge sub-expressions
with different window sizes, sampling rates, or filters. This is because it handles
resource sharing and approximation separately. The number of tuples in a shared queue at
any time depends on the rate at which tuples are added to the queue and the rate at which
the slowest parent operator consumes them; when queries with common sub-expressions produce
parent operators that consume tuples at different rates, it is preferable not to use a
shared sub-plan, which STREAM currently does not handle.
STREAM was released under BSD license and according to the STREAM homepage the
project has now officially wound down [33]. Now it is used as the base for the Coral8 event
processing engine.
2.4.1.11. PIPES
PIPES [34] is developed by the University of Marburg. It is a flexible and extensible
infrastructure providing fundamental building blocks to implement a data stream management
system (DSMS). PIPES covers the functionality of the Continuous Query Language (CQL). The
first and last public release of PIPES was in 2004, under the GNU Lesser General Public
License.
2.4.1.12. BEA WebLogic [15]
BEA Systems developed the WebLogic Real Time and WebLogic Event Server system
which focuses on enterprise-level system architectures and service integrations.
BEA WebLogic focuses on event-driven service-oriented architecture, providing a complete
event processing and event-driven SOA infrastructure that supports high-volume, real-time,
event-driven applications. BEA WebLogic is one of the few commercial offerings of a
complete, integrated solution for event processing and service-oriented architectures.
The BEA WebLogic system includes a series of Eclipse-based developer tools for easy
development, as well as administration tools for monitoring throughput, latency, and other
statistics. Since BEA WebLogic was acquired by Oracle Corporation, Oracle has released some
non-programmatic interfaces that allow all interested parties to configure queries and
rules for processing event data.
2.4.1.13. Coral8 [15]
The Coral8 event processing tool is designed to process multiple data streams and handle
heterogeneous stream data. Coral8 is capable of processing operations that require
filtering, aggregation, correlation (including correlation across streams), pattern
matching, and other complex operations in near real time [35] [36]. The Coral8 Engine is
composed of two tools, the Coral8 Server and the Coral8 Studio, and Coral8 also comes with
a Software Development Kit (SDK) to perform further optimizations.
Coral8 Server is the heart of Coral8 [35] which provides clustering support [37]. Coral8
Server also includes features such as publication of status data stream that can be used to
monitor performance and activity of the server by providing Simple Network Monitoring
Protocol (SNMP) to be used by management consoles and monitoring frameworks.
Coral8 Studio provides an IDE-like interface which allows administrators to add and remove
queries, as well as input and output data streams. It uses a subscription language called
Continuous Computational Language (CCL) to manage queries.
2.4.1.14. Progress Apama [15]
The Progress Apama event stream processing platform [38] consists of several tools,
including an event processing engine, data stream management tools, event visualization
tools, adapters for converting external events into internal events, and some development
tools.
The Apama technology was tested at the Air Force Research Laboratory by Robert Farrell
(AFRL/IFSA) [15] to evaluate the marketing claims of Apama relating to throughput and
latency. The results showed that Apama could process events at rates measured in thousands
of events per second [38].
2.4.1.15. StreamBase [15]
The StreamBase event processing engine was developed based on research from the
Massachusetts Institute of Technology, Brown University, and Brandeis University. It is
an improved version of the Aurora project [39].
StreamBase provides a Server and a Studio module [40]. The Server module is designed to
scale from a single CPU to a distributed system. The Studio is Eclipse-based and not only
provides graphical (drag-and-drop) creation of queries, but also supports text-based
editing of queries using StreamSQL.
2.4.1.16. Truviso [15] [41]
Truviso is a commercial event processing engine that is based on the TelegraphCQ project of
UC Berkeley. Its most important feature is that it supports fully-functional SQL alongside a
stream processing engine.
The queries of Truviso are simply standard SQL with extensions that add functionality for
time windows and event processing. In addition, Truviso's integrated relational database
allows easy caching, persistence, and archival of data streams for queries that involve
not only real-time data but also historical data.
2.5. Some Interesting Research Papers
Siddhi has taken inspiration from several valuable research papers when designing its
architecture. We read a number of papers and then chose the algorithms that are most
suitable for us and perform best.
2.5.1. Event Stream Processing with Out-of-Order Data Arrival [42]
This paper provides an in-depth architectural and algorithmic view of managing the arrival
of out-of-order data (with respect to timestamp). This matters because a CEP engine can
lose its real-time accuracy due to network traffic and other factors. The system presented
here is very similar to SASE [12], in that it also uses stacks to handle event arrivals.
The authors provide out-of-order handling as a feature: all stacks are enabled at the
beginning and maintain a clock to detect out-of-order event arrivals from their
timestamps. The paper also provides an algorithm to handle the identified out-of-order
events. This is useful mainly for projects with a design similar to SASE.
2.5.2. Efficient Pattern Matching over Event Streams [43]
This paper focuses on richer query languages for efficient processing over streams. Its
query evaluation framework is based on three principles: first, the evaluation framework
should be sufficient for the full set of pattern queries; second, given such full support,
it should be computationally efficient; third, it should allow optimization in a
principled way. To achieve this, the authors use a formal evaluation model, NFAb, which
combines a finite automaton with a match buffer. When testing SQL-TS, Cayuga, SASE+, and
CEDR, they found SASE+ to be much richer and more useful.
2.6. What We Have Gained from the Literature Survey
Literature Survey has greatly helped us to understand the different implementations of
present CEP engines and also get to know about their pros and cons. Through this survey we
were able to come up with the best architecture to Siddhi-CEP by understanding how other
CEP engines are implemented.
From the literature review we found that pipelines would be the most appropriate model for
event passing, and we therefore decided to build our core using a producer-consumer
architecture with an Aurora-like structure to obtain high performance [44]. We also found
an interesting paper [45] to help us implement Query Plan Management, which will not only
improve efficiency but also allow approximation of data in stream management.
3. SIDDHI DESIGN
This chapter discusses all the major sections of our project in terms of system architecture
and design. We discuss the basic system architecture in detail with the use of various
diagrams such as architecture diagrams, use case diagrams, and class diagrams.
This chapter also discusses the system design considerations we weighed before starting the
implementation phase. Since the factor which makes this project significantly different
from other projects is performance, this section also elaborates on how we achieved
performance through our system design and the problems we faced.
3.1. Siddhi Architecture
Figure 8 Siddhi high-level architecture
Input Adapters
Siddhi receives events through input adapters. The major task of an input adapter is to
provide an interface for event sources to send events to Siddhi. Siddhi has several types
of input adapters, where each accepts a different type of event. For instance, Siddhi
accepts XML events, POJO events, JSON events, etc., and converts all of those different
types of events into a common data structure for internal processing. This data structure
is simply a tuple.
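As a sketch of this conversion, an input adapter could map an external representation into the internal tuple form. The class and method names below are illustrative assumptions, not Siddhi's actual adapter API; a simple CSV-style payload stands in for the XML/POJO/JSON cases.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical input adapter interface: converts a raw external event
// into the internal tuple representation (stream id plus attribute values).
interface InputAdapter {
    List<Object> toTuple(String streamId, String rawEvent);
}

// Example adapter for a comma-separated payload like "IBM,75.6"
class SimpleCsvInputAdapter implements InputAdapter {
    public List<Object> toTuple(String streamId, String rawEvent) {
        String[] parts = rawEvent.split(",");
        // Resulting tuple: [streamId, symbol, price]
        return Arrays.asList(streamId, parts[0], Double.parseDouble(parts[1]));
    }
}
```

Whatever the external format, every adapter produces the same tuple shape, so the core never needs to know which adapter an event came through.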
Siddhi-core
Siddhi-core is the heart of the Siddhi complex event processing engine. All the processing
is done in this component. The core consists of many sub-components such as Processors and
Event Queues. As indicated in the diagram, input events are placed on input queues; the
processors then fetch the events to process them, and after processing, the events are
placed into the output queues. More details on Siddhi-core are given under “Detailed
System Design”.
Output Adapters
Detected complex events are notified to the appropriate clients through output adapters.
Compiler
This component takes the user queries, compiles them, and builds the object model according
to the query. Mainly, the compiler validates the user-entered query; if the query is valid,
an intermediate query object model is created, which the Siddhi engine/core can understand.
Pluggable UI
The pluggable UI can be used to display useful statistics and perform monitoring tasks. It
is not an essential component for processing events.
3.2. 4+1 Model
3.2.1. Logical View
Class Diagram
The class diagram of Siddhi is shown in the figure.
Figure 9 Siddhi Class Diagram
- Siddhi Manager: the main manager class, which manages the Queries, Input Adapters,
  and Output Adapters accordingly.
- Input Adapter & Output Adapter: provided as interfaces for events, since they are
  customizable by clients in order to handle various types of input and output events
  (XML, JSON, etc.).
- Event & Event Streams: events are serializable, and the event structure depends on the
  event stream from which it is generated.
- Dispatcher, Processor & Event Queue: Dispatchers delegate event tasks to the queries by
  passing events to the various Event Queues, and the Processors do the processing by
  fetching the events from the corresponding Event Queues.
Sequence Diagram
This show how the Queries, Input Adapters and Output Adapters are assigned, and how
Siddhi manager invokes the appropriate Event Dispatchers to pass the Events to the
corresponding Query and processes them. Receiving Event from the client, processing them,
and Firing the right Output Event are concurrently handled as depicted in figure 3.
Note: Here, without loss of generality, two QueryProcessors and two
SpecificQueryDispatchers are shown in order to show how several instances could be present
at the same time.
Figure 10 Siddhi Sequence Diagram
Figure 11 Siddhi Sequence Diagram
Implementation View
Figure 12 Siddhi Implementation View
For the ‘compiler’ component, we decided to use ANTLR as the parser generator and
compiler. The queries fed by users are passed through the compiler. ANTLR validates the
query and if there’s an error/anomaly in the query, it returns an error. Else, the query is
transformed to an intermediate object model and passed to the Siddhi-core.
For testing purposes, we are using JUnit as our testing framework. Since Siddhi development
is carried out using Test-Driven Development (TDD), the testing framework played a major
role in development: the whole system is tested with different kinds of inputs and the
output is observed to make sure the intended output is generated by the system.
The Siddhi source is currently hosted in a repository at SourceForge, using Subversion as
its version control system (VCS).
Process View
Process View gives a detailed description of end to end event processing in Siddhi.
Siddhi has input adapters which listen to external event sources. Each incoming event is
put into an input blocking queue, from which an event dispatcher dispatches the events to
the Siddhi Processors. Here, Siddhi duplicates the input events according to the number of
queries interested in processing those events; if only one query is interested in an
event, Siddhi simply passes the event to that query without duplication. Thereafter, the
queries (the rules defined by the client, each represented by a query processing element
instance) process the event; according to the query and the type of event, they may store
events for further processing or drop them. Whenever there is a need to fire an output
event, Siddhi creates the output event and passes it to the output queue.
Figure 13 Siddhi Process View
Deployment View
The deployment view, also known as the physical view, illustrates the system from a system
engineer’s perspective. Deployment diagrams show the hardware used in the system, the
software that will be running on the system, and the middleware that connects the
different hardware systems together.
The above diagram shows the deployment diagram of the Siddhi-CEP project. The
three-dimensional boxes represent nodes, either software or hardware. The
hardware/physical nodes are shown with the stereotype <<device>>.
The above deployment diagram depicts the deployment of Siddhi in an SOA environment.
Here the Siddhi processor is wrapped inside an Event Server. The Event Server exposes
Siddhi to the outside as a Web Service, allowing other Web Services to send events to
Siddhi without any hassle. To give Siddhi an Event Server, our current plan is to make
Siddhi compatible with the existing WSO2 Event Server implementation. This will make
Siddhi more of a plug-and-run module.
Siddhi uses event streams to receive events from different streams/event types. The Event
Server wraps the ‘Siddhi input streams’ and exposes what are called ‘input topics’. Topics
define a high-level, easier-to-understand interface which hides the internal details of
the Siddhi Processor. The event type of a stream is not necessarily exposed by input
topics; rather, input topics and streams can have a many-to-many mapping, meaning one
input-topic can have many streams, and one stream can be mapped to many input-topics.
To configure the wrapping, the user should:
- Define input topics
- Map them to an event input stream (this is inside the Siddhi Engine)
- Define queries to be processed
- Define an output topic per query, then map a stream to the output topic if applicable
  (optional)
Through this configuration the Event Server will be able to receive events from different
event sources and process them using Siddhi-CEP. Siddhi then alerts the Event Server via
its output event streams when an event ‘hit’ occurs, and the Event Server notifies the
relevant parties as defined.
Figure 14 Siddhi Deployment View
Use Case View
Figure 15 Siddhi Use Case Diagram
Use case 1.
This use case is of querying Siddhi-CEP.
Querying Siddhi includes three main things.
I. Assigning Input Adapters
There are several input adapters available in Siddhi, and the client can select one
according to the type of message the client intends to send for processing. For example,
the client can use the SOAP/XML message adapters to convert XML into event tuples for the
Siddhi engine to process.
In situations where there is no adapter for the type of event the client intends to send,
the API is made flexible in such a way that the client can write their own adapter and
easily plug it in to Siddhi.
II. Assigning Output Adapters
Output adapters send output events back to the client in the configured format. The client
can select an existing output adapter according to the type of message they intend to
receive as the output message. When the client cannot find a suitable adapter, they can
write the callback method in their own output adapter and set the reference in Siddhi.
III. Submitting EPL Queries
The other part of querying is submitting EPL (Event Processing Language) queries. After an
EPL query is received, Siddhi compiles the query and builds the object model. This
generated object model is used to check for matching incoming events.
Use case 2.
The second use case is Event Sources sending events to Siddhi. The format of the event is
set when the query is sent to the Siddhi engine. Here the Event Sources call the
appropriate input adapter and send the events through it.
Use case 3.
The third use case is Event Subscribers getting notified of matching events. This happens
in the form of a callback in the Siddhi Output Adapters. The output format is defined when
the output event is sent from the Siddhi engine.
3.3. Major Design Components
This section covers the design of basic elements, to high-level design components in Siddhi.
3.3.1. Event Tuples
In Siddhi, the internal representation of an event is a tuple. A tuple is a very basic
data structure for representing an event.

Stream Id | Data 1 | Data 2 | Data 3

Figure 16 Siddhi Event Tuple
Design Alternatives:
- Plain Old Java Object (POJO), XML
The reason for selecting the tuple as the internal data structure for an event:
In the initial stage of the project we designed the Siddhi architecture with XML as the
internal representation of events, but later we moved to event tuples.
A tuple is a very simple data structure, and retrieving data from a tuple is very simple
compared to the other alternatives. That helps Siddhi to process events faster by
minimizing the overhead of accessing data in an event, which is the major reason for
selecting the tuple as the data structure for the internal event representation.
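A minimal sketch of such a tuple is shown below. The class and method names are illustrative assumptions; the point is that positional access to an attribute is a single array lookup, with no per-event parsing as XML would require.

```java
// Illustrative event tuple: a stream id plus a fixed array of attribute
// values. Names are hypothetical, not Siddhi's actual classes.
class EventTuple {
    private final String streamId;
    private final Object[] data;

    EventTuple(String streamId, Object... data) {
        this.streamId = streamId;
        this.data = data;
    }

    String getStreamId() { return streamId; }

    // O(1) positional access; no parsing or reflection involved
    Object get(int index) { return data[index]; }
}
```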
3.3.2. Pipeline architecture
The architecture of Siddhi depicts a producer-consumer model. As described in the diagram,
the input events are placed in an Event Queue, from which the processors fetch them one at
a time and process them. Siddhi processors use Executors as their internal processing
elements. These executors access each event and produce a Boolean output indicating
whether the event matches or not. Events that do not match are simply discarded, and the
matching events are then processed by the processors accordingly. When a matching event
has occurred according to the query, the processor either stores the event for further
processing or creates the appropriate output event and places it in the Output Queue,
which will be consumed by another processor or used to notify the end user.
This constitutes a simple many-to-many producer-consumer design pattern, developed as a
pipeline architecture in the standard way.
Figure 17 Siddhi Pipeline Architecture
Design Alternatives:
- Single processor model
- Parallel processing model
- Pipeline architecture model
The reason for choosing the pipeline architecture:
The single processor model is widely used in the CEP domain; some famous CEP engines such
as Esper use such architectures. But this solution is discouraged as it uses only one
processing thread. Since a CEP engine is supposed to produce high performance, having
multi-core processors run a single-threaded process is highly inefficient.
Compared to the single processor model, parallel processing seems more attractive. Here
each complex query is processed by a thread. But the disadvantage of this approach is its
resource utilization: in most cases many complex queries have several common sub-queries,
and this ends up running many duplicate sub-queries at the same time.
The pipeline architecture model rectifies the issues of both the single processor model
and the parallel processing model by having only one running instance of each sub-query at
a time. Here each sub-query runs in a separate thread, further parallelizing the
execution. This facilitates much faster processing and the high throughput we required.
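The pipeline idea can be sketched with Java's standard blocking queues: each stage consumes from an input queue and produces to an output queue, so stages run in separate threads without duplicating shared sub-query work. The class name and the trivial filter "sub-query" below are illustrative assumptions, not Siddhi's actual code.

```java
import java.util.concurrent.BlockingQueue;

// One pipeline stage: take events from an input queue, apply a condition,
// and push matches to an output queue for the next stage or the end user.
class PipelineStage implements Runnable {
    private final BlockingQueue<String> in;
    private final BlockingQueue<String> out;

    PipelineStage(BlockingQueue<String> in, BlockingQueue<String> out) {
        this.in = in;
        this.out = out;
    }

    public void run() {
        try {
            while (true) {
                String event = in.take();          // blocks until an event arrives
                if (event.equals("POISON")) break; // shutdown marker
                if (event.startsWith("IBM")) {     // a trivial "sub-query": filter
                    out.put(event);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a full pipeline, each such stage would be started on its own thread, and the output queue of one stage serves as the input queue of the next.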
3.3.3. State Machine
The state machine is one of the major components in Siddhi. It can be considered the most
vital part in processing complex queries. In Siddhi we have used a state machine for
handling two types of queries: sequence queries and pattern queries.
Sequence Queries
In sequence queries we can define Siddhi to fire an event when a series of conditions is
satisfied one after the other.
Example:
We define the sequence of conditions as
A -> B -> C
and consider the following event sequence:
A1, B1, A2, A3, C1, A4, B2, B3, C2
An event is fired when Siddhi receives C1; Siddhi captures the event sequence A1, B1, C1.
Every Operator
In the sequence mentioned above, after capturing an event successfully, Siddhi stops
looking for new events. If we want to continue the process, we can use the ‘every’
operator.
Consider the event sequence
A1 B1 C1 B2 A2 D1 A3 B3 E1 A4 F1 E2 B4
Table 2 Different Every Operator Cases

Example: every ( A -> B )
Description: Detects an A event followed by a B event. When B occurs, the sequence matcher
restarts and looks for the next A event.
Matched sequences:
- Matches on B1 for combination {A1, B1}
- Matches on B3 for combination {A2, B3}
- Matches on B4 for combination {A4, B4}

Example: every A -> B
Description: An event fires for every A event followed by a B event.
Matched sequences:
- Matches on B1 for combination {A1, B1}
- Matches on B3 for combinations {A2, B3} and {A3, B3}
- Matches on B4 for combination {A4, B4}

Example: A -> every B
Description: An event fires for an A event followed by every B event.
Matched sequences:
- Matches on B1 for combination {A1, B1}
- Matches on B2 for combination {A1, B2}
- Matches on B3 for combination {A1, B3}
- Matches on B4 for combination {A1, B4}
The Sequence Processor does the core processing of sequence queries.
Design Decisions in the Sequence Processor
In Siddhi, the basic unit which does condition matching is the Executor. For sequence
conditions we have a series of ‘Followed-by Executors’, one corresponding to each state.
Here we have deviated from the conventional state machine concept: in a conventional state
machine only one state is active and listening to inputs at a particular time, but because
of the ‘every’ operator we have multiple states listening to input events.
A Map to hold Executor Listeners
We have used a Map to store the currently active Followed-By Executors:
Map<StreamId, LinkedList<FollowedByExecutor>>
The linked list holds the Followed-By Executors which belong to a particular stream. When
dispatching events, we send them only to the executors which correspond to the input
event's stream id.
Design Alternative:
- A linked list to hold all the executors
We could have stored all the executors in one linked list and sent each new input event to
all of them iteratively.
Figure 18 Map holding Executor Listeners
The reason for choosing a Map:
In the initial design of the Sequence Processor we used only a linked list to hold all the
currently active executors. When we were doing performance improvements, we saw that
sending all the input events to all the active executors reduces performance.
So we decided to use a Map to avoid sending every input event to every active executor;
the necessary executors are filtered using the key of the Map.
Pattern Queries
In pattern queries we can define Siddhi to fire an event when a series of conditions is
satisfied one after the other in a consecutive manner.
Example:
We define the sequence of conditions as
A -> B -> C
and consider the following event sequence:
B1, A1, B2, C1, A2, B3, B4, C2
An event is fired when Siddhi receives C1; Siddhi captures the event sequence A1, B2, C1.
Unlike in sequence queries, here the sequence of conditions has to be satisfied
consecutively.
In the above example no event is fired for the event sequence
B1, A1, A2, C1, A3, B3, B4, C2.
Here pattern matching starts when Siddhi receives the A1 event; Siddhi then listens for a
B event. Since Siddhi receives the A2 event instead, the pattern fails.
Kleene Star Operator
In pattern queries we can use the Kleene star operator to define an unbounded number of
intermediate conditions.
Example:
We define the sequence of conditions as
A -> B* -> C
where B* stands for zero or more B events.
Consider the following event sequence:
A1, B1, B2, C1, A2, B3, B4, C2
An event is fired when Siddhi receives C1. Siddhi captures the event sequences
A1, B1, B2, C1 and A2, B3, B4, C2.
Design Decisions in the Pattern Processor
A List to hold Executor Listeners
For the pattern conditions we have a series of Pattern Executors, one corresponding to
each state. We have used a linked list to store the currently active executors.
Design Alternative:
As discussed for sequence queries, we could use a Map to hold the currently active event
listeners.
The reason for choosing a linked list:
Compared to the Sequence Processor, the number of active executors in the Pattern
Processor is very low, so there is no need for a Map to filter out executors. This makes
the implementation simple and avoids unnecessary overhead in finding relevant executors.
3.3.4. Processor Architecture
The Siddhi processor has two major components:
- Executors
- Event Generators
Executors
Executors are the principal processing elements in the Siddhi processor. These executors
are generated by the Query Parser by parsing the Query Object Model defined by the user.
The executors form a tree-like structure, and when an event is passed to the executor
tree, it processes the event and returns true if the event matches the query, or false if
the event does not match. Though there can be many executor trees present in the processor
at the same time, only one is processed at a particular moment.
These executor trees are processed in depth-first search (DFS) order, and when a mismatch
occurs at any of the nodes, processing terminates and the nodes recursively return false
up to the root, notifying the processor and making that event obsolete. If all the nodes
produce matching outputs, the root notifies the processor with true.
The executor types include:
- AndExecutor
- ExpressionExecutor
- FollowedByExecutor
- NotExecutor
- OrExecutor
- PatternExecutor
Design Alternatives:
- Multiple executor model
The reason for choosing the tree executor model:
Since each sub-query has its own query condition, each condition has to have a
corresponding executor. Therefore it is essential to have many different types of
executors to handle all the different cases. Though the ‘multiple executor model’
satisfies this requirement, it has issues with rejecting non-matching events at an early
stage. This is because it needs to process duplicated execution nodes in different
sequences, consuming much time and delaying the detection of failing nodes. In the tree
executor model, the executors are arranged in an optimal order; if the left subtree
returns false, the right subtree is not processed and the false output is immediately
notified to the processor.
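The short-circuiting tree evaluation can be sketched with composite executors. The interface and class names echo the executor types listed above but are simplified assumptions; real Siddhi executors carry more state.

```java
// Hypothetical single-method executor interface, usable as a lambda for leaves.
interface Executor {
    boolean execute(Object[] event);
}

// Composite AND node: if the left subtree returns false,
// the right subtree is never processed (short-circuit).
class AndExecutor implements Executor {
    private final Executor left, right;
    AndExecutor(Executor left, Executor right) { this.left = left; this.right = right; }
    public boolean execute(Object[] event) {
        return left.execute(event) && right.execute(event);
    }
}

// Composite OR node: the right subtree runs only if the left one fails.
class OrExecutor implements Executor {
    private final Executor left, right;
    OrExecutor(Executor left, Executor right) { this.left = left; this.right = right; }
    public boolean execute(Object[] event) {
        return left.execute(event) || right.execute(event);
    }
}
```

Nesting these nodes yields the depth-first evaluation order described above, with failure propagating immediately to the root.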
3.3.4.1. Event Generators
The Event Generator is the other most important element in the processor. The duty of the
Event Generator is to produce output event tuples according to the query definition and
place them in the output queue.
Design Alternatives:
- Bundled output events
The reason for selecting formatted output events:
Some well-known CEP solutions send all accepted events bundled together when the query is
satisfied. This bundling of events occurs when executing queries such as pattern queries,
sequence queries and join queries, because when the query is satisfied many output events
need to be sent at the same time.
In the first iteration of Siddhi we adopted this model, but it failed badly because of its
lack of support for query chaining: our architecture became more and more complex whenever
we needed to plug queries into the bundled output events.
Therefore, on the advice of Dr. Srinath, we decided to follow the SQL convention and
format the output. The output is converted into a single output event as defined in the
query definition. This enables better control over the other queries and enables Siddhi to
have a pluggable pipeline architecture.
3.3.5. Query Object Model
The Query Object Model depicts the internal structure of the query, which the Siddhi-core
can understand. Since we have not yet implemented the compiler, the Query Object Model is
currently used to configure Siddhi queries. Though this is not as user-friendly as SQL,
since Siddhi follows an SQL-like query structure the Query Object Model resembles it
closely, and therefore it is very easy for users to understand and write queries in
Siddhi.
Figure 19 Simple Siddhi Query
A Siddhi query is currently an object model that has to be coded as follows or generated
programmatically. The ability to generate the query programmatically allows Siddhi to have
various interfaces suitable for its deployment. For example, the ‘OpenMRS’ implementation
uses a ‘Graphical Form Entry’ to define the queries [46].
Figure 20 Siddhi Query Form UI Implementation Used in OpenMRS
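To give a flavour of what an SQL-like object model looks like when built programmatically, here is a minimal sketch. The `Query` class and its fluent builder methods are hypothetical assumptions for illustration, not Siddhi's actual object model.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative SQL-like query object model: the query is plain objects
// (select / from / where) rather than parsed text. Names are hypothetical.
class Query {
    final List<String> selectAttributes = new ArrayList<>();
    String fromStream;
    String whereCondition;

    Query select(String... attrs) {
        for (String a : attrs) selectAttributes.add(a);
        return this;
    }
    Query from(String stream) { this.fromStream = stream; return this; }
    Query where(String condition) { this.whereCondition = condition; return this; }
}
```

A UI such as the OpenMRS form entry can then construct such objects directly from form fields, with no query text involved.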
Design Alternatives:
- Custom object model
Reasons for selecting an SQL-like object model:
Using a custom object model is feasible, but we decided to choose an SQL-like object model
because of the following advantages:
1. By following SQL standards, the Siddhi Query Language falls in line with relational
algebraic expressions, and Siddhi queries can also utilize the optimization techniques
that are used in SQL and relational databases.
2. Other CEP solutions also follow the same trend, and it is widely accepted, as CEP and
SQL queries mostly express the same set of functions.
3. SQL is common, and it is easy for users to start using Siddhi if the Siddhi query is
also in line with SQL.
3.3.6. Query parser
The Query Parser parses the Query Object Model and creates the query executors. After the
query (the Query Object Model) is added to the Siddhi Manager, and when the Siddhi Manager
is updated, the Siddhi Manager parses the Query Object Models and creates executors from
them. Finally, the created executors are assigned to a processor.
Although the executor tree structure closely mirrors the conditions defined in the query,
various optimization techniques have been implemented for faster performance and user
friendliness.
Design Alternatives:
- Generating the executors without having an internal Query Object Model
The reason for generating the Query Object Model first and then converting it to
executors:
When designing, we looked into both alternatives. Generating executors without a Query
Object Model seems very fast and straightforward, but this model is not capable of
handling complex queries, as it fails to optimize the executors by reusing the available
resources. When we create the Query Object Model first and then create the executors,
Siddhi can gain more understanding of the nature of each query and tune the executors
accordingly.
Using this architecture, Siddhi was able to achieve performance improvements, as it
allowed Siddhi to use SQL query optimization techniques and resource sharing to build the
optimal pipeline structure. For example, managing the executors according to the
complexity of the query is one of its major features in query optimization. When the
queries contain a collection of simple conditions, Siddhi uses the Simple Executors, which
are fast but limited in capability. When more complex conditions are defined, the Query
Parser tries to convert them to simple queries and use the Simple Executors. Only if this
attempt fails does Siddhi use the Complex Executors, which have MVEL engines that can
perform highly complex operations, but at the expense of time.
Figure 21 Siddhi Query with Simple Condition
3.3.7. Window
Time Window
A time window is a sliding window which keeps track of the events that arrived within a
given span of time, measured backwards from the current time. The time-windowing concept
(as well as the length window) is useful for analyzing events that arrived within a
limited amount of time, for example for computing statistics over the arrived events, such
as the average or sum of a particular attribute of the arriving events.
Figure 22 Siddhi Time Window
Take the following query as an example (the internal query model is shown). This query
filters the buying and selling events of the stock symbol “IBM” from the input stream
‘cseEventStream’. It then calculates the average of its stock price within an hour, and
generates an output event containing the attributes (symbol, avgPrice).
Figure 23 Siddhi Time Window Query
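The sliding-window average can be sketched as follows. This is a simplified assumption of how such a window might work, not Siddhi's actual implementation: timestamps are passed in explicitly to keep the example deterministic, and prices are tracked as long cents.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sliding time window: on each arrival, events older than the
// window span are expired, and the average is computed over what remains.
class TimeWindowAverage {
    private final long windowMillis;
    private final Deque<long[]> events = new ArrayDeque<>(); // {timestamp, price in cents}

    TimeWindowAverage(long windowMillis) { this.windowMillis = windowMillis; }

    // Adds an event and returns the current average price over the window
    double add(long timestamp, long priceCents) {
        events.addLast(new long[]{timestamp, priceCents});
        // expire events that slid out of the window (older than timestamp - window)
        while (!events.isEmpty() && events.peekFirst()[0] <= timestamp - windowMillis) {
            events.removeFirst();
        }
        long sum = 0;
        for (long[] e : events) sum += e[1];
        return sum / 100.0 / events.size();
    }
}
```

A production window would maintain a running sum instead of re-summing, and would expire events on a timer as well as on arrival; the expiry logic is the part the figure illustrates.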
Batch window
Siddhi has built-in data windows for both time and length windows that act on batches of
events arriving via event streams. The two batch windows which Siddhi supports are named
the Time Batch Window and the Length Batch Window. In other words, the batch window
concept is exactly the same as the typical Siddhi sliding window concept, except that it
sends events as a batch at the end of the time or length interval instead of sending each
event on arrival.
Time Batch Window:
Collects all the events within a given time interval, as defined in the time window, and
sends them all at once to the Siddhi core listeners to process as a batch of events.
Length Batch Window:
Collects the given number of events, as defined in the length window, and sends them all
at once as a batch of events to the Siddhi core listeners to process.
The following diagram illustrates how the Time Batch Window works for a window size of 4
seconds and an event-receiving start time of t.
Figure 24 Siddhi Batch Window
1. At time t + 1 seconds an event E1 arrives and enters the batch window. The Siddhi Event
Listener is not informed.
2. At time t + 3 seconds, event E2 arrives and enters the batch window. The Siddhi Event
Listener is still not informed.
3. At time t + 4 seconds the Siddhi Event Listener captures all the events in the batch
window (E1, E2), hands them over as a single batch to the Siddhi core to consume, and
starts a new batch to wait for new events.
4. At time t + 5 seconds an event E3 arrives and enters the batch window. The Siddhi Event
Listener is still not informed.
5. At time t + 8 seconds, the Siddhi Event Listener captures all the events in the batch
window (only E3), hands them over as a single batch to the Siddhi core to consume, and
starts a new batch to wait for new events. It also sets the “isNew” Boolean flag of events
E1 and E2 to false.
These steps take place continuously throughout the event processing period.
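The steps above can be sketched as follows. This is a simplified assumption, not Siddhi's implementation: a real engine flushes batches on a timer, whereas this sketch flushes lazily when the next arriving event crosses an interval boundary, and timestamps are explicit to make the walkthrough reproducible.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative time batch window: events accumulate silently; each add()
// first flushes any batch whose interval has elapsed, so listeners receive
// events only at batch boundaries.
class TimeBatchWindow {
    private final long intervalMillis;
    private long batchStart;
    private final List<String> batch = new ArrayList<>();

    TimeBatchWindow(long startTime, long intervalMillis) {
        this.batchStart = startTime;
        this.intervalMillis = intervalMillis;
    }

    // Returns the flushed batch if this event crosses a boundary, else null
    List<String> add(long timestamp, String event) {
        List<String> flushed = null;
        if (timestamp >= batchStart + intervalMillis) {
            flushed = new ArrayList<>(batch);  // hand events over as one batch
            batch.clear();
            // advance the interval to the boundary just before this event
            while (batchStart + intervalMillis <= timestamp) batchStart += intervalMillis;
        }
        batch.add(event);
        return flushed;
    }
}
```

Running the figure's scenario (interval of 4 seconds starting at t = 0), E1 at t+1 and E2 at t+3 accumulate silently, and the arrival after t+4 releases {E1, E2} as one batch.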
3.3.8. “UNIQUE” Support
The unique view is used for filtering duplicated events. Event duplication is an important
concern in Complex Event Processing, especially when multiple event streams are involved.
Say, for example, there are several RFID sensors in a supermarket reading products, with
each sensor covering a different section. There will obviously be overlap between the
sensors, which will produce duplicates of the same event and may lead to inaccurate and
incorrect output. This needs to be handled, and the UNIQUE functions provide support for
it.
The duplication-removal mechanism is customizable: the user can choose which events need
to be filtered out and which need to be kept from a set of duplicate events. There are two
main methodologies for matching control. Those are,

UNIQUE
o The UNIQUE view includes only the most recent event among events having the
same value(s) for the result of the specified expression or list of expressions.

FIRSTUNIQUE
o The FIRSTUNIQUE view includes only the first event out of a set of duplicated
events. Duplication is determined by evaluating the given expression. When this
view is set, any duplicated event that arrives later is dropped.
The criteria used to evaluate event duplication are also customizable and can be specified
by the user: in the query, the user can specify any parameters/fields that need to be
matched. UNIQUE support is available both for time-windowed queries and for queries
without a time window.
Figure 25 Siddhi Unique Query
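The two policies can be illustrated with a small generic filter keyed by an arbitrary expression over the event. This is a hedged sketch of the semantics only; the class, enum, and method names are hypothetical and do not reflect the actual Siddhi API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative duplicate-handling filter (hypothetical, not the Siddhi API). */
public class UniqueFilterSketch<E, K> {
    public enum Policy { UNIQUE, FIRST_UNIQUE }

    private final Function<E, K> keyExpression;  // expression that defines duplication
    private final Policy policy;
    private final Map<K, E> seen = new HashMap<>();

    public UniqueFilterSketch(Function<E, K> keyExpression, Policy policy) {
        this.keyExpression = keyExpression;
        this.policy = policy;
    }

    /** Returns true if the event passes the filter, false if it is dropped. */
    public boolean accept(E event) {
        K key = keyExpression.apply(event);
        if (policy == Policy.FIRST_UNIQUE) {
            // keep only the first event per key; later duplicates are dropped
            return seen.putIfAbsent(key, event) == null;
        }
        // UNIQUE: the most recent event per key replaces the older one
        seen.put(key, event);
        return true;
    }

    /** The event currently retained for a key (UNIQUE keeps the most recent). */
    public E current(K key) {
        return seen.get(key);
    }

    public static void main(String[] args) {
        UniqueFilterSketch<String, Character> first =
                new UniqueFilterSketch<>(s -> s.charAt(0), Policy.FIRST_UNIQUE);
        System.out.println(first.accept("A1")); // true  - first event for key 'A'
        System.out.println(first.accept("A2")); // false - duplicate of 'A', dropped
    }
}
```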
4. Implementation
This section covers the implementation details of Siddhi. First we describe the development
process models we used; then we move on to how we manage the source code, how we build
the project, and so on.
4.1. Process models
We used two process models for Siddhi development: Scrum and Test-Driven Development
(TDD). This section describes what these process models are and the reasons we chose them
for our project.
4.1.1. Scrum Development Process
Scrum is a software development process and a form of agile software development. It is an
iterative and incremental development approach which progresses through several iterations
called sprints. A sprint is generally two to four weeks long, and at the start of each sprint the
team sets goals and objectives to be completed during it. The scrum team then holds daily
meetings (generally at the same time and place every day) to get an update on the project
status. The daily scrums are generally limited to 15 minutes.
Figure 26 Scrum Development Process
Generally there is a set of defined features that clients require from the product the team is
developing; this set of features is called the product backlog. For each sprint, the team takes
a set of features out of the product backlog and puts them into the sprint backlog to be
implemented during the sprint. The daily meetings take place to get an update on each
feature the team members are developing. A "Sprint Burndown" chart is used to track this
progress: it shows the remaining unfinished work and how much of the work has been
completed. At the end of the sprint, all tasks are reviewed; unfinished features are put back
into the product backlog; and the client is updated about the new features. Generally a
product release happens if the product is in a releasable state.
We chose Scrum for development because it seemed the most appropriate for our work:
we were writing the product from scratch and needed an iterative development process.
Scrum was also well suited because we had a large knowledge base to cover, and
communication between team members was very important for the successful completion
of the project. In addition, Scrum has several other advantages, including:

Increase in productivity

High communication between team members ensures that everyone understands
the project requirements well

Frequent releases of the product, so users don't have to wait long to benefit from
newly implemented features. The "Release early, release often" theme is popular
among open-source projects these days.
4.1.2. Test-driven development (TDD)
While we used Scrum, we followed Test-Driven Development as our development
methodology within Scrum sprints. In test-driven development, the developer first writes a
failing test case for a desired feature or task. The developer then writes code to make the
test pass, and finally brings the developed code in line with the coding standards and
guidelines that the team agreed upon.
Figure 27 Test-Driven Development (TDD)
The above flow chart shows how this process progresses.
There are several benefits to this process. Since the needed tests are created up front, there
is less need to debug the code. Because of the nature of this process, there are test cases
for every small feature, which means the test coverage of the code is higher and the code
is less prone to errors.
Since many features and functions were needed from Siddhi, there was a great need for
testing the code. For code written from scratch, the underlying base framework needs to be
tested thoroughly to make sure the code is accurate and does the intended job. It would be
very costly if broken features were identified later on, because a lot of development would
probably already have been carried out on top of them, which might mean rewriting all the
relevant sections. Test-driven development avoids these issues to a great extent.
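As a concrete illustration of the cycle (a hypothetical example, not actual Siddhi code), the test below would be written first and would fail until averageOf is implemented:

```java
/** Hypothetical TDD illustration: test first, then the minimal implementation. */
public class TddExample {

    // Step 2: the minimal implementation written to make the test pass.
    static double averageOf(double... values) {
        if (values.length == 0) {
            return 0.0;                          // defined behaviour for the empty case
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Step 1: the failing test, written before the implementation existed.
    public static void main(String[] args) {
        double result = averageOf(10.0, 20.0, 30.0);
        if (result != 20.0) {
            throw new AssertionError("expected 20.0 but was " + result);
        }
        System.out.println("test passed");
    }
}
```

The third step, refactoring to the agreed coding standards, is then done while keeping the test green.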
4.2. Version control
For version control we used the Subversion (SVN) version control system.
The main reason for selecting Subversion as our version control system is that we are all
very familiar with it, and it has all the features we expect.
We have hosted our source code on sourceforge.net, one of the most popular sites for
hosting open-source projects.
Siddhi source can be found in,
http://siddhi.sourceforge.net/source-repository.html
All of us commit our regular work to the trunk. When we have a stable milestone
implementation in the trunk, we create a tag out of it.
4.3. Project management
We used Apache Maven as our project management tool; Maven makes the build process
easier.
The Siddhi project is structured into the following main modules.
1. siddhi-api
2. siddhi-core
3. siddhi-io
4. siddhi-benchmark
siddhi-api:
siddhi-api contains the classes related to creating queries. We decided to have a separate
module for the API in order to make a clear separation from the core. siddhi-api does not
depend on the core of Siddhi.
siddhi-core:
siddhi-core contains the classes related to the core processing and can be considered the
heart of Siddhi. siddhi-core is dependent on the siddhi-api module.
siddhi-io:
siddhi-io contains the input/output related classes.
siddhi-benchmark:
siddhi-benchmark contains the classes related to the Siddhi benchmark. siddhi-benchmark
depends on the siddhi-core module.
In Siddhi we used Maven for the following tasks:

To build the binaries from the source code

To create the Javadoc documentation

To create the website
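The multi-module layout above would typically be declared in a parent pom.xml roughly as follows. This is a sketch only: the groupId matches the dependency snippet in section 4.10, but the version and packaging details are illustrative assumptions rather than Siddhi's actual parent pom.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.siddhi</groupId>
  <artifactId>siddhi</artifactId>
  <version>1.0-SNAPSHOT</version>   <!-- illustrative version -->
  <packaging>pom</packaging>        <!-- parent of a multi-module build -->
  <modules>
    <module>siddhi-api</module>
    <module>siddhi-core</module>
    <module>siddhi-io</module>
    <module>siddhi-benchmark</module>
  </modules>
</project>
```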
4.4. Coding Standards & Best Practices Guidelines for Siddhi
Siddhi has employed a set of coding standards, and best practices to maintain the consistency
and increase the readability of the Siddhi code. The guidelines are as follows.
4.4.1. General
Comments

Doc comments
All classes and all methods/functions MUST have doc comments. The comments
should explain each parameter, the return type, and any assumptions made.

Line comments
For complex logic, explain the clever parts and the rationale for doing
something.
Logging
Log then and there. Include ample local information and context in the logs. Remember that
logs are for users: make them meaningful and readable, and make sure you spell-check them
(e.g. with ispell).
Use the correct log level; for example, do not log errors as warnings or vice versa. Remember
to log the error before throwing an exception. We use commons-logging 1.1 for logging
purposes.
Logic
Make sure your clever code is readable. Always use meaningful variable names; remember,
compilers can handle long variable names. Declare variables locally, as and when required.
The underscore character should be used only when declaring constants, and should not be
used anywhere else in Java code.
Methods/Functions
Make sure the function/method names are self-descriptive. One should be able to explain a
function/method using a single sentence without conjunctions (that is, no and/or in the
description).
Have proper separation of concerns: check whether you do multiple things in one function.
Too many parameters are a code smell and indicate that something is wrong. Use status
variables to capture status and return at the end whenever possible. Avoid returning from
multiple places, as that makes code less readable.
Committing to repository
Use your own account for committing; don't use the primary account.
Use a separate commit for each distinct change you make. For example, if you are going to
fix two bugs, first fix one bug and commit it; then fix the other bug and commit it.
Following provides a set of additional guidelines.

Be consistent in managing state e.g. Initialize to FALSE and set to TRUE everywhere
else

Where does that if block end, or which block did you just end? Have a comment at
the end of each block's closing } to make this clear

Use if statements rationally, ensure the behavior is homogeneous

When returning a collection, always return an empty collection, never null (or
NULL)

Do not use interfaces to declare constants. Use a final class with public static final
attributes and a private constructor.

Always use braces to surround code blocks ({}) even if it is a single line.

Be sure to define, who should catch an exception when throwing one

Be sure to catch those exceptions that you can handle

Do not use string literals in the code; instead declare constants and use them.
Constant names should be self-descriptive

Use already-defined constants whenever possible; check to see whether someone has
already declared one.
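Two of the guidelines above, keeping constants in a final class rather than an interface, and returning empty collections instead of null, can be sketched as follows. The class and constant names are hypothetical, chosen only for illustration.

```java
import java.util.Collections;
import java.util.List;

/** Constants live in a final class with a private constructor,
 *  never in an interface (hypothetical names for illustration). */
public final class QueryConstants {
    public static final String DEFAULT_STREAM = "inStream";
    public static final int DEFAULT_WINDOW_SIZE = 1000;

    private QueryConstants() {
        // private constructor prevents instantiation
    }

    /** When nothing matches, return an empty collection, never null. */
    public static List<String> matchingStreams(String prefix) {
        if (DEFAULT_STREAM.startsWith(prefix)) {
            return Collections.singletonList(DEFAULT_STREAM);
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        System.out.println(matchingStreams("in"));  // [inStream]
        System.out.println(matchingStreams("zz"));  // [] - empty, not null
    }
}
```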
4.4.2. Java Specific
Coding conventions - http://java.sun.com/docs/codeconv/
The only exception is line length, which may be up to 100 characters.
Run FindBugs on your code - http://findbugs.sourceforge.net/
4.5. Profiling
Profiling is a dynamic program analysis methodology. It is used for finding memory leaks
and analyzing the usage of functions and methods in the code, the frequency of method calls,
etc. Profiling is a very important phase of software development, especially when the product
is performance critical.
Profilers show the profiled information using charts and tables detailing every aspect of the
product. Since Siddhi's edge lies largely in its performance, profiling played a key part. We
used JProfiler, a commercial all-in-one Java profiler. Our main need was to reduce the time
spent on the CPU, so the hot-spot methods and method call stack traces were what we needed
most. Hot-spot methods are the ones taking a considerable percentage of the CPU time; most
of their implementations can be improved by having a careful look at the code. Reducing
method invocations is another aspect: often there are unnecessary method calls in the code
whose functionality can be achieved with fewer calls. Siddhi is targeted to process as many
as 30,000 events per second, so there is an enormous number of method calls involved in
processing events, and even one change to a method may yield a big increase in
performance.
We ran profiling after each iteration of our development and made the necessary changes.
Our supervisor, Dr. Srinath Perera, also helped with this.
Figure 28 Call Tree for the Time Window in JProfiler
The Call Tree view shows the stack trace from where the program started and which methods
it called. The method calls are shown in a tree structure along with their execution times and
numbers of invocations. With this view we can identify where Siddhi spends most of its time
when executing. Obviously, the time spent in utility classes/methods should be low, while the
core methods of Siddhi take most of the time; this view can be used to verify such things.
After identifying flaws in the code, we improved the code and profiled again to measure
whether there was a change, repeating the process as needed.
Further, there is a Hot-Spots tab under the same "CPU Views" section, which shows the
hot-spot regions in the code that consume most of the CPU time. It generally shows the time
and percentage of CPU time per method. These methods need special attention.
The following diagram shows the memory view of Siddhi for the time-window query. This
view, along with a heap walker snapshot, is useful for determining whether there are any
memory leaks in the code. A quick analysis of this graph shows that EventImpl class objects
have taken about 49 MB of memory, while the LinkedBlockingQueue objects take about
24 MB. Together these two account for more than 60% of the total memory.
Figure 29 The Memory Usage of Siddhi for a Time Window query
4.6. Benchmark
Siddhi has released a benchmark kit for performance evaluation. It can be used directly to
evaluate the performance of Siddhi from different viewpoints, and to compare performance
against other competitive CEPs. The Siddhi benchmark kit listens for events from remote
clients over TCP and processes them in the server. The benchmark evaluates performance
using VWAP (Volume Weighted Average Price) events; many other CEP benchmarks also
use VWAP events.
Figure 30 Siddhi Benchmark

The kit is capable of handling events from multiple clients connected to it.

The benchmark is capable of handling a given number of queries to process on the
Siddhi core. This can be done by adding the system parameter
-Dsiddhi.benchmark.symbol=1000, which generates 1000 different queries from
the given query by changing only the symbol of the query.

The benchmark has shell scripts for starting the client and the server, and all the
above-mentioned options can be configured by changing the parameters in the shell
scripts.

The server output is logged to serverout.log, where all the results can be found.

The kit can interpret CEP performance results on an event basis or a time basis by
setting the parameter BMEVAL to true or false. We encourage the event-based
evaluation method, which is more expressive than the other.

All time durations are calculated including the network overhead, meaning the
time is measured end to end.
In the server script
there are options to edit, or you can let the default values be used by not specifying the
attributes.
# Uncomment the following and set JVM options
#JVM_OPT="-Xms1024m -Xmx1024m"
# Set a port to allow clients to connect to this server, or use the
# default port by not specifying any
PORT="-port 5555"
# Set the maximum limit of events for the CEP to process
LIMIT="-limit 100"
# Set true for event-based benchmark evaluation, or false to evaluate
# on a time basis; results are reported accordingly
BMEVAL="-eventbs false"
In the client script
# Uncomment the following and set JVM options
#JVM_OPT="-Xms1024m -Xmx1024m"
# Set the host server address, or use the default host by not
# specifying any
HOST="-host 127.0.0.1"
# Set the batch wait time in milliseconds, which sleeps for the given
# time between event batches; comment the param out to skip waiting
BATCHWT="-batch_wait 10"
# Set the port to connect to the server, or use the default port by
# not specifying any
PORT="-port 5555"
In short, the Siddhi benchmark can be used to understand and analyze Siddhi's performance
against different queries on a given number of events, and even to compare Siddhi with
other CEP implementations.
4.7. Documentation
We have fully documented Siddhi for the use of users and future developers, creating a user
guide and a developer guide. DocBook was used as our markup language. DocBook is a
semantic markup language for technical documentation, highly popular among open-source
communities as a default documentation format.
Compared to other formats like doc or odf, DocBook content is authored in a
presentation-neutral form. Further, it allows localizing the documentation fairly easily. The
presentation neutrality comes from DocBook's nature: a DocBook document is written in
XML using the DTD rules DocBook provides, and it does not contain any styling
information. Styling can then be added using an XSL stylesheet; XSL (eXtensible Stylesheet
Language) transformations can convert the DocBook document into several forms including
PDF, HTML web pages, MS Word doc, EPUB, and the WebHelp format.
These powerful features, together with our familiarity with the toolchain, made us choose
DocBook as our documentation format. We generate the documentation in PDF and
DocBook WebHelp (for web publishing) formats.
The documentation is generated by Apache Maven using the docbkx-maven-plugin. The
following configuration snippet generates it.
Figure 31 DocBook Documentation Configuration
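A representative docbkx-maven-plugin configuration would look roughly as follows. This is a sketch: the plugin coordinates are those published by the docbkx project, while the version and the exact goals used by Siddhi are illustrative assumptions.

```xml
<plugin>
  <groupId>com.agilejava.docbkx</groupId>
  <artifactId>docbkx-maven-plugin</artifactId>
  <version>2.0.14</version>            <!-- illustrative version -->
  <executions>
    <execution>
      <goals>
        <goal>generate-pdf</goal>      <!-- PDF output -->
        <goal>generate-webhelp</goal>  <!-- WebHelp output for the website -->
      </goals>
    </execution>
  </executions>
</plugin>
```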
4.8. Web Site
We have developed a website for our project detailing Siddhi's features, how to access our
source repository, the documentation, and the project deliverables (releases). The website is
hosted at SourceForge and can be accessed via http://siddhi.sourceforge.net/. [47]
Figure 32 Siddhi Web Site
It provides access to the following information:

Downloads - Releases and Source code

Documentation - User guide, Developer guide, Javadocs, FAQ, and a Source code
Viewer (Viewvc)

Project Information - Mailing lists, Project team, Bug Tracking System

The license of Siddhi - The site provides a link to a copy of Siddhi’s license, Apache
License, version 2.0
4.9. Bug tracker
An issue tracking system keeps track of the issues of a particular software system. In the
open-source world, JIRA is a well-known issue tracker, but since Siddhi is currently hosted
in the SourceForge source repository, we did not adopt a separate issue tracking system and
used SourceForge's native issue tracker instead.
There, people who have signed up for the project can create issues regarding Siddhi and set
the important attributes of each issue, such as its description, priority, resolution, status,
assignee, etc. We have raised several main issues there and assigned the related person, so
that he receives a notification that work has been assigned to him based on the issue.
This gave our group better remote communication in addition to the siddhi-dev mailing list
discussions. When an issue is resolved, the reporter or an admin can close it. As we already
have two clients (OpenMRS and a hybrid California weather forecasting project), they can
also report bugs in our issue tracker (which is very straightforward and useful) and wait for
the Siddhi developers to resolve them. Through this issue tracking system we have a good
opportunity to identify bugs that we might not find ourselves, but users could.
4.10. Distribution
Siddhi jars are hosted in a Maven repository powered by sonatype.org. Through this, any
Maven project can directly use Siddhi by simply adding the dependency configurations and
the repository configuration to its pom.xml.
<dependencies>
  ...
  <dependency>
    <groupId>org.siddhi</groupId>
    <artifactId>siddhi-api</artifactId>
    <type>jar</type>
  </dependency>
  <dependency>
    <groupId>org.siddhi</groupId>
    <artifactId>siddhi-core</artifactId>
    <type>jar</type>
  </dependency>
</dependencies>
<repositories>
  <repository>
    <id>sonatype-nexus-snapshots</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
  </repository>
</repositories>
5. Results
The results of the performance testing for a given set of queries are discussed below.
5.1. Performance testing
The most expected outcome of a Complex Event Processing engine is its performance: how
fast it can process and evaluate events and patterns and notify subscribers is very important.
End-to-end delivery time is a key factor that CEP clients expect to be low. This led Siddhi to
provide a benchmark kit for evaluating performance. Besides evaluating Siddhi's
performance on its own, we as the Siddhi team decided to do a performance comparison with
an existing competitive CEP engine in today's CEP market. As an initial step, we decided to
compare performance with Esper (one of the most widely used CEP engines in the current
market, with many customers including 30-40 large customers such as Oracle and
Swisscom).
Due to incompatibilities and the way the Esper benchmark interprets its performance results,
we could not directly use the Esper performance benchmark kit to make a fair performance
comparison under 100% identical conditions for both parties. We therefore implemented a
separate framework that provides exactly the same conditions to both Esper and Siddhi.
Initially we compared performance with three different types of queries that cover the most
important basic CEP functionalities: pattern matching, simple filtering, and filtering with
data windows.
The following three graphs illustrate the performance comparison of Siddhi and Esper for
the three basic queries.
NOTE: These three queries are very similar to the Esper performance benchmark queries.
1. Performance comparison for a simple filter without time or length window.
Graph 1 Siddhi Vs Esper Simple Filter Comparison
2. Performance comparison for a timed window query for average calculation for a given
symbol.
Graph 2 Siddhi Vs Esper Average over Time Window Comparison
3. Performance comparison for a state machine query.
Graph 3 Siddhi Vs Esper State Machine Comparison
6. Discussion and Conclusion
6.1. Known issues
Event order might change when passing events from one processor to another
within the Siddhi core
The reason is that when events processed in parallel are combined, the event arrival order at
the combining point may depend not only on the sequence of input event arrival but also on
the execution speed of the preceding processors.
6.2. Future work
In the future we expect to address the above-mentioned issues.
Apart from those, the following features would also add value to Siddhi.
6.2.1. Incubating Siddhi at Apache
The Siddhi team intends to incubate the Siddhi project at the Apache Software Foundation
(ASF), a well-known open-source software foundation.
It will help Siddhi gain good recognition in the open-source world, and many people will
get a chance to use and contribute to the future development of the project.
6.2.2. Finding a query language for Siddhi
We need to find a query language that is sufficient to express the full set of pattern queries.
Currently Siddhi uses an object model to represent a query. We expect to implement a query
language that provides a simpler way to write queries.
6.2.3. Out of order event handling
Due to various reasons such as network issues, events may arrive at the CEP engine out of
order; that is, an event with a more recent timestamp may arrive before an event with an
older timestamp. Currently Siddhi does not support out-of-order event handling.
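One common approach to this problem, shown here only as an illustrative sketch and not as anything implemented in Siddhi, is to buffer incoming events briefly and release them in timestamp order once they are older than a lateness bound:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative out-of-order reordering buffer (hypothetical, not Siddhi code).
 *  Events are held until they are older than a lateness bound, then emitted
 *  in timestamp order. */
public class ReorderBufferSketch {
    private final long maxLatenessMillis;
    private final PriorityQueue<long[]> buffer =       // entries: {timestamp, eventId}
            new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));

    public ReorderBufferSketch(long maxLatenessMillis) {
        this.maxLatenessMillis = maxLatenessMillis;
    }

    /** Accepts an event; returns all events now safe to emit, in timestamp order. */
    public List<long[]> onEvent(long timestamp, long eventId) {
        buffer.add(new long[]{timestamp, eventId});
        List<long[]> ready = new ArrayList<>();
        // emit everything older than (latest seen timestamp - lateness bound)
        while (!buffer.isEmpty() && buffer.peek()[0] <= timestamp - maxLatenessMillis) {
            ready.add(buffer.poll());
        }
        return ready;
    }

    public static void main(String[] args) {
        ReorderBufferSketch r = new ReorderBufferSketch(100);
        r.onEvent(1000, 1);
        r.onEvent(950, 2);                             // arrives late, older timestamp
        for (long[] e : r.onEvent(1200, 3)) {
            System.out.println(e[0] + " -> " + e[1]); // emitted in timestamp order
        }
    }
}
```

The trade-off is added latency equal to the lateness bound, which conflicts with near-zero latency goals; that tension is one reason this remains future work.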
6.3. Siddhi Success Story
Implementing a high performance complex event processing (CEP) engine is a challenging
project. The idea was initiated by Dr. Sanjiva Weerawarana and Dr. Srinath Perera, who had
a vision for a completely open-source, high-performing CEP. We were tempted to take up
this challenge, even though for a couple of years past students had dropped the idea, feeling
it was too risky for a final year project.
Our external supervisor is Dr. Srinath Perera and our internal supervisor is Ms. Vishaka
Nanayakkara. After a couple of meetings with them we put together a list of tasks to
accomplish before starting our project. As a first step, we did a lot of research, reading
several papers related to complex event processing. Then, with some understanding, we
started our first iteration, trying to implement a working CEP. We failed in our first attempt
(yes, it was a miserable failure). But then, after several rounds of discussions and meetings,
we took some important architectural decisions based on our failures. We knew we would
have to throw away more code in the future until we found the best approach. Therefore,
even on some vacation days, we decided as a project group to come to the university and
work on our project.
The most important decision we took in terms of implementation was never to look into the
code of other CEP implementations, because we always wanted to reach our goal with our
own ideas. During the implementation process we faced difficulties, but over a couple more
iterations we improved the quality of our implementation step by step through performance
testing and profiling of our code. Meanwhile we looked into other CEPs' functionalities and
started implementing the ones ours lacked.
With the help of Dr. Srinath, we got a client for a project from the US who needed a
Complex Event Processing engine. They had requirements they expected a CEP to meet; we
collected those requirements and implemented some of the features in Siddhi as appropriate.
At the same time, we were contacted by OpenMRS, one of the most popular medical record
systems in the world, holding several million patient records. As Siddhi was in a public
Maven repository at that time, they tried out our CEP and were impressed. Thus, Siddhi is
now running in the back end of the OpenMRS NCD Module.
We got great support from our supervisors. Dr. Srinath helped us a lot in designing Siddhi,
and his contribution went a long way toward making this project a success. Ms. Vishaka
Nanayakkara always guided us with valuable advice to stay on track with the project and
our research. We would like to thank both our supervisors for the important role they have
played in bringing our project to this state.
Our next goal is to push this project to the Apache Software Foundation and release this
under Apache license v2, so that we can dedicate a high performance CEP engine to the open
source community.
6.4. Conclusion
As per the above discussion, Siddhi can be used for complex event processing, delivering
the performance that most people expect. Siddhi has addressed the clear need for an
open-source CEP with the ability to process a huge flood of events, potentially well over
one hundred thousand events per second, with near-zero latency.
We have carried out a detailed literature survey, comparing and contrasting different event
processing architectures, and have come up with an architecture that has proven to be
computationally efficient. It has been optimized for high-speed processing with low memory
consumption.
Currently, Siddhi has all the common features that a Complex Event Processing engine
should support, built on the basic Complex Event Processing framework we have written.
Further, there are some additional features in Siddhi that were added based on requests from
users. The current Siddhi implementation provides an extensible, scalable framework that
the open-source community can extend to match specific business needs.
Abbreviations
CEP – Complex Event Processing
ESP – Event Stream Processing
EDA – Event-Driven Architecture
EPL – Event Programming Language
RDBMS – Relational Database Management Systems
XML – eXtensible Markup Language
SOAP – Simple Object Access Protocol
ESB – Enterprise Service Bus
BAM – Business Activity Monitor
SOA – Service Oriented Architecture
URL – Uniform Resource Locator
ASF – Apache Software Foundation
BPM – Business Process Management
Bibliography
[1] D. Luckham and R. Schulte. (2007, May) Event Processing Glossary. [Online].
http://complexevents.com/?p=195
[2] T. J. Owens, "Survey of event processing. Technical report, Air Force Research
Laboratory, Information Directorate ," 2007.
[3] (2010) S4: Distributed Stream Computing Platform. [Online]. http://s4.io
[4] (2010, Nov.) S4. [Online]. http://wiki.s4.io/Manual/S4Overview
[5] C. Hewitt and H. Baker, "Actors and Continuous Functionals,".
[6] G. Tóth, R. Rácz, J. Pánczél, T. Gergely, A. Beszédes, L. Farkas, L.J. F\ül\öp, "Survey
on Complex Event Processing and Predictive Analytics," , 2010.
[7] (2010, Nov.) Homepage of Esper/NEsper. [Online]. http://www.espertech.com/
[8] Homepage of PADRES. [Online]. http://research.msrg.utoronto.ca/Padres/
[9] A. Cheung, G. Li, B. Maniymaran, V. Muthusamy, R. Sherafat Kazemzadeh, and
H.-A. Jacobsen, "The PADRES Publish/Subscribe System,".
[10] Homepage of Intelligent Event Processor (IEP). [Online]. http://wiki.openesb.java.net/Wiki.jsp?page=IEPSE
[11] (2010, May) Sopera Homepage. [Online]. http://www.sopera.de/en/home
[12] Stream-based And Shared Event Processing (SASE) Home page. [Online].
http://sase.cs.umass.edu/
[13] E. Wu, Y. Diao, and S. Rizvi, "High-performance complex event processing over
streams," Chicago, IL, USA, 2006.
[14] Cayuga Homepage. [Online]. http://www.cs.cornell.edu/database/cayuga/.
[15] Air Force Research Laboratory, "Survey of Event Processing ," , 2007.
[16] J. Nagy, "User-Centric Personalized Extensibility for Data-Driven Web Applications ,
IF/AFOSR Minigrant Proposal," , 2007.
[17] J. Gehrke, B. Panda, M. Riedewald, V. Sharma, and W. White A. Demers, "Cayuga: A
General Purpose Event Monitoring System," , Asolimar, California, January 2007.
[18] J. Gehrke, M. Hong, M. Riedewald, and W. White A. Demers, "A General Algebra and
Implementation for Monitoring Event Streams," 2005.
[19] K. Vikram, "FingerLakes: A Distributed Event Stream Monitoring System,".
[20] Aurora Homepage. [Online]. http://www.cs.brown.edu/research/aurora/
[21] Borealis Homepage.
[22] D Abadi, D Carney, U. Cetintemel, and M. et al. Cherniack, "Aurora: a data stream
management system," , San Diego, California, 2003.
[23] H. Balakrishnan, M. Balazinska, D. Carney, and U. et al. Cetintemel, "Retrospective on
Aurora," vol. 13, no. 4, 2004.
[24] D. Abadi, Y. Ahmad, and M. Balazinska: U. Cetintemel et al. al., "The Design of the
Borealis Stream Processing Engine," , 2005.
[25] S. Zdonik, M. Stonebraker, M. Cherniack, and U. Centintemel et al., "The Aurora and
Medusa Projects," , 2003.
[26] TelegraphCQ Homepage. [Online]. http://telegraph.cs.berkeley.edu/
[27] S. Chandrasekaran, O. Cooper, A. Deshpande, and M. Franklin et al., "TelegraphCQ:
Continuous Dataflow Processing for an Uncertain World," , 2003.
[28] S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. Hellerstein, W. Hong, S.
Madden, F. Reiss, M. Shah, S. Krishnamurthy, "TelegraphCQ: An Architectural Status
Report," IEEE Data Engineering Bulletin, vol. 26, no. 1, 2003.
[29] T. Sellis, "Multiple Query Optimization," , 1988.
[30] N., Sanghai, S., Roy, P., Sudarshan, S. Dalvi, "Pipelining in Multi-Query
Optimization," , 2001.
[31] A., Sudarshan, S., Viswanathan, S. Gupta, "Query Scheduling in Multi Query
Optimization," , 2001.
[32] P., Seshadri, A., Sudarshan, A., Bhobhe, S. Roy, "Efficient and Extensible Algorithms
For Multi Query Optimization," , 2000.
[33] STREAM Homepage. [Online]. http://www-db.stanford.edu/stream/
[34] PIPES Homepage. [Online]. http://dbs.mathematik.uni-marburg.de/Home/Research/Projects/PIPES/
[35] Sybase Complex Event Processing. [Online].
http://www.coral8.com/developers/documentation.html
[36] Sybase Complex Event Processing. [Online].
http://www.coral8.com/developers/documentation.html
[37] Coral8: The Fastest Path to Complex Event Processing. [Online].
http://www.coral8.com/developers/documentation.html
[38] Progress Apama – Monitor, Analyze, and Act on Events in Under a Millisecond..
[Online]. http://www.progress.com/apama/index.ssp
[39] S. Zdonik. (2006, March) Stream Processing Overview [presentation] Workshop on
Event Processing, Hawthorne, New York.
[40] Real-Time Data Processing with a Stream Processing Engine. [Online].
http://www.streambase.com/print/knowledgecenter.htm
[41] Truviso Product Brief. [Online]. http://www.truviso.com/resources/
[42] M. Liu, L. Ding, E.A. Rundensteiner, and M. Mani M. Li, "Event Stream Processing
with Out-of-Order Data Arrival," , 2007.
[43] J. Agrawal, Y. Diao, D. Gyllstrom, and N. Immerman, "Efficient pattern matching over
event streams," 2008.
[44] T. Shafeek, "Aurora: A New Model And Architecture For Data Stream Management,
B.Tech Seminar Report, Government Engineering College, Thrissur," 2010.
[45] A. Arasu, R. Motwani, and J. Widom, "Query Processing, Resource Management, and
Approximation in a Data Stream Management System," 2003.
[46] OpenMRS Wiki. [Online].
https://wiki.openmrs.org/display/docs/Notifiable+Condition+Detector+%28NCD%29+
Module
[47] Oracle and BEA. (2011-5-6). [Online].
http://www.oracle.com/us/corporate/Acquisitions/bea/index.html
Appendix A
Apache License, Version 2.0
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as
defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is
granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are
controlled by, or are under common control with that entity. For the purposes of this
definition, "control" means (i) the power, direct or indirect, to cause the direction or
management of such entity, whether by contract or otherwise, or (ii) ownership of fifty
percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted
by this License.
"Source" form shall mean the preferred form for making modifications, including but not
limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation
of a Source form, including but not limited to compiled object code, generated
documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available
under the License, as indicated by a copyright notice that is included in or attached to the
work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on
(or derived from) the Work and for which the editorial revisions, annotations, elaborations, or
other modifications represent, as a whole, an original work of authorship. For the purposes of
this License, Derivative Works shall not include works that remain separable from, or merely
link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including the original version of the Work
and any modifications or additions to that Work or Derivative Works thereof, that is
intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an
individual or Legal Entity authorized to submit on behalf of the copyright owner. For the
purposes of this definition, "submitted" means any form of electronic, verbal, or written
communication sent to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems, and issue tracking
systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and
improving the Work, but excluding communication that is conspicuously marked or
otherwise designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a
Contribution has been received by Licensor and subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each
Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge,
royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly
display, publicly perform, sublicense, and distribute the Work and such Derivative Works in
Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each
Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge,
royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use,
offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to
those patent claims licensable by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s) with the Work to which such
Contribution(s) was submitted. If You institute patent litigation against any entity (including
a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution
incorporated within the Work constitutes direct or contributory patent infringement, then any
patent licenses granted to You under this License for that Work shall terminate as of the date
such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works
thereof in any medium, with or without modifications, and in Source or Object form,
provided that You meet the following conditions:
You must give any other recipients of the Work or Derivative Works a copy of this
License; and
You must cause any modified files to carry prominent notices stating that You changed the
files; and
You must retain, in the Source form of any Derivative Works that You distribute, all
copyright, patent, trademark, and attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of the Derivative Works; and
If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative
Works that You distribute must include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not pertain to any part of the
Derivative Works, in at least one of the following places: within a NOTICE text file
distributed as part of the Derivative Works; within the Source form or documentation, if
provided along with the Derivative Works; or, within a display generated by the Derivative
Works, if and wherever such third-party notices normally appear. The contents of the
NOTICE file are for informational purposes only and do not modify the License. You may
add Your own attribution notices within Derivative Works that You distribute, alongside or
as an addendum to the NOTICE text from the Work, provided that such additional attribution
notices cannot be construed as modifying the License. You may add Your own copyright
statement to Your modifications and may provide additional or different license terms and
conditions for use, reproduction, or distribution of Your modifications, or for any such
Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work
otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution
intentionally submitted for inclusion in the Work by You to the Licensor shall be under the
terms and conditions of this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify the terms of any
separate license agreement you may have executed with Licensor regarding such
Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks,
service marks, or product names of the Licensor, except as required for reasonable and
customary use in describing the origin of the Work and reproducing the content of the
NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing,
Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS"
BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions of TITLE,
NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR
PURPOSE. You are solely responsible for determining the appropriateness of using or
redistributing the Work and assume any risks associated with Your exercise of permissions
under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including
negligence), contract, or otherwise, unless required by applicable law (such as deliberate and
grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for
damages, including any direct, indirect, special, incidental, or consequential damages of any
character arising as a result of this License or out of the use or inability to use the Work
(including but not limited to damages for loss of goodwill, work stoppage, computer failure
or malfunction, or any and all other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative
Works thereof, You may choose to offer, and charge a fee for, acceptance of support,
warranty, indemnity, or other liability obligations and/or rights consistent with this License.
However, in accepting such obligations, You may act only on Your own behalf and on Your
sole responsibility, not on behalf of any other Contributor, and only if You agree to
indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims
asserted against, such Contributor by reason of your accepting any such warranty or
additional liability.