Project Presentation

ICT, STREP
FERARI ICT-FP7-619491
Flexible Event pRocessing for big dAta aRchItectures
Collaborative Project
D 6.3
Project Presentation
03.02.2014 – 30.04.2014
Contractual Date of Delivery:
30.04.2014
Actual Date of Delivery:
30.04.2014
Author(s):
Michael Mock
Institution:
Poslovna Inteligencija d.o.o.
Workpackage:
WP6
Security:
PU
Nature:
O
Total number of pages:
37
Project coordinator name: Michael Mock
Project coordinator organisation name:
Fraunhofer Institute for Intelligent Analysis
and Information Systems (IAIS)
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
URL: http://www.iais.fraunhofer.de
Revision: 1
Abstract:
This document is the FERARI deliverable of WP6 for the first review period
(03.02.2014 – 30.04.2014). The project presentation gives an overall overview
of the FERARI project including the goals of the project, project partners and
workpackage organization.
Revision history
Administration Status
Project acronym: FERARI
ID: ICT-FP7-619491
Document identifier: D 6.3 Project Presentation (03.02.2014 – 30.04.2014)
Leading Partner: Poslovna Inteligencija d.o.o.
Report version: 1
Report preparation date: 10.04.2014
Classification: PU
Nature: OTHER
Author(s) and contributors: Michael Mock (FHG)
Status:
Plan
Draft
Working
Final
x
Submitted
Copyright
This report is © FERARI Consortium 2014. Its duplication is restricted to the personal use
within the consortium and the European Commission.
www.ferari-project.eu
Flexible Event pRocessing for big dAta aRchItectures (FERARI)
Introduction
2
FERARI – An FP7 EC ICT project
 Grant Agreement No. 619491
 STREP: Specific Targeted Research Project
 Grown out of the FP7 basic research project LIFT (FET Open)
 FERARI was ranked 6th of 33 proposals within objective 4.2 Scalable Data Analysis
 February 2014 – January 2017, Funding: 2.95 Mio. EUR
3
FERARI - Consortium
 Fraunhofer IAIS (FHG)
 IBM (IBM)
 Poslovna Inteligencija (PI)
 Technion (Technion) + Haifa University
 Technical University of Crete (TUC)
 T-Hrvatski Telekom (HT)
4
Fraunhofer IAIS: Intelligent Analysis and Information
Systems
„From sensor data to business intelligence, from media
analysis to visual information systems: Our research
allows companies to do more with data“
 270 people: scientists, project engineers, technical and
administrative staff
 Located on Fraunhofer Campus Schloss
Birlinghoven/Sankt Augustin near Bonn
 Joint research groups and cooperation with
 Institute Director: Prof. Dr. Stefan Wrobel
 Lead researcher: Dr. Michael Mock
5
Technical University of Crete
 Founded in 1977 in Chania, Crete
 120 faculty members, ~175 adjunct faculty and lab personnel
 2900 undergraduate and 550 graduate students
 Around 200 research programs, total budget 20.5 million
 ECE department: 25 faculty, ~200 undergrad students/year
 Research organized in 10 research laboratories
 SoftNet Lab (headed by Prof. Garofalakis): Focus on Big Data Analytics, Data
Streams, Cloud Computing
 Lead researcher: Prof. Minos Garofalakis
6
TECHNION – Israel Institute of Technology
and University of Haifa
The Technion-Israel Institute of Technology is a major source of the innovation and brainpower that
drives the Israeli economy, and a key to Israel’s reputation as the world’s “Start-Up Nation.” Its
three Nobel Prize winners exemplify academic excellence.
 Located in Haifa, oldest University in Israel (1912)
 600 Faculty Members (3 Nobel Laureates)
 Computer Science: 50 faculty members, 1500 Students
 Lead researcher: Prof. Assaf Schuster, head of Technion
Computer Engineering Center, focus on Distributed and
Scalable Data Mining, Monitoring Distributed Data
Streams, Big Data Technologies and Analytics
and Dr. Daniel Keren, Department of Computer Science
at Haifa University
7
IBM Research – Haifa
IBM Research is the innovation branch of IBM; its motto is
“the world is our lab”
 350 people: scientists, software engineers, subject
matter experts
 Located in Haifa, Israel on the campus of Haifa
University
 The largest IBM Research Lab outside the USA
 Lead researcher: Fabiana Fournier
8
T-Hrvatski Telekom: Communication, Information &
Entertainment, Always & Everywhere
“T-HT - to be the online company and to power the online
society and digital economy in Croatia and the Region”
 T-HT Group is the leading provider of telecommunications
services in Croatia and the sole company to offer the full
range of these services: it combines the services of fixed
and mobile telephony, data transmission, Internet and
international communications
 T-HT’s strategy: GROW - COMPETE – TRANSFORM
 Key figures for 2012:
 Revenues: 991 mio EUR
 EBITDA margin: 45.3%
 5780 employees
 Lead representative: Maja Vekić-Vedrina
9
Poslovna inteligencija:
Leader in business intelligence
„We provide our customers with the best possible service in strategic consultancy and in
implementation of intelligent information systems for decision support, thereby helping them to
create new values and identify new business opportunities.“
 90 employees: 90% project engineers, technical and
business consultants, 10% sales and administration
 HQ in Zagreb, Croatia, offices in UK, Slovenia, Serbia,
Bosnia and Herzegovina and Montenegro
 Extensive experience in Telecommunication industry
and in R&D Big Data projects
 Lead representative: Dražen Oreščanin
10
Motivation
 A number of recent technological developments have started to
change our world forever:
• the rise of the internet
• the ever-growing amount of activity in social networks
• the widespread adoption of smartphones and other mobile
devices
• the instrumentation of the world with sensors
 These developments are accompanied by dropping prices for
computers, networks, and storage
11
Objectives
 Provide support for large scale services by making the sensor
layer a first class citizen in Big Data architectures.
 Provide support for Complex Event Processing technology for
business users in Big Data architectures.
 Provide support for integrating machine learning tasks in the
architecture.
 Provide support for flexible and adaptive analytics workflows.
 Exemplify the potential of the new architecture in the
telecommunication and the cloud domain.
12
Use cases
 Monitoring a smart energy grid.
 Analysing the traffic state of a large city using car-to-car
communication.
 Monitoring the quality of a telecommunication network.
 Detecting latent failures in a large cloud of thousands of
machines.
 Inspecting potentially fraudulent credit card transactions in
real-time and blocking these transactions when necessary.
13
Application Scenarios
 Mobile Phone Fraud Detection
 Detecting mobile phone fraud by analysing usage patterns
 Reliably detect mobile phone fraud
 Avoid financial losses due to fraud
 Scalability to millions of events/sec for simple filtering; fewer for more
complex analysis (depending on the complexity of the task)
 Cloud Health Monitoring
 Cloud data centre activity log monitoring
 Possibility to replace time-interval-based maintenance with event-based maintenance
 Avoiding service downtime
14
Negotiation Question: Data Size
Quantity of data
The average monthly number of rated call detail records is > 650 million,
and the total monthly quantity of data is > 300 GB.
For raw call details, the monthly quantities are significantly higher:
> 5,500 million records and > 10 TB of data.
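As a rough back-of-the-envelope check, the monthly figures above can be converted into average event rates and average record sizes. The sketch below assumes a 30-day month and decimal units (1 GB = 10^9 bytes); neither assumption is stated in the source. Because the source gives only lower bounds (">"), the results are lower bounds on the averages and say nothing about peak load.

```python
# Assumption (not from the source): a 30-day month and decimal units.
SECONDS_PER_MONTH = 30 * 24 * 3600

# Lower bounds quoted on the slide.
RATED = {"records": 650e6, "bytes": 300e9}    # rated call detail records
RAW = {"records": 5500e6, "bytes": 10e12}     # raw call details


def average_rate(records_per_month: float) -> float:
    """Average number of records per second over one month."""
    return records_per_month / SECONDS_PER_MONTH


def average_record_size(total_bytes: float, records: float) -> float:
    """Average size of one record in bytes."""
    return total_bytes / records


def summarize(dataset: dict) -> tuple:
    """Return (records per second, bytes per record), both rounded."""
    rate = average_rate(dataset["records"])
    size = average_record_size(dataset["bytes"], dataset["records"])
    return round(rate), round(size)


if __name__ == "__main__":
    print("rated:", summarize(RATED))
    print("raw:  ", summarize(RAW))
```

Under these assumptions, the rated records average roughly 250 records per second at about 460 bytes each, and the raw call details roughly 2,100 records per second at about 1.8 kB each.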
Cloud services are among the most recently introduced services at
Hrvatski Telekom. The number of cloud servers and of customers using
cloud services is still fairly low, but both are increasing rapidly.
Currently, the cloud consists of 6 machines, which produce a total of
> 40 GB of data per month.
During the course of this project, we expect that the cloud might
double its current size.
15
FERARI success criteria
 The project’s success will be rigorously measured by the following
validation criteria:
 Communication reduction with respect to global/state-of-the-art
solutions.
 Processing time relative to the size of the data.
 For monitoring applications, the number of false alarms.
 Number of domains to which the approach can be deployed. A key to
this is the variety aspect enabled by Distributed Complex Event
Processing.
 Flexibility. The system will be designed such that it can adapt to new,
unforeseen circumstances and is easy to consume.
16
Workpackages
17
Work Plan
Phase 1 (M1 – M12)
- Use case definition
- Component definition
- Architecture definition
Phase 2 (M13 – M24)
- Component refinement
- First use case prototype implementation
- First architecture implementation
Phase 3 (M25 – M34)
- Demonstration and evaluation of the impact of the methods developed
in this project
18
Workpackage Structure
WP1 – Use Cases
WP4 – Flexible Event Processing
WP3 – Communication Efficient, Low-Latency Methods
WP5 – Robust Distributed Stream Monitoring
WP2 – Architecture
* WPs 6 and 7, which will interact with all WPs for dissemination and management tasks, have been left out to increase readability. The
general flow of dependencies is top-down from the use cases to the architecture and methodological work. Architecture and methods
interact iteratively, since there are many technical and methodological dependencies.
19
FERARI - Workpackages
[Figure: FERARI workpackages. Labels: provides; Software Platform; Communication efficient processing; Complex event processing; Stream processing; Prototype]
20
WP1: Application Scenarios, Test bed, Prototype
 Objectives:
 Selecting and defining the application scenarios fraud mining and
cloud health monitoring
 Definition of testing & evaluation criteria for the end users at HT
 Setting up a test bed both at HT and at the project partners’
local sites
 Implementation and evaluation of scenarios in a prototype to
demonstrate the advantage of FERARI with respect to the state of
the art as well as to demonstrate its business value
21
WP2: Big Data Streaming Architecture &
Technology Integration
 Objectives:
 Define a Big Data architecture that makes the sensor layer a first
class citizen of the architecture,
 Define a data and control flow that can implement a push based
approach, so that processing can be partially done in situ,
 Provide methods for robust distributed stream processing
including online machine learning
 Implement the architecture as a software platform (open source).
22
Architecture Diagram of FERARI
 Event processing deals with these functions:
• get events from sources (event producers).
• route these events, filter them, normalize or otherwise transform them,
aggregate them, detect patterns over multiple events (event processing agents).
• transfer events as alerts to a human or as a trigger to an autonomous adaptation
system (event consumers).
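The three roles listed above (event producers, event processing agents, event consumers) can be illustrated with a minimal sketch built from Python generators. All names, the event layout, and the example pattern ("three high readings from the same source") are hypothetical illustrations and not part of the FERARI architecture.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator

Event = dict  # hypothetical event layout: {"source": str, "value": float}


def producer(raw: Iterable[tuple]) -> Iterator[Event]:
    """Event producer: wraps raw (source, value) readings as events."""
    for source, value in raw:
        yield {"source": source, "value": value}


def filter_agent(events: Iterable[Event],
                 predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """Event processing agent: passes on only events matching the predicate."""
    return (e for e in events if predicate(e))


def transform_agent(events: Iterable[Event],
                    fn: Callable[[Event], Event]) -> Iterator[Event]:
    """Event processing agent: normalizes or otherwise transforms events."""
    return (fn(e) for e in events)


def pattern_agent(events: Iterable[Event], threshold: int) -> Iterator[Event]:
    """Event processing agent: detects a pattern over multiple events.

    Emits one derived event when a source has produced `threshold` events.
    """
    counts = Counter()
    for e in events:
        counts[e["source"]] += 1
        if counts[e["source"]] == threshold:
            yield {"alert": "repeated_high_value",
                   "source": e["source"], "count": threshold}


def run_pipeline(raw: Iterable[tuple]) -> list:
    """Wire producer -> agents; the returned list stands in for the consumer."""
    events = producer(raw)
    events = filter_agent(events, lambda e: e["value"] > 100.0)
    events = transform_agent(events, lambda e: {**e, "value": round(e["value"])})
    return list(pattern_agent(events, threshold=3))


if __name__ == "__main__":
    readings = [("a", 150.0), ("b", 50.0), ("a", 200.4),
                ("a", 120.0), ("b", 300.0)]
    for alert in run_pipeline(readings):
        print(alert)  # an event consumer would notify a human or trigger adaptation
```

Chaining generators keeps each agent independent of the others, which mirrors the idea of an event processing network in which agents can be rearranged or placed on different nodes.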
23
WP2: Big Data Streaming Architecture &
Technology Integration - TASKS
The tangible output of WP2 will be the definition of a big-data software
architecture that allows the integration of components for complex event
processing, in-situ processing, and robust distributed stream processing,
including online machine learning. In addition, the architecture will be
provided as a software platform.
24
Interdependencies between WP1 & WP2
 WP2: Software Platform
 Open source
 General purpose for communication-efficient big-data stream analysis algorithms
 Flexible event processing
 Components as libraries
 Interfaces to plug in concrete algorithms (learning, monitoring)
 In-stream learning
 CEP language
[Figure labels: Software Platform; plug in concrete algorithms; Prototype]
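The idea of a platform that offers interfaces for plugging in concrete learning and monitoring algorithms could look roughly like the following sketch. The interface `StreamAlgorithm`, the class `Platform`, and both example plug-ins are hypothetical names chosen for illustration; the source does not specify the platform's actual API.

```python
from typing import Protocol


class StreamAlgorithm(Protocol):
    """Hypothetical plug-in interface: consume one event, return outputs."""

    def process(self, event: dict) -> list: ...


class Platform:
    """Toy platform that forwards each event to all registered plug-ins."""

    def __init__(self) -> None:
        self._plugins = {}

    def register(self, name: str, algorithm: StreamAlgorithm) -> None:
        self._plugins[name] = algorithm

    def dispatch(self, event: dict) -> dict:
        """Return the non-empty outputs of every plug-in, keyed by name."""
        outputs = {}
        for name, algorithm in self._plugins.items():
            results = algorithm.process(event)
            if results:
                outputs[name] = results
        return outputs


class ThresholdMonitor:
    """Example monitoring plug-in: reports values above a fixed limit."""

    def __init__(self, limit: float) -> None:
        self.limit = limit

    def process(self, event: dict) -> list:
        if event["value"] > self.limit:
            return [f"value {event['value']} exceeds {self.limit}"]
        return []


class RunningMean:
    """Example in-stream learning plug-in: incrementally updated mean."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def process(self, event: dict) -> list:
        self.n += 1
        self.mean += (event["value"] - self.mean) / self.n
        return [self.mean]


if __name__ == "__main__":
    platform = Platform()
    platform.register("monitor", ThresholdMonitor(limit=5))
    platform.register("mean", RunningMean())
    print(platform.dispatch({"value": 10}))
    print(platform.dispatch({"value": 2}))
```

Using a structural interface keeps the plug-ins free of any dependency on the platform itself, which fits the "components as libraries" point above.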
25
WP3: Communication Efficient, Low-Latency
Methods
 Objectives:
 develop in-situ processing methods that go beyond current
methods
 develop new algorithms that are able to efficiently detect granular
events
 identify and explore the right level of in-situ processing for
scalability issues
26
In-Situ Processing (LIFT)
[Figure: in-situ monitoring with local Safe Zones]
 Coordinator: monitors the global threshold (global condition / reference point)
 Nodes (sensors): monitor the local condition (Safe Zone) in situ
 Alarm message: sent only if the local Safe Zone is violated
 Resolution protocol: runs after a violation
 Example: all nodes of a cloud work in a “healthy” state
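The communication pattern on this slide can be sketched for a deliberately simple case. The sketch assumes the global condition is "the average of all node values stays at or below a threshold T" and uses the local safe zone "my own value is at or below T"; if every node is inside its safe zone, the global condition necessarily holds, so nodes can stay silent. This is a simplified stand-in chosen for illustration, not the actual LIFT method, and all class and function names are hypothetical.

```python
class Node:
    """Sensor node that checks its local safe zone in situ."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.value = 0.0

    def update(self, value: float) -> bool:
        """Store the new reading; return True if the safe zone is violated."""
        self.value = value
        return value > self.threshold


class Coordinator:
    """Checks the global condition only after a local violation."""

    def __init__(self, nodes: list, threshold: float) -> None:
        self.nodes = nodes
        self.threshold = threshold
        self.messages = 0

    def resolve(self, alarm_count: int) -> bool:
        """Resolution protocol: poll every node and test the global average."""
        self.messages += alarm_count      # one alarm message per violating node
        self.messages += len(self.nodes)  # simplification: one reply per polled node
        average = sum(n.value for n in self.nodes) / len(self.nodes)
        return average > self.threshold


def simulate(rounds: list, threshold: float) -> tuple:
    """Return (messages sent, messages if every reading were sent, global violations)."""
    nodes = [Node(threshold) for _ in rounds[0]]
    coordinator = Coordinator(nodes, threshold)
    global_violations = 0
    for readings in rounds:
        alarms = sum(node.update(v) for node, v in zip(nodes, readings))
        if alarms and coordinator.resolve(alarms):
            global_violations += 1
    centralized = len(rounds) * len(nodes)
    return coordinator.messages, centralized, global_violations


if __name__ == "__main__":
    rounds = [[1, 2, 3], [2, 2, 2], [1, 1, 1], [2, 3, 1], [9, 1, 1], [9, 9, 9]]
    print(simulate(rounds, threshold=5))
```

In quiet rounds no messages are sent at all; communication happens only when a local safe zone is violated, and a local violation does not always mean the global condition is violated (fifth round in the example).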
27
WP4: Flexible Event Processing
 Objectives:
 develop a Complex Event Processing model and methodology
suitable for specification, implementation, and maintenance of
event-driven applications
 Providing semantics for specifying event patterns
 Providing an end-user consumable framework for flexibly specifying
event processing systems
 Providing modules for generation of an event processing network
implementation and optimization plan that allows distributed in
situ monitoring of complex event patterns
28
WP5: Robust Distributed Stream Monitoring
 Objectives:
 develop methods for robust distributed stream monitoring
 exploit online machine learning methods to adapt the FERARI
data/control flow to unforeseen circumstances
 Provide support for integrating machine learning into the
architecture.
 Accounting for uncertainty in the architecture
29
Simple LIFT Example
 Mobility monitoring using stationary
sensors
 Each sensor computes a (linear
counting) sketch of the Bluetooth
addresses in sensor range
 The sketch is a bit array of fixed length
 Provides a set of mobility mining
primitives
 count distinct
 union
 intersection
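A linear counting sketch with the three primitives listed above can be written in a few lines. The sketch below is a minimal illustration, not the project's implementation: the class name, the bit-array length of 4096, and the use of SHA-256 as hash function are assumptions. The distinct count is estimated as -m * ln(z / m), where m is the number of bits and z the number of bits still zero; the union is the bitwise OR of two sketches, and the intersection is estimated by inclusion-exclusion.

```python
import hashlib
import math


class LinearCountingSketch:
    """Fixed-length bit array that summarizes a set of addresses."""

    def __init__(self, m: int = 4096, bits: int = 0) -> None:
        self.m = m        # number of bits in the sketch
        self.bits = bits  # Python int used as a bit array

    def add(self, address: str) -> None:
        """Hash the address to one bit position and set that bit."""
        digest = hashlib.sha256(address.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % self.m
        self.bits |= 1 << index

    def count_distinct(self) -> float:
        """Linear counting estimate: -m * ln(fraction of zero bits)."""
        zeros = self.m - bin(self.bits).count("1")
        if zeros == 0:
            raise ValueError("sketch is saturated; use a larger m")
        return -self.m * math.log(zeros / self.m)

    def union(self, other: "LinearCountingSketch") -> "LinearCountingSketch":
        """Sketch of the union of both sets: bitwise OR of the bit arrays."""
        if self.m != other.m:
            raise ValueError("sketches must have the same length")
        return LinearCountingSketch(self.m, self.bits | other.bits)


def intersection_estimate(a: LinearCountingSketch,
                          b: LinearCountingSketch) -> float:
    """Inclusion-exclusion: |A and B| is about |A| + |B| - |A or B|."""
    return a.count_distinct() + b.count_distinct() - a.union(b).count_distinct()


if __name__ == "__main__":
    sensor_i, sensor_j = LinearCountingSketch(), LinearCountingSketch()
    for k in range(200):
        sensor_i.add(f"device-{k}")        # hypothetical device addresses
    for k in range(100, 300):
        sensor_j.add(f"device-{k}")
    print(sensor_i.count_distinct())
    print(sensor_i.union(sensor_j).count_distinct())
    print(intersection_estimate(sensor_i, sensor_j))
```

Because only the fixed-length bit arrays need to be sent to the coordinator, the communication cost per sensor does not depend on how many devices were observed, and the raw addresses never leave the sensor.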
[Figure: sensors Si and Sj send their sketches sk(R_Si) and sk(R_Sj) to the Coordinator]
30
WP6: Dissemination & Exploitation
 Objectives:
 Disseminating the FERARI theoretic framework to the scientific
community of data mining and distributed systems.
 Outlining the methodological and technical superiority of the
proposed solution compared to other approaches to distributed
monitoring
 Dissemination to high-profile early adopters within the scope of
the application scenarios
31
WP7: Coordination
 Objectives:
 Establishment of a strong project management scheme
 Successful achievement of the project objectives on time and
within budget
 Generation of synergies amongst the project members
 Continuous monitoring of the project’s progress and timely
initiation of corrective actions (if needed)
 Coordination of the continuous process aiming to transfer the
knowledge generated to the relevant scientific communities
32
List of Deliverables
No. | Deliverable name | WP No. | Nature | Dissemination level | Due date
1.1 | Application Scenario Description and Requirement Analysis | 1 | R | PU | M12
1.2 | Final Application Scenarios and Description of Test Environment | 1 | R | PU | M24
1.3 | Application Scenario & Prototype Report | 1 | R | PU | M36
2.1 | Architecture Definition | 2 | R | PU | M12
2.2 | System Prototype | 2 | R | PU | M24
2.3 | Final Prototype | 2 | R | PU | M36
3.1 | Requirements and State of the Art Overview on In-Situ Methods | 3 | R | PU | M12
3.2 | Development of Algorithms Based on In-Situ, Low-Latency Methods | 3 | R | PU | M24
3.3 | Implementation and Evaluation of In-Situ, Low-Latency Algorithms | 3 | R | PU | M36
4.1 | Requirements and State of the Art Overview on Flexible Event Processing | 4 | R | PU | M12
4.2 | Goal-Driven Model and Methodology for Specification of Event Processing Applications | 4 | R | PU | M24
4.3 | Automatic Generation of Annotated Event Processing Network from the Goal-Driven Model | 4 | R | PU | M36
5.1 | Requirements and State of the Art Overview on Robust Stream Monitoring | 5 | R | PU | M12
5.2 | Algorithms for Robust Distributed Stream Monitoring and Supporting Data Integrity | 5 | R | PU | M24
5.3 | Implementation of Algorithms for Robust Distributed Stream Monitoring and Supporting Data Integrity | 5 | R | PU | M36
6.1 | Project Fact Sheet | 6 | O | PU | M3
6.2 | Project Web Site | 6 | O | PU | M3
6.3 | Project Presentation | 6 | O | PU | M3
6.4 | Project Workshop, Seminar and Training Course | 6 | R | PU | M30
6.5 | First Draft of Exploitation Plan | 6 | R | CO | M24
6.6 | Exploitation and Dissemination Plan | 6 | R | CO | M36
7.1 | Quality Assurance Plan | 7 | R | PU | M6
7.2 | 1st Annual Project Report | 7 | R | CO | M12
7.3 | 2nd Annual Project Report | 7 | R | CO | M24
7.4 | Final Project Report | 7 | R | CO | M36
Each WP-Leader is responsible for the deliverables of his or her
WP – more details in the
33
Summary
 The goal of the FERARI project is to pave the way for efficient, real-time Big
Data technologies of the future.
 It will enable business users to express complex analytics tasks through a
high-level declarative language that supports distributed Complex Event
Processing and sophisticated machine learning operators as an integral part
of the system architecture.
 Effective, real-time execution at scale will be achieved by making the sensor
layer a first-class citizen in distributed streaming architectures and
leveraging in-situ data processing as a first (and, in the long run, the only
realistic) choice for realizing planetary-scale Big Data systems.
34
http://www.ferari-project.eu
35