ICT, STREP
FERARI ICT-FP7-619491
Flexible Event pRocessing for big dAta aRchItectures
Collaborative Project

D 6.3 Project Presentation
03.02.2014 – 30.04.2014

Contractual Date of Delivery: 30.04.2014
Actual Date of Delivery: 30.04.2014
Author(s): Michael Mock
Institution: Poslovna Inteligencija d.o.o.
Workpackage: WP6
Security: PU
Nature: O
Total number of pages: 37

Project coordinator name: Michael Mock
Project coordinator organisation name: Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Schloss Birlinghoven, 53754 Sankt Augustin, Germany
URL: http://www.iais.fraunhofer.de

Revision: 1

Abstract: This document is the FERARI deliverable of WP6 for the first review period (03.02.2014 – 30.04.2014). The project presentation gives an overview of the FERARI project, including the goals of the project, the project partners and the workpackage organization.

Revision history

Administration Status
Project acronym: FERARI
ID: ICT-FP7-619491
Document identifier: D 6.3 Project Presentation (03.02.2014 – 30.04.2014)
Leading Partner: Poslovna Inteligencija d.o.o.
Report version: 1
Report preparation date: 10.04.2014
Classification: PU
Nature: OTHER
Author(s) and contributors: Michael Mock (FHG)
Status: Plan Draft Working Final x Submitted

Copyright
This report is © FERARI Consortium 2014. Its duplication is restricted to personal use within the consortium and the European Commission.

www.ferari-project.eu
Flexible Event pRocessing for big dAta aRchItectures (FERARI)
Introduction

FERARI – An FP7 EC - ICT project
• Grant Agreement No. 619491
• STREP: Specific Targeted Research Project
• Grown out of FP7 basic research project LIFT (FET Open)
• FERARI was ranked 6th of 33 proposals within objective 4.2 Scalable Data Analysis
• February 2014 – January 2017, Funding: 2.95 million
EUR

FERARI - Consortium
• Fraunhofer IAIS (FHG)
• IBM (IBM)
• Poslovna Inteligencija (PI)
• Technion (Technion) + Haifa University
• Technical University of Crete (TUC)
• T-Hrvatski Telekom (HT)

Fraunhofer IAIS: Intelligent Analysis and Information Systems
“From sensor data to business intelligence, from media analysis to visual information systems: Our research allows companies to do more with data”
• 270 people: scientists, project engineers, technical and administrative staff
• Located on Fraunhofer Campus Schloss Birlinghoven/Sankt Augustin near Bonn
• Joint research groups and cooperation with
• Institute Director: Prof. Dr. Stefan Wrobel
• Lead researcher: Dr. Michael Mock

Technical University of Crete
• Founded in 1977 in Chania, Crete
• 120 faculty members, ~175 adjunct faculty and lab personnel
• 2900 undergraduate and 550 graduate students
• Around 200 research programs, total budget 20.5 million
• ECE department: 25 faculty, ~200 undergrad students/year
• Research organized in 10 research laboratories
• SoftNet Lab (headed by Prof. Garofalakis): focus on Big Data Analytics, Data Streams, Cloud Computing
• Lead researcher: Prof. Minos Garofalakis

TECHNION – Israel Institute of Technology and University of Haifa
The Technion-Israel Institute of Technology is a major source of the innovation and brainpower that drives the Israeli economy, and a key to Israel’s reputation as the world’s “Start-Up Nation.” Its three Nobel Prize winners exemplify academic excellence.
• Located in Haifa, oldest University in Israel (1912)
• 600 Faculty Members (3 Nobel Laureates)
• Computer Science: 50 faculty members, 1500 students
• Lead researcher: Prof. Assaf Schuster, head of Technion Computer Engineering Center, focus on Distributed and Scalable Data Mining, Monitoring Distributed Data Streams, Big Data Technologies and Analytics, and Dr.
Daniel Keren, Department of Computer Science at Haifa University

IBM Research – Haifa
• IBM Research is the innovation branch of IBM
• The motto of IBM Research is “the world is our lab”
• 350 people: scientists, software engineers, subject matter experts
• Located in Haifa, Israel, on the campus of Haifa University
• The largest IBM Research Lab outside the USA
• Lead researcher: Fabiana Fournier

T-Hrvatski Telekom: Communication, Information & Entertainment, Always & Everywhere
“T-HT - to be the online company and to power the online society and digital economy in Croatia and the Region”
• T-HT Group is the leading provider of telecommunications services in Croatia and the sole company to offer the full range of these services: it combines the services of fixed and mobile telephony, data transmission, Internet and international communications
• T-HT’s strategy: GROW - COMPETE - TRANSFORM
• Key figures for 2012: Revenues: 991 million EUR; EBITDA margin: 45.3%; 5780 employees
• Lead representative: Maja Vekić-Vedrina

Poslovna inteligencija: Leader in business intelligence
“We provide our customers with the best possible service in strategic consultancy and in implementation of intelligent information systems for decision support, thereby helping them to create new values and identify new business opportunities.”
• 90 employees: 90% project engineers, technical and business consultants, 10% sales and administration
• HQ in Zagreb, Croatia; offices in UK, Slovenia, Serbia, Bosnia and Herzegovina and Montenegro
• Extensive experience in the telecommunication industry and in R&D Big Data projects
• Lead representative: Dražen Oreščanin

Motivation
A number of recent technological developments have started to change our world forever:
• the rise of the internet
• the ever growing amount of activities in social networks
• the widespread adoption of smart phones and other mobile devices
• the instrumentation of the world with sensors.
This is accompanied by falling prices for computers, networks, and storage.

Objectives
• Provide support for large scale services by making the sensor layer a first class citizen in Big Data architectures.
• Provide support for Complex Event Processing technology for business users in Big Data architectures.
• Provide support for integrating machine learning tasks in the architecture.
• Provide support for flexible and adaptive analytics workflows.
• Exemplify the potential of the new architecture in the telecommunication and the cloud domain.

Use cases
• Monitoring a smart energy grid.
• Analysing the traffic state of a large city using car-to-car communication.
• Monitoring the quality of a telecommunication network.
• Detecting latent failures in a large cloud of thousands of machines.
• Inspecting potentially fraudulent credit card transactions in real-time and blocking these transactions when necessary.

Application Scenarios
Mobile Phone Fraud Detection
• Detecting mobile phone fraud by analysing usage patterns
• Reliably detect mobile phone fraud
• Avoid financial losses due to fraud
• Scalability to millions of events/sec for simple filtering; less for more complex analysis, depending on the complexity of the task
Cloud Health Monitoring
• Cloud data centre activity log monitoring
• Possibility to replace time-interval-based maintenance by event-based maintenance
• Avoiding service down-time

Negotiation Question: Data Size
Quantity of data
The average monthly number of rated call detail records is > 650 million, and the total monthly quantity of data is > 300 GB. For raw call details, monthly quantities are significantly higher: the number of records is > 5500 million and the total size of data is > 10 TB.
Cloud services are among the recently implemented services in Hrvatski Telekom. The number of cloud servers and of customers using cloud services is still fairly low, but the numbers are rapidly increasing. Currently, the cloud consists of 6 machines, which produce a total of > 40 GB of data per month.
During the course of this project, we expect that the cloud might double its current size.

FERARI success criteria
The project’s success will be rigorously measured by the following validation criteria:
• Communication reduction with respect to global/state-of-the-art solutions.
• Processing time relative to the size of the data, as a second quantitative validation criterion.
• For monitoring applications, the number of false alarms, as a third criterion.
• Number of domains to which the approach can be deployed. A key to this is the variety aspect enabled by Distributed Complex Event Processing.
• Flexibility. The system will be designed such that it can adapt to new, unforeseen circumstances and is easily consumable.

Workpackages

Work Plan
• Phase 1 (M1 – M12): use case definition, component definition, architecture definition
• Phase 2 (M13 – M24): component refinement, first use case prototype implementation, first architecture implementation
• Phase 3 (M25 – M34): will demonstrate and evaluate the impact of the methods developed in this project

Workpackage Structure
Diagram boxes:
• WP1 – Use Cases
• WP4 – Flexible Event Processing
• WP3 – Communication Efficient, Low-Latency Methods
• WP5 – Robust Distributed Stream Monitoring
• WP2 – Architecture
* WPs 6 and 7, which will interact with all WPs for dissemination and management tasks, have been left out to increase readability.
The general flow of dependencies is top-down from the use cases to the architecture and methodological work. Architecture and methods interact iteratively, since there are many technical and methodological dependencies.
FERARI - Workpackages
Diagram labels: Software Platform; Communication efficient processing; Complex event processing; Stream processing; Prototype (relation label: "provides")

WP1: Application Scenarios, Test bed, Prototype
Objectives:
• Selecting and defining the application scenarios fraud mining and cloud health monitoring
• Definition of testing & evaluation criteria for the end users at HT
• Setting up of a test bed both at HT and at the project partners’ local sites
• Implementation and evaluation of scenarios in a prototype to demonstrate the advantage of FERARI with respect to the state of the art as well as to demonstrate its business value

WP2: Big Data Streaming Architecture & Technology Integration
Objectives:
• Define a Big Data architecture that makes the sensor layer a first class citizen of the architecture
• Define a data and control flow that can implement a push based approach, so that processing can be partially done in situ
• Provide methods for robust distributed stream processing including online machine learning
• Implement the architecture as a software platform (open source)

Architecture Diagram of FERARI
Event processing deals with these functions:
• get events from sources (event producers).
• route these events, filter them, normalize or otherwise transform them, aggregate them, detect patterns over multiple events (event processing agents).
• transfer events as alerts to a human or as a trigger to an autonomous adaptation system (event consumers).

WP2: Big Data Streaming Architecture & Technology Integration - TASKS
The tangible output of WP2 will be the definition of the big-data software architecture, allowing for the integration of components for complex event processing, in-situ processing and robust distributed stream processing including online machine learning. In addition, the architecture will be provided as a software platform.
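The three event processing roles named on the architecture slide above (event producers, event processing agents, event consumers) can be illustrated with a minimal Python sketch. It is not the FERARI platform or its CEP language: the Event fields, the function names, and the "two expensive calls per source" pattern are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Event:
    source: str   # hypothetical producer id
    kind: str     # hypothetical event type, e.g. "call"
    value: float  # hypothetical payload, e.g. the cost of a call


def producer(raw: Iterable[Tuple[str, str, float]]) -> Iterator[Event]:
    """Event producer: turns raw records into events."""
    for source, kind, value in raw:
        yield Event(source, kind, value)


def filter_agent(events: Iterable[Event], kind: str) -> Iterator[Event]:
    """Event processing agent: keeps only events of one kind."""
    return (e for e in events if e.kind == kind)


def pattern_agent(events: Iterable[Event], threshold: float, count: int) -> Iterator[str]:
    """Event processing agent: detects a pattern over multiple events.

    Emits one alert when a source has produced `count` events whose
    value exceeds `threshold`.
    """
    hits = {}
    for e in events:
        if e.value > threshold:
            hits[e.source] = hits.get(e.source, 0) + 1
            if hits[e.source] == count:
                yield f"ALERT {e.source}: {count} events above {threshold}"


def consumer(alerts: Iterable[str]) -> List[str]:
    """Event consumer: here it only collects the alerts."""
    return list(alerts)


def run_pipeline(raw, kind="call", threshold=10.0, count=2) -> List[str]:
    """Chains producer -> agents -> consumer."""
    return consumer(pattern_agent(filter_agent(producer(raw), kind), threshold, count))
```

Calling run_pipeline on a list of (source, kind, value) tuples returns the alerts raised for that input. Generators are used so that each agent handles one event at a time, which mirrors the streaming setting in a very reduced form.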
Interdependencies between WP1 & WP2
WP2: Software Platform
• Open source
• General purpose for communication-efficient big-data stream analysis algorithms
• Flexible event processing
• Components as libraries
• Interfaces to plug in concrete algorithms (learning, monitoring)
Diagram labels: In-stream learning; CEP Language; Software Platform; Plugin concrete algorithms; Prototype

WP3: Communication Efficient, Low-Latency Methods
Objectives:
• develop in-situ processing methods that go beyond current methods
• develop new algorithms that are able to efficiently detect granular events
• identify and explore the right level of in-situ processing for scalability issues

In-Situ Processing (LIFT)
• Coordinator: monitors the global threshold (global condition / reference point)
• Nodes: sensors monitor the local Safe-Zone in situ (local condition)
• Alarm message only if the local Safe-Zone is violated (example: all nodes of a cloud work in a “healthy” state)
• Resolution protocol (after violation)

WP4: Flexible Event Processing
Objectives:
• develop a Complex Event Processing model and methodology suitable for specification, implementation, and maintenance of event-driven applications
• Providing semantics for specifying event patterns
• Providing an end-user consumable framework for flexibly specifying event processing systems
• Providing modules for generation of an event processing network implementation and optimization plan that allows distributed in situ monitoring of complex event patterns

WP5: Robust Distributed Stream Monitoring
Objectives:
• develop methods for robust distributed stream monitoring
• exploit online machine learning methods to adapt the FERARI data/control flow to unforeseen circumstances
• Provide support for integrating machine learning into the architecture.
• Accounting for uncertainty in the architecture

Simple LIFT Example
Mobility Monitoring using stationary sensors
• Each sensor computes a (linear counting) sketch of the Bluetooth addresses in sensor range
• The sketch is a bit-array of fixed length
• Provides a set of mobility mining primitives: count distinct, union, intersection
Diagram: sensors Si and Sj send their sketches sk(R_Si) and sk(R_Sj) to the Coordinator.

WP6: Dissemination & Exploitation
Objectives:
• Disseminating the FERARI theoretic framework to the scientific community of data mining and distributed systems
• Outlining the methodological and technical superiority of the proposed solution compared to other approaches to distributed monitoring
• Dissemination to high-profile early adopters within the scope of the application scenarios

WP7: Coordination
Objectives:
• Establishment of a strong project management scheme
• Successful achievement of the project objectives on time and within budget
• Generation of synergies amongst the project members
• Continuous monitoring of the project’s progress and timely initiation of corrective actions (if needed)
• Coordination of the continuous process aiming to transfer the knowledge generated to the relevant scientific communities

List of Deliverables
Deliverable No | Deliverable name | WP No. | Nature | Dissemination level | Due date
1.1 | Application Scenario Description and Requirement Analysis | 1 | R | PU | M12
1.2 | Final Application Scenarios and Description of Test Environment | 1 | R | PU | M24
1.3 | Application Scenario & Prototype Report | 1 | R | PU | M36
2.1 | Architecture Definition | 2 | R | PU | M12
2.2 | System Prototype | 2 | R | PU | M24
2.3 | Final Prototype | 2 | R | PU | M36
3.1 | Requirements and State of the Art Overview on In-Situ Methods | 3 | R | PU | M12
3.2 | Development of Algorithms Based on In-Situ, Low-Latency Methods | 3 | R | PU | M24
3.3 | Implementation and Evaluation of In-Situ, Low-Latency Algorithms | 3 | R | PU | M36
4.1 | Requirements and State of the Art Overview on Flexible Event Processing | 4 | R | PU | M12
4.2 | Goal-Driven Model and Methodology for Specification of Event Processing Applications | 4 | R | PU | M24
4.3 | Automatic Generation of Annotated Event Processing Network from the Goal-Driven Model | 4 | R | PU | M36
5.1 | Requirements and State of the Art Overview on Robust Stream Monitoring | 6 | R | PU | M12
5.2 | Algorithms for Robust Distributed Stream Monitoring and Supporting Data Integrity | 6 | R | PU | M24
5.3 | Implementation of Algorithms for Robust Distributed Stream Monitoring and Supporting Data Integrity | 6 | R | PU | M36
6.1 | Project Fact Sheet | 6 | O | PU | M3
6.2 | Project Web Site | 6 | O | PU | M3
6.3 | Project Presentation | 6 | O | PU | M3
6.4 | Project Workshop, Seminar and Training Course | 6 | R | PU | M30
6.5 | First Draft of Exploitation Plan | 6 | R | CO | M24
6.6 | Exploitation and Dissemination Plan | 6 | R | CO | M36
7.1 | Quality Assurance Plan | 7 | R | PU | M6
7.2 | 1st Annual Project Report | 7 | R | CO | M12
7.3 | 2nd Annual Project Report | 7 | R | CO | M24
7.4 | Final Project Report | 7 | R | CO | M36
Each WP-Leader is responsible for the deliverables of his or her WP – more details in the

Summary
The goal of the FERARI project is to pave the way for efficient, real-time Big Data technologies of the future.
It will enable business users to express complex analytics tasks through a high-level declarative language that supports distributed Complex Event Processing and sophisticated machine learning operators as an integral part of the system architecture. Effective, real-time execution at scale will be achieved by making the sensor layer a first-class citizen in distributed streaming architectures and leveraging in-situ data processing as a first (and, in the long run, the only realistic) choice for realizing planetary-scale Big Data systems.

http://www.ferari-project.eu
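As a supplement to the Simple LIFT Example slide, the following Python sketch illustrates how a linear counting sketch over Bluetooth addresses could support the three primitives named there (count distinct, union, intersection). It is an illustration under assumptions, not the project's implementation: the class name, the bit-array length m, the SHA-256 based hash, and the inclusion-exclusion estimate for the intersection are all chosen here and are not taken from the slides.

```python
import hashlib
import math


class LinearCountingSketch:
    """Fixed-length bit-array sketch of a set of addresses (hypothetical class)."""

    def __init__(self, m: int = 4096, bits: int = 0):
        self.m = m        # assumed bit-array length
        self.bits = bits  # bit-array stored as a Python integer

    def add(self, address: str) -> None:
        """Hash the address to one of m positions and set that bit."""
        digest = hashlib.sha256(address.encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % self.m
        self.bits |= 1 << position

    def count_distinct(self) -> float:
        """Linear counting estimate: -m * ln(fraction of zero bits)."""
        zeros = self.m - bin(self.bits).count("1")
        if zeros == 0:
            raise ValueError("sketch is saturated; a larger m is needed")
        return -self.m * math.log(zeros / self.m)

    def union(self, other: "LinearCountingSketch") -> "LinearCountingSketch":
        """Sketch of the union of both sets: bitwise OR of the bit-arrays."""
        if self.m != other.m:
            raise ValueError("sketches must have the same length")
        return LinearCountingSketch(self.m, self.bits | other.bits)


def intersection_estimate(a: LinearCountingSketch, b: LinearCountingSketch) -> float:
    """Assumed estimator: |A| + |B| - |A union B| (inclusion-exclusion)."""
    return a.count_distinct() + b.count_distinct() - a.union(b).count_distinct()
```

In this reading, each sensor would fill its own sketch locally and send only the bit-array to the coordinator, which can then combine the sketches of Si and Sj without seeing individual addresses. The estimates are approximate and degrade as the bit-array fills up.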