ICT, FET Open LIFT ICT-FP7-255951 Using Local Inference in Massively Distributed Systems Collaborative Project D 6.4 Project Workshop Contractual Date of Delivery: 31.03.2013 Actual Date of Delivery: 08.05.2013 Author(s): Minos Garofalakis (TUC), Antonis Deligiannakis (TUC), Michael May (FHG), Christine Kopp (FHG) , Michael Kamp (FHG) Institution: FHG Workpackage: WP 6 Security: PU Nature: R Total number of pages: 6 Project coordinator name: Michael May Project coordinator organisation name: Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) Schloss Birlinghoven, 53754 Sankt Augustin, Germany URL: http://www.iais.fraunhofer.de Revision: 2 Abstract: This document is the LIFT deliverable of WP6 for the third review period (01.10.2012 – 30.09.2013). The document contains an overview of two projectrelated workshops to be held in August 2013. Revision history Administration Status Project acronym: LIFT Document identifier: Leading Partner: Report version: Report preparation date: Classification: Nature: Author(s) and contributors: Status: x ID: ICT-FP7-255951 D 6.4 Project Workshop (01.10.2012 – 30.09.2013) FHG 2 (workshop program and details added) 10.12.2013 PU REPORT Minos Garofalakis (TUC), Antonis Deligiannakis (TUC), Michael May (FHG), Christine Kopp (FHG), Michael Kamp (FHG) Plan Draft Working Final Submitted Copyright This report is © LIFT Consortium 2013. Its duplication is restricted to the personal use within the consortium and the European Commission. www.lift-eu.org ii D6.4 – Project Workshop Minos Garofalakis, Antonis Deligiannakis, Michael May, Christine Kopp 1 Introduction In year three of LIFT two project-related workshops will take place that are colocated with major scientific conferences in the area of data management and data mining. The first workshop, Big Dynamic Distributed Data, is co-located with VLDB 2013 and is organized by TUC. It focuses on the core topic of LIFT, namely complex computations over partitions of massive streaming data. The second workshop, Ubiquitous Data Mining, is jointly organized by João Gama and FHG. It is already the third workshop in its series and has in 2013 a special focus un distributed streaming data. 2 First International Workshop on Big Dynamic Distributed Data (BD³ 2013) Date and Location August 30th, 2013, Trento, Italy (in conjunction with VLDB 2013) Chairs General Chairs: Program Chairs: Minos Garofalakis, Antonios Deligiannakis Graham Cormode, Assaf Schuster, Ke Yi Website http://www.softnet.tuc.gr/bd3 Program 08:30 - 09:00 Session 1: Invited Talk Title: Streaming Balanced Partitioning of Massive Scale Graphs Speaker: Milan Vojnovic, MSR Cambridge 09:00 - 10:00 Session 1: Keynote Speech Title: Coding Theory for Large-Scale Storage Speaker: Alex Dimakis, UT Austin 10:00 - 10:30 Coffee break 10:30 - 12:00 Session 2: Distributed Monitoring Safe-Zones for Monitoring Distributed Streams Daniel Keren (Haifa), Guy Sagy (Technion), Amir Abboud (Technion), Izchak Sharfman (Technion), Assaf Schuster (Technion), David Ben-David (Technion) 2 Communication-Efficient Distributed Online Prediction using Dynamic Model Synchronizations Mario Boley (Fraunhofer IAIS), Izchak Sharfman (Technion), Daniel Keren (Haifa), Michael Kamp (Fraunhofer IAIS), Assaf Schuster (Technion) Communication-efficient Outlier Detection for Scale-out Systems Moshe Gabel (Technion), Daniel Keren (Haifa), Assaf Schuster (Technion) 12:00 - 13:30 Lunch 13:30 - 15:30 Session 3: CEP and Graphs Elastic Complex Event Processing under Varying Query Load Thomas Heinze (SAP AG), Yuanzhen Ji (SAP AG), Yinying Pan (SAP AG), Franz Josef Grüneberger (SAP AG), Zbigniew Jerzak (SAP AG), Christof Fetzer (TU Dresden) Adaptive Selective Replication for Complex Event Processing Systems Franz Josef Grüneberger (SAP AG), Thomas Heinze (SAP AG), Pascal Felber (Universite de Neuchatel) Dynamic Partitioning of Big Hierarchical Graphs Vasilis Spyropoulos (Athens University of Economics and Business), Yannis Kotidis (Athens University of Economics and Business) Scalable and Robust Management of Dynamic Graph Data Alan Labouseur (SUNY -- Albany), Paul Olsen (SUNY -- Albany), Jeong-Hyon Hwang (SUNY - Albany) 15:30 - 16:00 Coffee break 16:00 - 18:00 Session 4: Stream Processing Towards Elastic Stream Processing: Patterns and Infrastructure Kai-Uwe Sattler (TU Ilmenau), Felix Beier (TU Ilmenau) Task Graphs of Stream Mining Algorithms Sayaka Akioka (Meiji University) Large-scale Online Mobility Monitoring with Exponential Histograms Christine Kopp (Fraunhofer IAIS), Michael Mock (Fraunhofer IAIS), Odysseas Papapetrou (Technical Univ. of Crete), Michael May (Fraunhofer IAIS) Multi-Stage Malicious Click Detection on Large Scale Web Advertising Data Leyi Song (East China Normal University), Xueqing Gong (East China Normal University), Xiaofeng He (East China Normal University), Rong Zhang (East China Normal University), Aoying Zhou (East China Normal University) 18:00 - 18:10 Closing Remarks 3 Workshop Proceedings http://ceur-ws.org/Vol-1018/ Workshop Description As the amount of streaming data produced by large-scale systems such as environmental monitoring, scientific experiments and communication networks grows rapidly, new approaches are needed to effectively process and analyze such data. There are several promising directions in the area of large-scale distributed computation, that is, where multiple computing entities work together over partitions of the massive, streaming data to perform complex computations. Two important paradigms in this realm are continuous distributed monitoring (i.e., continually maintaining an accurate estimate of a complex query), and distributed and cluster-based systems that allow the processing of big, streaming data (e.g., IBM System S, Apache S4, and Twitter Storm). The aim of the BD3 workshop is to bring together computer scientists with interests in this field to present recent innovations, find topics of common interest and to stimulate further development of new approaches to deal with massive dynamic and distributed data. Topics of interest include (but are not limited to): Novel architectures for BD3 Extensions to existing models for BD3 Algorithms for mining and analytics for BD3 Query processing in BD3 Efficient communication protocols for BD3 Languages and structures for BD3 Theoretical basis and hardness for BD3 Engineering case-studies in BD3 Position papers on challenges and new directions in BD3 Privacy issues in BD3 Energy efficiency and reliability in BD3 Scheduling and provisioning issues in BD3 The 1st International Workshop on Big, Dynamic, Distributed Data (BD3) took place on Friday 30/8, in conjunction with VLDB'2013 in Riva Del Garda, Italy. (VLDB is the top international research forum on data management systems.) The workshop received 15 submissions out of which 11 papers were selected for presentation. Four of the accepted papers were, in fact, by LIFT authors, so LIFT work took center stage at the workshop. Professor Alex Dimakis from UT-Austin delivered a very interesting keynote talk on novel applications of coding theory in robust distributed storage. VLDB handled workshop registrations centrally, so people registered for all workshops on that day - there were 8 concurrent workshops taking place on 30/8. The BD3 workshop was very well attended, with 30-40 participants in the workshop throughout the day, and several interesting technical discussions taking place both during the sessions and the coffee breaks. 4 3 Workshop on Ubiquitous Data Mining (UDM 2013) Date and Location August 3rd-5th, 2013, Beijing, China (in conjunction with IJCAI 2013) Chairs João Gama, Michael May, Nuno Marques, Paulo Cortez Website http://www.liaad.up.pt/udm Program 09:00 -10:00 Invited Talk Title: Exploiting Label Relationship in Multi-Label Learning Speaker: Zhi-Hua Zhou 10:00 – 12:25 Paper Session 1 Predicting Globally and Locally: A Comparison of Methods for Vehicle Trajectory William Groves, Ernesto Nunes and Maria Gini On Recommending Urban Hotspots to Find Our Next Passenger Luis Moreira-Matias, Ricardo Fernandes, Joao Gama, Michel Ferreira, João Mendes-Moreira and Luis Damas Road-quality classification and bump detection smartphones Marius Hoffmann, Michael Mock and Michael May with bycicle-mounted Cicerone: Design of a Real-Time Area Knowledge-Enhanced Venue Recommender Daniel Villatoro, Jordi Aranda, Marc Planagumà, Rafael Gimenez and Marc Torrent-Moreno Visual Scenes Clustering Using Variational Incremental Learning of Infinite Generalized Dirichlet Mixture Wentao Fan and Nizar Bouguila 12:30 – 13:30 Lunch 13:30 – 14:30 Invited Talk Title: NIM: Scalable Distributed Stream Processing System on Mobile Network Data Speaker: Wei Fan 14:30 – 17:00 Paper Session 2 5 Learning Model Rules from High-Speed Data Streams Ezilda Almeida, Carlos A. Ferreira and João Gama Simultaneous segmentation and recognition of gestures for human-machine interaction Harold Vasquez, Luis Enrique Sucar and Hugo Jair Escalante Ubiquitous Self-Organizing Maps Bruno Silva and Nuno C. Marques Simulating Price Interactions by Mining Multivariate Financial Time Series Bruno Silva, Luis Cavique and Nuno C. Marques Trend template: mining trends with a semi-formal trend model Olga Streibel 17:00 Closing Workshop Proceedings http://www.liaad.up.pt/udm/workshop-proceedings/at_download/file Workshop Description Two major technological environment: evolutions modify our relationship with our Widely available and cheap computer power. Simple objects that surround us are gaining sensors, computational power, and actuators, and are changing from static, into adaptive and reactive systems. The explosion of networks of all kinds offers new possibilities for the development and self-organization of communities. The new characteristics of data reflect a World in Movement: Time and space. The objects of analysis exist in time and space. Often they are able to move. Dynamic environment. These objects exist in a dynamic and unstable environment, evolving incrementally over time. Information processing capability. The objects are endowed with information processing capabilities. Locality. The objects never see the global picture - they know only their local spatio-temporal environment. Real-Time. The models have to evolve incrementally in correspondence with the evolving environment. Distributed. The object will be able to exchange information with other objects, thus forming a truly distributed environment. Ubiquitous Data Mining (UDM) uses Data Mining techniques to extract useful knowledge from data with these characteristics. The goal of this workshop is to convene researchers (from both academia and industry) who deal with 6 techniques such as: decision rules, decision trees, association rules, clustering, filtering, learning classifier systems neural networks, support vector machines, preprocessing, post processing, feature selection, visualization techniques, etc. for UDM and related themes. Topics include but are not restricted to: Adaptive Data Mining Distributed Data Mining Distributed Data Streams Grid Data Mining Learning in Ubiquitous environments Learning from Sensor Networks Learning from Social Networks Visualization Techniques for UDM Incremental On-line Learning Algorithms Single-Pass and Scalable Algorithms Learning in distributed neural network systems; Real-Time and Real-World Applications Resource-aware UDM Theoretical frameworks for UDM The Workshop on Ubiquitous Data Mining took place on Saturday 3/8, in conjunction with IJCAI’2013 in Beijing, China. The workshop received 16 submissions out of which 8 papers were selected for presentation by a scientific committee. The UDM workshop was well attended, with around 25 participants in the workshop throughout the day, and several interesting technical discussions taking place both during the sessions and breaks. 7