This WP contributes towards building an integrated platform for semantic urban computing by developing tools for large-scale data analysis in real time. The targeted tools implement low-level analysis algorithms that, together with the outcomes of WP4 and WP5, will encompass the technology needed to realise the SERENDIPITY software framework. Specifically, multimodal analysis will be performed on the output of physical sensors such as CCTV cameras, combined with information extracted from online and social communities such as blogs, Twitter, Flickr and YouTube. According to the use case described earlier, for effective navigation of a user around a new place it is critical to provide the user with real-time intelligence on varied aspects of the environment, such as traffic and weather conditions. In addition, when interpreting real-time changes it is also critical to provide users with alternative solutions that closely correspond to “human cognition”. To achieve this ambitious objective, several innovative research methodologies extending beyond the state of the art will be developed. The general objectives of this WP are listed below:
- large-scale information discovery from online resources and physical environments in real time;
- real-time multimodal data analysis of varied information sources, both physical (e.g. CCTV) and online (e.g. blogs, YouTube);
- multimodal data synchronisation of information from the physical and online worlds;
- extraction of knowledge about an environment (e.g. a city such as London, Milan or Dublin) by analysing and indexing cartographs.
The information provided by this WP will be used in WP4 and WP5. Using the results of WP3, analysis algorithms, suitable descriptors and other appropriate description models will be developed. Potential description schemes will be proposed to the appropriate standardisation bodies within WP7 “Spreading excellence”.
The WP consists of the following integrative activities and will be coordinated by
QMUL.
A3.1 – Distributed intelligent sensing
Leader: QMUL, Participants:
This activity is dedicated to gathering and filtering all information measured by a dense multimodal sensor network, together with information extracted from online and social communities such as blogs, Twitter, Flickr and YouTube. The sensor network covers a wide range of modalities: CCTV cameras; simple state/change indicators taking a binary (on/off) value, e.g. door open/shut in large indoor city scenarios; measurements of continuous variables, e.g. temperature and noise level; and complex sensors from meteorological stations where available.
A3.1 will deal with the system intelligence in its most elementary form. The approach is hierarchical, as in the overall system. Analysis is first confined to specific amounts of data, e.g. the data sensed at the moment the analysis is conducted. The resulting information is then combined with information from the past to build a second hierarchical analysis level. Finally, distributed information from several sites is put together to infer the output information of the distributed sensing subsystem.
From a technical point of view, the main objective of this activity is to devise and apply techniques that transform raw data, in the form of multiple time series and online sources, into meaningful information using a hierarchical approach. For single sensors this will simply be a binary state variable as a function of time. For more complex sensors, such as video and audio recorders, this could be a hypothesis identifying a highlight happening at given space-time coordinates.
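A minimal sketch of this three-level hierarchy is given below. The reading schema, the activation threshold and the site names are illustrative assumptions rather than project specifications.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Reading:
    """A single raw sensor measurement (hypothetical schema)."""
    sensor_id: str
    site: str
    timestamp: float
    value: float

def level1_state(reading: Reading, threshold: float = 0.5) -> bool:
    """Level 1: reduce a raw measurement to a binary state variable."""
    return reading.value > threshold

def level2_temporal(states: List[bool], min_active: int = 3) -> bool:
    """Level 2: combine the current state with recent history; declare an
    event only if the sensor has been active in several recent samples."""
    return sum(states) >= min_active

def level3_fuse(site_events: dict) -> List[str]:
    """Level 3: fuse per-site decisions into the subsystem output,
    here simply the list of sites currently reporting an event."""
    return [site for site, active in site_events.items() if active]

# Usage: three consecutive high readings at one site escalate to a site event.
history = [level1_state(Reading("cam-1", "siteA", t, v))
           for t, v in [(0.0, 0.7), (1.0, 0.9), (2.0, 0.8)]]
print(level3_fuse({"siteA": level2_temporal(history), "siteB": False}))
```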
A3.2 – Cartograph Analysis of a City
Leader: QMUL, Participants:
This task focuses on the analysis of city maps and generates an index of “interesting aspects” of a city. These interesting aspects could range from a street to a national monument, and this information will serve as prior knowledge for the SERENDIPITY platform. The cartograph analysis will exploit the availability of GPS information for precisely localising the movement of users within the city, to within a few metres (e.g. Google Latitude provides 3 m precision in motion and 500 m precision when at rest).
This task consists of the following sub-tasks; an illustrative sketch of the GPS-to-cartograph mapping (A3.2.2) follows the list.
A3.2.1 – Indexing cartograph
A3.2.2 – Mapping GPS from user to the Cartograph
A3.2.3 – Real-time tracking of user movement
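The sketch below illustrates the core of sub-task A3.2.2: snapping a GPS fix to the nearest entry of a cartograph index using great-circle distance. The POI coordinates and the 500 m error bound (taken from the worst-case precision noted above) are illustrative.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * asin(sqrt(a))

# Hypothetical index of "interesting aspects" extracted from a city map.
POI_INDEX = [
    ("Nelson's Column", 51.5077, -0.1280),
    ("Tower Bridge", 51.5055, -0.0754),
]

def map_gps_to_index(lat, lon, max_error_m=500.0):
    """Snap a (possibly imprecise) GPS fix to the nearest indexed POI.
    max_error_m reflects the worst-case localisation error noted above."""
    name, dist = min(((n, haversine_m(lat, lon, plat, plon))
                      for n, plat, plon in POI_INDEX), key=lambda x: x[1])
    return name if dist <= max_error_m else None

print(map_gps_to_index(51.5079, -0.1283))  # -> "Nelson's Column"
```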
A3.3 – Traffic and Crowd Monitoring from Real-Time Multimedia Sources
Leader: QMUL, Participants:
In this task, research methodologies will be developed for detecting “semantic events” such as traffic jams and accidents, and for monitoring crowd behaviour. These tools will enable machines to mimic human-like cognition in predicting the future consequences of a specific event by continuously monitoring past events. The CCTV infrastructure present in the city will be used as a primary source of information for the real-time extraction of semantic events. The task consists of the following sub-tasks; a minimal sketch of a semantic-event detector follows the list.
A3.3.1 – Tracking and Tracing
A3.3.2 – Traffic monitoring for semantic events
A3.3.3 – Crowd behaviour monitoring
A3.3.4 – Real-time reasoning following a semantic event
A3.3.5 – Interpretation of consequences of a semantic event
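The sketch below flags a “traffic jam” semantic event (sub-task A3.3.2) when the mean vehicle speed derived from CCTV drops below a threshold over a sliding window. The window length and speed threshold are illustrative assumptions.

```python
from collections import deque

class TrafficJamDetector:
    """Toy detector for the "traffic jam" semantic event: declares a jam
    when the mean vehicle speed over a sliding window of CCTV-derived
    measurements falls below a threshold. Parameters are illustrative."""

    def __init__(self, window=10, speed_kmh_threshold=10.0):
        self.speeds = deque(maxlen=window)
        self.threshold = speed_kmh_threshold

    def update(self, mean_speed_kmh: float) -> bool:
        """Feed one per-frame speed estimate; return True if the
        window now indicates a jam."""
        self.speeds.append(mean_speed_kmh)
        full = len(self.speeds) == self.speeds.maxlen
        return full and sum(self.speeds) / len(self.speeds) < self.threshold

# Usage: traffic slows down over consecutive observations.
detector = TrafficJamDetector(window=5)
for speed in [42, 35, 9, 7, 6, 5, 4]:
    if detector.update(speed):
        print("semantic event: traffic jam")
        break
```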
A3.4 – Sequential, anytime and on-line learning for real-time semantic analysis
Leader: QMUL, Participants:
This activity deals with algorithmic issues central to machine learning and knowledge discovery. One objective is to provide a coherent perspective on resource-constrained algorithms that are fundamentally designed to handle limited bandwidth, limited computing and storage capabilities, limited battery power, and specific real-time network-communication protocols. In the targeted application scenario, the process generating the data is not strictly stationary. In many cases, there is a need to extract some form of knowledge from a continuous stream of data; examples include Twitter records and 24-hour CCTV streams. Such sources are called data streams. Learning from data streams is an incremental task that requires incremental algorithms that take concept drift into account.
Important properties of such algorithms are that they incrementally incorporate new data as it arrives, that they cope with dynamic environments and changes in the data-generating distribution, and that they process examples in constant time and memory. The goal of this activity is to adapt sequential learning, anytime learning, real-time learning, online learning from data streams and related techniques so as to achieve processing in real time.
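The sketch below illustrates these requirements with a simple online logistic regression: each update runs in constant time and memory, and exponential forgetting lets the model track a drifting data-generating distribution. The learning and forgetting rates are illustrative assumptions.

```python
import math

class OnlineLogistic:
    """Constant-time, constant-memory online logistic regression with
    exponential forgetting: a minimal sketch of stream learning under
    concept drift, not a project-specified algorithm."""

    def __init__(self, n_features, lr=0.1, forget=0.999):
        self.w = [0.0] * n_features
        self.lr, self.forget = lr, forget

    def predict_proba(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        """Single pass over one example: decay old weights (forgetting
        lets the model track a drifting distribution), then one SGD step."""
        p = self.predict_proba(x)
        self.w = [self.forget * wi + self.lr * (y - p) * xi
                  for wi, xi in zip(self.w, x)]

# Usage: each update is O(d) in time and memory, so an unbounded stream
# can be processed without ever storing past examples.
model = OnlineLogistic(n_features=2)
for x, y in [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 0.2], 1)]:
    model.learn_one(x, y)
print(round(model.predict_proba([1.0, 0.0]), 2))
```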
A further objective is to aid the user by providing additional information extracted from city events. At a technical level, this activity focuses on developing intelligent tools for extracting named entities, which could include the name of a person, the name of a place, a national monument, etc. The tools developed in this task will exploit online resources such as Wikipedia, GeoNames and DBpedia to provide the user with useful (or user-preferred) statistics. In addition to the extraction of named entities, this task will also focus on semantic categorisation of the named entities by tagging them with semantic categories. This knowledge will subsequently be made available to image analysis tools as prior knowledge to aid real-time image classification using kernel-based or biologically inspired algorithms. In addition, this task will extract a list of “to-be-experienced” aspects of a city and make dynamic recommendations available when the user is in the vicinity of these places. This task will collaborate closely with A3.6 for interfacing with the user.
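A minimal sketch of gazetteer-based named-entity extraction and semantic categorisation is given below. A real system would query resources such as Wikipedia, GeoNames or DBpedia; the tiny in-memory gazetteer and its categories are illustrative assumptions.

```python
# Hypothetical gazetteer mapping surface forms to semantic categories.
GAZETTEER = {
    "big ben": "national_monument",
    "london": "place",
    "thames": "place",
}

def extract_entities(text: str, max_ngram: int = 2):
    """Scan the text for gazetteer phrases up to max_ngram words long
    and return (surface_form, semantic_category) pairs."""
    tokens = text.lower().replace(",", " ").split()
    found = []
    for n in range(max_ngram, 0, -1):          # longest phrases first
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in GAZETTEER:
                found.append((phrase, GAZETTEER[phrase]))
    return found

print(extract_entities("Crowds gather near Big Ben in London"))
# -> [('big ben', 'national_monument'), ('london', 'place')]
```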
A3.5 – Real-time coding and streaming
Leader: QMUL, Participants:
Real-time multimedia streaming from different physical and online sources is an important aspect of the SERENDIPITY platform. The provisioning of multimedia services over heterogeneous environments such as SERENDIPITY remains a challenge, especially if the multimedia services have to be consumed anywhere, at any time and on any type of device. Scalable media coding efficiently provides the adaptation of multimedia resources to changing usage environments and to profile information (user and terminal profiles). It yields self-adaptive, information-rich media able to accommodate streaming and interaction functionalities.
This task embraces three areas of work: fully scalable media coding; streaming of scalably coded media in heterogeneous environments; and Quality of Experience of multimedia coding. In this context, the objectives of the task are as follows (a minimal sketch of the layer-adaptation step follows the list):
- the provision of a comprehensive evaluation report on the use of scalable media coding and multiple description coding techniques, together with the state of the art in scalable media coding;
- selection of the appropriate algorithms and mechanisms for scalable and adaptive media stream coding;
- definition of the framework for interworking between the application (encoding/decoding) modules and the media transport layer, in order to make use of the adaptive media transport and flow control capabilities of the selected transport protocols;
- implementation of a scalable coding/decoding scheme to be used for real-time transmission of media streams over congestion-aware and adaptive flow control protocols.
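The sketch below illustrates the adaptation step that scalable coding enables: a streaming server keeps only the coding layers that fit the receiver's currently available bandwidth. The layer bitrates and the 10% safety margin are illustrative assumptions.

```python
# A scalable stream: a base layer plus enhancement layers, in the
# cumulative order in which they must be transmitted.
LAYERS_KBPS = [("base", 400), ("enh-1", 600), ("enh-2", 1200)]

def select_layers(available_kbps: float, margin: float = 0.9):
    """Return the largest prefix of layers whose total bitrate fits
    within the available bandwidth, less a safety margin."""
    budget = available_kbps * margin
    chosen, total = [], 0
    for name, kbps in LAYERS_KBPS:
        if total + kbps > budget:
            break
        chosen.append(name)
        total += kbps
    return chosen

print(select_layers(2500))  # -> ['base', 'enh-1', 'enh-2']
print(select_layers(1200))  # -> ['base', 'enh-1']
print(select_layers(300))   # -> []  (cannot sustain even the base layer)
```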
A3.6 – Real-time Multimodal Data Synchronisation
Leader: QMUL, Participants:
This task will collaborate closely with the previously mentioned tasks in aligning and synchronising information obtained from multiple modalities and multiple sources with respect to time and geo-spatial preferences. The outcome of this task will (partially) reflect real-time dynamic updates on events such as traffic and crowd conditions, thereby enabling the suggestion of alternative routes to the same location according to user preferences or user models.
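A minimal sketch of such alignment is given below: events from two modalities are paired when they are close in both time and space. The event schema and the two-minute / roughly 1 km tolerances are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """An event observed by one modality (hypothetical schema)."""
    source: str       # e.g. "cctv" or "twitter"
    timestamp: float  # seconds since epoch
    lat: float
    lon: float
    label: str

def synchronise(a, b, max_dt=120.0, max_deg=0.01):
    """Pair events from two modalities that are close in both time and
    space; max_deg ~ 1 km of latitude at these coordinates."""
    pairs = []
    for ea in a:
        for eb in b:
            if (abs(ea.timestamp - eb.timestamp) <= max_dt
                    and abs(ea.lat - eb.lat) <= max_deg
                    and abs(ea.lon - eb.lon) <= max_deg):
                pairs.append((ea.label, eb.label))
    return pairs

cctv = [Event("cctv", 1000.0, 51.507, -0.128, "crowd detected")]
tweets = [Event("twitter", 1060.0, 51.508, -0.127, "huge queue here!")]
print(synchronise(cctv, tweets))  # -> [('crowd detected', 'huge queue here!')]
```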
Workpackage number: 3
Workpackage title: Real-time Large-scale Data Analysis
Activity type: RTD
Start date or starting event: M1
Participant number:
Participant short name:
Person-months per participant:
Objectives
WP3 contributes towards building an integrated platform for semantic urban computing by developing tools for large-scale data analysis in real time. The targeted tools implement low-level analysis algorithms that, together with the outcomes of WP4 and WP5, will encompass the technology needed to realise the SERENDIPITY software framework.
Description of work (broken down in different tasks + role of Partners)
To meet the above objectives, the following tasks have been identified in the context of WP3:
A3.1 – Distributed intelligent sensing
This activity is dedicated to gathering and filtering all information measured by a dense multimodal sensor network, together with information extracted from online and social communities such as blogs, Twitter, Flickr and YouTube. The sensor network covers a wide range of modalities: CCTV cameras; simple state/change indicators taking a binary (on/off) value, e.g. door open/shut in large indoor city scenarios; measurements of continuous variables, e.g. temperature and noise level; and complex sensors from meteorological stations where available.
A3.2 – Cartograph Analysis of a City
This task focuses on the analysis of city maps and generates an index of “interesting aspects” of a city.
A3.2.1 – Indexing cartograph
A3.2.2 – Mapping GPS from user to the Cartograph
A3.2.3 – Real-time tracking of user movement
A3.3 – Traffic and Crowd Monitoring from Real-Time Multimedia Sources
A3.3.1 – Tracking and Tracing
A3.3.2 – Traffic monitoring for semantic events
A3.3.3 – Crowd behaviour monitoring
A3.3.4 – Real-time reasoning following a semantic event
A3.3.5 – Interpretation of consequences of a semantic event
A3.4 – Sequential, anytime and on-line learning for real-time semantic analysis
The main objective of this activity is to provide a coherent perspective on resource-constrained algorithms that are fundamentally designed to handle limited bandwidth, limited computing and storage capabilities, limited battery power, and specific real-time network-communication protocols. In the targeted application scenario, the process generating the data is not strictly stationary. Research in this task will also focus on developing intelligent tools for extracting named entities, which could include the name of a person, the name of a place, a national monument, etc. The tools developed in this task will exploit online resources such as Wikipedia, GeoNames and DBpedia to provide the user with useful (or user-preferred) statistics.
A3.5 – Real-time coding and streaming
This task embraces the three different areas of the work:
Fully scalable media coding
Real-time streaming of scalable media coding
Quality of Experience of multimedia coding
A3.6 – Real-time Multimodal Data Synchronisation
This task will collaborate closely with the previously mentioned tasks in aligning and synchronising information obtained from multiple modalities and multiple sources with respect to time and geo-spatial preferences. The outcome of this task will (partially) reflect real-time dynamic updates on events such as traffic and crowd conditions, thereby enabling the suggestion of alternative routes to the same location according to user preferences or user models.
Deliverables (brief description) + month of delivery
D3.1 – Real-time analysis algorithms for Cross-Media Analysis and Annotation (M18)
D3.2 – State-of-the-art report on current multimodal techniques (M12)
D3.3 – Evaluation of real-time scalable and multiple description coding in heterogeneous environment (M24)
D3.4 – Report on multimodal data synchronisation (M30)