Environmental Monitoring Data Challenges
Jeremy Cohen
Imperial College London
Data Intensive Research Meeting
Monday 22nd November
Edinburgh, UK

Introduction
• Mobile detection of environmental pollution
• Focus on local air pollutants:
• Nitrogen Oxides (NOx, NO2), Sulphur Dioxide (SO2), Ozone (O3), Carbon Monoxide (CO)
• VOCs (particularly Benzene (C6H6))
Also consider other local environmental factors, e.g. noise, humidity, temperature.

The MESSAGE Project
• Mobile Environmental Sensing System Across a Grid Environment
• 3 year project starting October 2006
• Funded jointly by EPSRC and DfT (~£4m), under EPSRC's e-Science demonstration programme
• 5 Universities, 19 industrial partners
• Pioneering combination and extension of leading edge computing, sensor, communication and positioning technologies
• Create radically new sensing infrastructure based on combination of ad-hoc mobile and fixed sensors
• www.message-project.org

MESSAGE – motivation
Urban air quality is improving, but transport related pollution (hot spots) will remain a key problem
For example, NO2 in London. Will we see these improvements by 2010?
[Figure: panels labelled 2003 and 2010. Ref: London Atmospheric Emissions Inventory – (2006)]

MESSAGE – motivation
By existing standards Central London is relatively well monitored: [count illegible] fixed AQM sites
But this still leaves huge gaps
Research and policy development are hampered by a lack of data of sufficient spatial and temporal granularity

Sensor Devices
• Data captured and made available by a range of sensor devices
• Static and mobile
• Different pollutants captured
• Sample rates
• Different communication capabilities
• Computational power – sensors produce data according to their computational capabilities

Data Lifecycle: MESSAGE e-Science Architecture

Data Lifecycle: Capture
• Data pre-processing
• On sensor devices where computational power available
• At gateway nodes within the network
• Distributed data mining
• QA
• Identification of potentially erroneous values
• Periodic sensor calibration – drift compensation applied at DB

Data Lifecycle: Storage
• UTMC-based data store to support range of applications
• Interpolation
• Advance preparation of statistical data
• Outlier detection and interpretation
• Assembly of app-specific data marts
• Long-term warehousing of out-of-date data
• (How) Do we archive everything?

Database infrastructure
[Diagram labels: Data from sensors; SQL Query; OGSA-DQP Controller; OGSA-DAI Instances 1, 2 and 3; Data Stores 1, 2 and 3; output formats XML, CSV, KML]
• Flexible interface for data insertion
• Single interface transparent access to data across multiple databases
• Uses OGSA-DAI (www.ogsadai.org.uk), a partner in OMII-UK
• Variety of data extraction formats
• Additional output formats may be added

Data Lifecycle: Analysis, Processing and Visualisation
• Range of analytical processes that scientists may want to carry out
• Look at temporal and spatial variations of pollutant concentrations (e.g. hotspot detection)
• Relationship between different pollutant species
• Correlation with other factors – e.g. traffic levels, weather, health impacts
• System management and calibration – sensor control, fault detection, etc.

Data Lifecycle: Analysis, Processing and Visualisation
• Real-time vs. historic/predictive analysis
• Resource intensive since performance critical
• How many clients do we need to support?
• 3rd party providers may consume and "resell"
• Aim for an interface/API that allows the scientist to plug in their "application"

Data Lifecycle: Analysis, Processing and Visualisation
OGSA-DAI Web Query Example
[Screenshots: KML Data; CSV Data]

Data Lifecycle: Analysis, Processing and Visualisation
• Clicking a sensor provides statistical information for that sensor
• Level meter shows average pollution level for each species
• Selected sensors identified by coloured ring
• Selected sensors display a trace of recent readings in the data stream history window

Challenges
• Number of sensors
• Limited in our trials – up to ~40 sensors live at once
• Potentially many thousands+
• Significant variance in number of active sensors
• Data volumes
• Potentially very large – e.g. 0.5Mb per sensor/hour at 1Hz
• We have some control – e.g. dynamic variation of sample rates

Challenges
• What do potential users want?
• Access to raw / pre-processed data streams?
• Access to services?
• Access to user-friendly interface?
• Measures of success
• Extensive follow-up work building on elements of this work
• Application in different domains – e.g. fleet management

THANK YOU
jeremy.cohen@imperial.ac.uk
www.message-project.org

With thanks to MESSAGE project sponsors and colleagues at Imperial College London, University of Cambridge, University of Newcastle, University of Leeds and University of Southampton who worked on the material shown in this presentation.
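
Supplementary note on the data-volume figure in the Challenges slide: the quoted rate of 0.5Mb per sensor/hour at 1Hz can be scaled up as a rough back-of-envelope estimate. The Python sketch below is illustrative only and is not part of the MESSAGE system. It assumes that "Mb" means megabytes, that volume scales linearly with sample rate and sensor count, and it uses a hypothetical 10,000-sensor deployment to stand in for "many thousands+". The function name `daily_volume_mb` is likewise made up for this example.

```python
# Back-of-envelope data volume estimate (illustrative sketch only).
# Assumptions: "0.5Mb per sensor/hour at 1Hz" is read as 0.5 megabytes,
# and volume scales linearly with sample rate and with sensor count.

MB_PER_SENSOR_HOUR_AT_1HZ = 0.5  # figure quoted in the Challenges slide


def daily_volume_mb(num_sensors: int, sample_rate_hz: float = 1.0) -> float:
    """Estimated data volume in MB per day for a network of sensors."""
    per_sensor_hour = MB_PER_SENSOR_HOUR_AT_1HZ * sample_rate_hz
    return per_sensor_hour * 24 * num_sensors


if __name__ == "__main__":
    # 40 sensors matches the trial size; 10,000 is a hypothetical deployment.
    for sensors, rate in [(40, 1.0), (10_000, 1.0), (10_000, 0.1)]:
        volume = daily_volume_mb(sensors, rate)
        print(f"{sensors} sensors at {rate} Hz: {volume:,.0f} MB/day")
```

Under these assumptions the ~40-sensor trial produces about 480 MB per day, while 10,000 sensors at 1Hz would produce about 120,000 MB (roughly 120 GB) per day. Lowering the sample rate reduces the volume proportionally, which is why dynamic variation of sample rates gives some control.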