Sensor Grids For Air Pollution Monitoring M. Richards Imperial College London

Sensor Grids For Air Pollution Monitoring
M. Ghanem, Y. Guo, J. Hassard, M. Osmond, and M. Richards
Imperial College London
180 Queens Gate, London, SW7 2BW
{m.ghanem, y.guo, j.hassard, m.osmond, mark.richards}@imperial.ac.uk
Abstract
In this paper we describe the use of sensor grids within Discovery Net to construct a distributed
system for urban air pollution monitoring and control. We present the background to urban air
pollution monitoring and describe the high throughput sensors developed within this project to
address the problem. We differentiate between the concepts of sensor networks and sensor grids and
discuss the main challenges that arise when building sensor grids. We present a solution to address
these challenges based on the integration of distributed sensors, grid technologies, data integration,
data mining and GIS systems. We also present a case study for examining the effectiveness of
visual and automated methods developed for the analysis of generated data sets.
1. Introduction and Overview
As many cities around the world become more
congested, concerns increase over the level of
urban air pollution being generated and in
particular its impact on localised human health
effects such as asthma or bronchitis. The more
this relationship is understood, the better chance
there is of controlling and ultimately minimising
such effects. In the majority of the developed
world, legislation has already been introduced
to the extent that local authorities are required
by law to conduct regular Local Air Quality
Reviews of key urban pollutants such as
Benzene, SO2, NOx or Ozone - produced by
industrial activity and/or road transport [2]. In
order to achieve this, pollutant concentrations
must be monitored accurately and ideally in situ
so that sources may be identified quickly and
the atmospheric dynamics of the process are
understood. Furthermore, such data would lend
itself to real-time environmental decision-making capabilities as a result of hazardous
levels being rapidly identified.
The Discovery Net project [1] is developing
grid-based methods for the integration and
analysis of data generated from distributed high
throughput devices in a variety of application
areas including life science, environmental
science and remote sensing. The goal is to
develop an advanced generic computing
infrastructure that supports real-time processing,
interpretation, integration, visualisation and
mining of massive amounts of time-critical data
generated from such devices. One of the main
application areas of the project is the analysis of
data generated by the GUSTO high throughput
pollution monitoring sensors (see section 3).
Deploying a sensor grid over a target region,
such as a heavily industrialised or densely
populated area, creates a wealth of data
allowing new types of analysis to be conducted.
These include the analysis and visualisation of
the spatiotemporal variation of multiple
pollutants in respect to one another, and their
correlation with third-party data, such as
weather, health or traffic data. Such analysis can
provide valuable clues as to how local health
effects (e.g. aggravated respiratory illnesses)
occur. However, modern sensor technologies
such as GUSTO, which measure pollutants at a high
level of accuracy, can generate up to 8 GB of data
per sensor each day. This raises many
informatics challenges with respect to managing
and analysing the collected data.
In the remainder of this paper, we describe the
motivations for the development of the high
throughput GUSTO sensors within the project
as well as the sensors themselves. We also
discuss the main informatics challenges that
arise when a high throughput sensor grid is
constructed based on such sensors, present a
snapshot of the infrastructure developed to
address these problems and discuss the various
data visualisation and analysis scenarios for
which the platform has been designed.
2. Pollutant Sources and Health
Effects
Human beings breathe in and out approximately
once every four seconds, which equates to nearly
eight million times a year. As a consequence
our lungs process around four million litres
(4,000 m3) of air from the earth’s atmosphere
every year. Urban air pollution is therefore one
of the most important environmental issues that
may be considered due to its direct effect on
human health.
It mainly results from
anthropogenic (human) activities and has
diverse causes and sources.
“Stationary
sources,” such as factories, power plants, and
smelters; “area sources,” which are smaller
sources such as dry cleaners and degreasing
operations; “mobile sources,” such as cars,
buses, planes, trucks, and trains; and “natural
sources,” such as windblown dust and wildfires,
all contribute to air pollution.
Due to the trans-boundary nature of airborne
pollutants, it is difficult for any single
organisation to take responsibility for overall
emission levels. Thus, the control of air
pollution is entirely legislation driven. As such,
new legislation can only be
effective if the specified compounds can be
monitored accurately.
The primary airborne pollutants covered by
European legislation are: SO2, NOx (NO/NO2),
benzene, Ozone, CO/CO2, and particulate
matter (PM10/PM2.5) [2]. Currently GUSTO
sensors are optimised to monitor SO2, NO, NO2,
Benzene and Ozone, primarily because
all of these compounds have measurable
absorption signatures in a fairly narrow part of
the UV spectrum (see section 3.1). However,
further optimisation of the sensor is anticipated
(with the addition of infrared capability) that
will lead to the inclusion of CO/CO2 and
particulate matter, thus covering the whole
suite. The key pollutants covered by GUSTO,
their likely sources and the resulting health
effects are summarised in Table 1 and
discussed in more detail below.
Sulphur Dioxide (SO2): Sulphur is prevalent in
most industrial raw materials, including crude
oil, coal, and common ores like aluminium,
copper, zinc, lead, and iron. Sulphur gases are
produced when fuel, such as oil and especially
coal, is burnt; during mining and industrial
processes, e.g. when petrol is extracted from
crude oil; and naturally from volcanic eruptions.
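Pollutant | Symbol | Source
Sulphur Dioxide | SO2 | Petroleum refineries / coal powered power stations
Benzene | C6H6 | Transport / industry unburned fuel products
Nitric Oxide | NO | High temperature combustion processes / road transport
Nitrogen Dioxide | NO2 | High temperature combustion processes / road transport
Ozone | O3 | Ground level reactions involving NOx and VOCs
Table 1 (sources): Pollutants covered by GUSTO and their sources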
Sulphur dioxide is a natural component of the
earth’s atmosphere, with natural emissions
accounting for around 50-70 million tons per
year. Total anthropogenic emissions, however, far exceed
this, at between 150 and 200 million
tons per year.
Health effects of SO2 gas are irritation to the
eyes and respiratory system, reduced pulmonary
functions and aggravation to respiratory
diseases such as asthma, chronic bronchitis and
emphysema.
Exposure to extremely high
concentrations will cause permanent damage to
the respiratory system as well as extreme
irritation to the eyes (due to production of dilute
sulphuric acid around the eyes). When SO2
reacts with other chemicals in the air to form
tiny sulphate particles, these may also be
inhaled in which case they gather in the lungs
and are associated with increased respiratory
symptoms and disease, difficulty in breathing,
and premature death [3].
Benzene (C6H6): Benzene is the most common
of a group of compounds referred to as Volatile
Organic Compounds (VOCs). Benzene is a
minor constituent of petrol (EC legislation states
that it must be less than 5% by volume, average
content in UK petrol is about 2% by volume
[5]). Generally, VOCs are produced as fuel by-products in a combustion process.
Benzene is a known carcinogen; however, the
main health hazard arises from its role in the
production of ground level ozone.
Ozone (O3): Ozone (O3) is a colourless gas
formed at ground level by reactions involving
VOCs and nitrogen oxides. There are no
terrestrial sources of ozone; however, any that is
formed will also be destroyed, assuming that
the VOCs (or other compounds that shift the
balance of the reaction toward high ozone
levels) are no longer present. Thus the levels of
tropospheric ozone will fall only when the
heat/sunlight required is not present or the
VOCs have broken down. Ground level ozone
can be transported great distances by prevailing winds [3].
Pollutant | Health Effects | Legal limit (ppbv) | Averaging time
Sulphur Dioxide | Irritation to eyes and respiratory system. Reduced pulmonary functions. | 100 | 15 min
Benzene | Known carcinogen. Also plays role in formation of ground level ozone. | 5 | Running annual mean
Nitric Oxide | Can increase incidences of acute respiratory disease in children. | 16 | Annual mean
Nitrogen Dioxide | Irritation to lungs and lowered resistance to respiratory infections such as influenza. | 105 | 1 hour mean
Ozone | Respiratory infection, lung inflammation, aggravation of asthma. | 50 | Running 8 hour mean
Table 1: Summary of Pollutants and Health Effects
Short-term exposures (1-3 hours) to moderate
ozone concentrations have been linked to
increased hospital admissions for respiratory
complaints. Repeated exposures are linked with
increased susceptibility to respiratory infection,
lung inflammation and aggravation of pre-existing respiratory diseases such as asthma.
Other health effects of exposure to ozone are
decreases in lung function and increased
respiratory symptoms such as chest pains and
coughing [3]. All of the symptoms of exposure
to ozone appear aggravated by periods of
moderate exertion and children active outdoors
are the group at greatest risk of developing
symptoms during levels of high ozone
concentration. In addition, long-term exposures
to ozone present the possibility of permanent
changes in lung function, which could lead to
premature ageing of the lungs and/or chronic
respiratory illnesses [3].
Nitrogen Oxides (NOx): Nitrogen Oxides or
NOx (NO, NO2 and NO3) are a group of highly
reactive gases containing nitrogen and oxygen.
Many nitrogen oxides are colourless and
odourless. However, nitrogen dioxide is a
brownish gas with a strong odour. Natural
background levels of NOx (in this case NO and
NO2) within rural UK districts are between 1 and
4 ppb [4]. Urban areas of the UK have
averaged NO2 concentrations of
around 20 ppb since 1976 [4]. The generally
accepted reason for the apparent lack of change
in concentrations is that while there has been a
reduction in nitrogen dioxide emissions from
industrial sources, there has been a rise in
emissions from road transport [4].
It is known that exposure to high concentrations
over short periods of time is more harmful to
health than long-term exposure to lower
concentrations [6]. However, legislative
directives are based on running mean
concentrations over averaging periods of at least 15 minutes.
3. GUSTO – An Open Path Air Pollution Sensor
GUSTO is an acronym for Generic Ultraviolet
Sensors Technologies and Observations. It is based
on open-path DUVAS™ (Differential Ultraviolet
Absorption Spectroscopy) technology, and measures
and transmits the volume mixing ratios (at ppb
levels) of key urban pollutants in real time. The key
distinguishing features are:
• Short time scale (of order 2 s scan rate)
• Open variable path (up to 30 m), enabling
measurements to be carried out in situ and
localised effects to be characterised.
3.1 Sensor Theory
The volume mixing ratios of certain trace
atmospheric gases may be determined using
differential optical absorption spectroscopy
(DOAS). This is a well-known method of
retrieval and has been documented
comprehensively by a number of authors (see
[10] for example). The custom-developed
DUVAS method makes use of the characteristic
narrow band absorption of the gas under study
in the UV spectral range 200-270 nm. The gases
concerned include SO2, NO, NO2, O3, NH3 and Benzene
(all are governed by strict legislative guidelines
with respect to acceptable limits of ambient
concentration).
The Beer-Lambert law describes the absorption
over a path length $x$ (m) of photons by a gas
with number density $n$ (m$^{-3}$) and absorption
cross-section $\sigma(\lambda)$ (m$^{2}$) and is usually written
as:
$$I(\lambda) = I_0(\lambda)\, F(\lambda)\, \exp[-\sigma(\lambda)\, n x - \alpha(\lambda)\, n x]$$
where
$I(\lambda)$ = measured intensity (W m$^{-2}$),
$I_0(\lambda)$ = lamp intensity (W m$^{-2}$),
$F(\lambda)$ = wavelength dependence of instrument,
$\sigma(\lambda)$ = absorption cross section (m$^{2}$),
$n$ = number density (m$^{-3}$),
$x$ = path length (m),
$\alpha(\lambda)$ = scattering cross section (m$^{2}$).
If we consider the resulting intensity spectrum
to be comprised of broad ($I_B$) and narrow ($dI$)
features, then we may write (dropping the $\lambda$),
$$I = I_B \cdot dI$$
The narrow features (arising from trace
molecular absorption) are then de-convolved
from the spectrum and the resulting differentials
are used to calculate the concentration (number
density) of each absorber.
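To make the Beer-Lambert relation concrete, the following minimal Python sketch evaluates the forward model at a single wavelength and then inverts it for the number density. The function names and every numerical value (cross sections, path length, density, instrument factor) are illustrative assumptions, not GUSTO calibration data.

```python
import math

def transmitted_intensity(i0, f, sigma, alpha, n, x):
    """Beer-Lambert forward model at a single wavelength.

    I = I0 * F * exp(-sigma*n*x - alpha*n*x)
    """
    return i0 * f * math.exp(-sigma * n * x - alpha * n * x)

def number_density(i, i0, f, sigma, alpha, x):
    """Invert the forward model for the number density n (m^-3)."""
    return -math.log(i / (i0 * f)) / ((sigma + alpha) * x)

# Illustrative values only (assumptions, not instrument data).
sigma = 1.0e-22    # absorption cross section, m^2
alpha = 1.0e-24    # scattering cross section, m^2
x = 30.0           # path length, m
n_true = 2.5e18    # number density, m^-3

i_measured = transmitted_intensity(1.0, 0.9, sigma, alpha, n_true, x)
n_est = number_density(i_measured, 1.0, 0.9, sigma, alpha, x)
print(f"transmission = {i_measured / 0.9:.6f}, recovered n = {n_est:.3e} m^-3")
```

In practice the lamp intensity and instrument function are not known well enough for this direct inversion, which is why the differential approach described above works on the narrow features only.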
3.2 The Sensor Hardware
A general schematic of a GUSTO sensor is
shown in Fig. 1. The optical configuration of
the GUSTO system comprises
four main components: the lamp (with its
associated optical components), fibre optic
probe, spectrometer, and linear CCD detector.
Briefly, light from the deuterium lamp is retro-reflected back towards the source via the
focussing optics. The returning light is
collected using a fibre optic 'light pipe' coupled
to a spectrometer unit. The spectral output is
then imaged onto the surface of the CCD
detector and intensity values are obtained via a
12-bit ADC to produce an atmospheric
spectrum of diode number versus ADC counts
(equivalent to wavelength versus intensity) over
the GUSTO range. At this point a further layer
of analysis (the DUVAS retrieval algorithm) is
performed on the spectrum in order to
‘disentangle’ the multiple absorbing species and
obtain separate mixing ratios for each pollutant
simultaneously.
Fig. 1. Schematic of GUSTO Sensor
A summary of the retrieval process is shown
below (see Fig. 2). It is worth noting that
the entire process takes
only a fraction of a second, allowing for rapid
retrieval updates. This aspect becomes
important when retrieving pollutant
concentrations in a rapidly changing dynamic
atmosphere.
Actual measurement:
A1. Spectrum measured with instrument
A2. Diode response calibrated out of signal
A3. DUVAS applied, differential obtained
A4. Spectrum biased above axis and peak areas calculated
Simulated spectrum:
S1. Cross sections read from file
S2. Absorption for given number density and path length calculated
S3. Spectrum convolved with instrument function
S4. DUVAS applied, differential obtained
S5. Spectrum biased above axis and peak areas calculated
Both branches:
6. Fit synthetic to real, retrieve concentration and error.
Fig. 2. The DUVAS retrieval process
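The idea behind the retrieval steps above can be sketched in a few lines of Python. This is not the DUVAS algorithm itself: it is a simplified stand-in that assumes a single absorber, removes the broadband part with a centred moving average, and replaces the peak-area comparison with a least-squares fit of the synthetic differential to the measured one. The cross section, lamp envelope and all numerical values are invented for illustration, and the instrument convolution step is omitted.

```python
import math

def moving_average(values, half_width):
    """Centred moving average over interior points where the full window fits."""
    width = 2 * half_width + 1
    return [sum(values[k - half_width:k + half_width + 1]) / width
            for k in range(half_width, len(values) - half_width)]

def differential(values, half_width):
    """Narrow features: the signal minus its smooth (broadband) part."""
    smooth = moving_average(values, half_width)
    interior = values[half_width:len(values) - half_width]
    return [v - s for v, s in zip(interior, smooth)]

def retrieve_number_density(intensity, cross_section, path_length, half_width=10):
    """Least-squares fit of the synthetic differential to the measured one."""
    optical_depth = [-math.log(i) for i in intensity]
    measured = differential(optical_depth, half_width)
    reference = differential([s * path_length for s in cross_section], half_width)
    return (sum(m * r for m, r in zip(measured, reference))
            / sum(r * r for r in reference))

# Synthetic "measurement" over 200-270 nm (all values are assumptions).
wavelengths = [200.0 + 0.1 * k for k in range(701)]
cross_section = [1.0e-22 * (1.0 + 0.5 * math.sin(2.0 * math.pi * lam / 1.5))
                 for lam in wavelengths]                      # m^2, banded structure
lamp = [1000.0 * math.exp(-0.01 * (lam - 200.0)) for lam in wavelengths]
path_length = 30.0                                            # m
n_true = 2.5e18                                               # m^-3
intensity = [l * math.exp(-s * n_true * path_length)
             for l, s in zip(lamp, cross_section)]

n_est = retrieve_number_density(intensity, cross_section, path_length)
print(f"true n = {n_true:.3e}, retrieved n = {n_est:.3e}")
```

Because the same high-pass filter is applied to the measured optical depth and to the reference cross section, the smooth lamp envelope drops out and only the narrow absorption structure drives the fit.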
4. Sensor Networks vs. Sensor Grids
In this paper, we make a distinction between
“Sensor Networks” and “Sensor Grids”.
Whereas the design of a sensor network
addresses the logical and physical connectivity
of the sensors, the focus of constructing a sensor
grid is on the issues relating to the data
management, computation management,
information management and knowledge
discovery management associated with the
sensors and the data they generate, and how
they can be addressed within an open
computing environment. To highlight the
difference, we summarise the main issues
relating to a sensor grid environment as follows:
Distributed Sensor Data Access and
Integration: The first issue relates both to the
heterogeneity and geographic distribution of the
sensors within a sensor grid and how sensors
can be located, accessed and integrated within a
particular study. Not only is it essential to
record the type of pollutants measured (e.g.
Benzene, SO2, NOx, etc) for each sensor, but
also since sensors may be mobile it is essential
to record the location of the sensor at each
measurement time. Such information must be
described and published using standardised
techniques allowing the security and
authentication issues relating to accessing and
controlling the sensors to be addressed.
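As an illustration of the metadata requirement described above, the sketch below shows one possible shape for a single reading that carries the sensor identity, the measurement time, the sensor location at that time and the pollutants measured. The class, field names, sensor identifier, coordinates and concentrations are all hypothetical; this is not the SensorML schema or the Discovery Net data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class SensorReading:
    """One reading from one (possibly mobile) sensor. Hypothetical layout."""
    sensor_id: str
    timestamp: datetime          # measurement time
    latitude: float              # sensor position at measurement time
    longitude: float
    concentrations_ppb: dict     # pollutant name -> mixing ratio in ppb

def pollutants_by_sensor(readings):
    """Map each sensor id to the set of pollutants it has reported."""
    measured = {}
    for reading in readings:
        measured.setdefault(reading.sensor_id, set()).update(reading.concentrations_ppb)
    return measured

# Illustrative values only.
readings = [
    SensorReading("gusto-1", datetime(2004, 6, 1, 8, 0, 0, tzinfo=timezone.utc),
                  51.52, -0.03, {"SO2": 12.0, "NO2": 21.5}),
    SensorReading("gusto-1", datetime(2004, 6, 1, 8, 0, 2, tzinfo=timezone.utc),
                  51.52, -0.03, {"SO2": 12.4, "O3": 30.1}),
]
catalogue = pollutants_by_sensor(readings)
print(catalogue)
```

Storing the location with every reading, rather than once per sensor, is what allows mobile sensors to be handled in the same way as fixed ones.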
Large Data Set Storage and Management:
The second issue relates to the sizes of data
being collected and analysed. For example, each
GUSTO sensor generates in excess of 8 GB of
data each day. Online monitoring does
not necessarily require that the generated data sets be
stored. However, most analysis studies proceed
by analysing historical data, in which case all
collected data must be cleaned, processed and
warehoused for later use.
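A rough back-of-the-envelope calculation shows the scale involved. The sketch below assumes continuous 24-hour operation, decimal gigabytes, a nominal 8 GB per sensor per day, the 2-second scan rate quoted for GUSTO, and the 140-sensor deployment used in the simulated scenario later in the paper; the even split of volume across scans is an assumption made only for illustration.

```python
SCAN_INTERVAL_S = 2            # GUSTO scan rate quoted in the paper
GB_PER_SENSOR_PER_DAY = 8      # nominal daily volume per sensor
SENSORS = 140                  # size of the simulated deployment

scans_per_day = 24 * 60 * 60 // SCAN_INTERVAL_S
bytes_per_scan = GB_PER_SENSOR_PER_DAY * 1e9 / scans_per_day   # assumes 1 GB = 1e9 bytes
grid_tb_per_day = SENSORS * GB_PER_SENSOR_PER_DAY / 1000

print(f"{scans_per_day} scans per sensor per day")
print(f"about {bytes_per_scan / 1000:.0f} kB per scan")
print(f"about {grid_tb_per_day:.2f} TB per day for the whole grid")
```

Under these assumptions a grid of this size produces on the order of a terabyte per day, which is why warehousing and cleaning cannot be treated as an afterthought.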
Distributed Reference Data Access and
Integration: The third issue relates to the
integrated analysis of the sensor data. Whereas
the analysis of spatiotemporal variation of
multiple pollutants in respect to one another can
be directly achieved over archived data, more
often it is their correlation with third-party data,
such as weather, health or traffic data that is
more important. Such third-party data sets (if
available) typically reside on remote databases,
and are stored in a variety of formats. Hence,
the use of standardised and dynamic data access
and integration techniques to access and
integrate such data is essential.
Fig. 3. Discovery Net’s Sensor Grid Architecture
(Components shown: GUSTO units 1-4 with wireless connectivity; monitoring and control software; sensor registry and control service (SensorML); data upload service and data access service (HTTP, SOAP, GSI); warehouse; archived weather data; archived health data; public access web visualizer; visualisation and data mining (Discovery Net); GRID infrastructure.)
Intensive and Open Data Analysis Computation: Finally, the fourth issue relates
to the analysis components applied to the data.
True integrated analysis of the collected data
requires a multitude of analysis components
such as statistical, clustering, visualisation and
data classification tools. The choice of which
data sets and analysis components to use is
typically governed by end user requirements,
and such users vary from city planners and local
government to health practitioners,
environmental organizations and academic
researchers. It thus becomes essential to allow
the users to locate, access and integrate third
party data analysis components within their own
analysis workflows. Furthermore, if the analysis
is to proceed over large data sets, it is essential
to provide access to high performance
computing resources to allow rapid computation
to proceed.
3.1 Sensor Grids in Discovery Net
Discovery Net’s architecture for such a sensor
grid is shown in Fig. 3. Our methodology for
addressing the data integration requirements of
air pollution monitoring is based on extending
Discovery Net’s use of grid services to
encompass high throughput sensors. The
capabilities of each sensor can be published in a
registry using standardised methods (such as the
Sensor Model Language, SensorML [7], from the Open
GIS Consortium), allowing the sensor’s data as
well as metadata describing the sensor
properties to be accessed and retrieved using
standardised protocols.
Each GUSTO unit contains a computer which
analyses the sensor readings, generating a
measurement of concentration for each pollutant
every 2 seconds. This data is uploaded at
intervals to a remote Grid service, which
manages the centralised storage of data in a
warehouse accessible using SQL databases,
Oracle databases and the OGSA-DAI grid
standard. In parallel, the sensor network may be
monitored and controlled using similar
technology. Security and identification of
sensors may be managed using the grid security
infrastructure GSI.
3.2 Supporting Decision Analytics in a Sensor
Grid
A GUSTO sensor grid would be extremely
valuable in the area of pollution monitoring due
to the high density of sensors (several tens or
hundreds of sensors over a few square
kilometres, rather than one or two sensors per
city), and the fine temporal resolution of the
pollutant concentration readings (every two
seconds, rather than 15-min or hourly averages).
This wealth of data allows detailed examination
of the area being monitored, down to the level
of streets and buildings, and the ability to detect
short-duration peaks in pollution is important
due to non-linear effects of pollution on health
[6].
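The effect of averaging on short-duration peaks can be illustrated with a small Python sketch. The series below is synthetic: a constant 20 ppb background sampled every two seconds with a single one-minute excursion to 200 ppb. A 15-minute running mean (450 samples) is then computed; the numbers are chosen for illustration and are not measurements.

```python
from collections import deque

def running_mean(samples, window):
    """Running mean over the last `window` samples (full windows only)."""
    means = []
    buffer = deque()
    total = 0.0
    for sample in samples:
        buffer.append(sample)
        total += sample
        if len(buffer) > window:
            total -= buffer.popleft()
        if len(buffer) == window:
            means.append(total / window)
    return means

SAMPLES_PER_15_MIN = 15 * 60 // 2            # 2-second readings

# Synthetic series: 20 ppb background with a one-minute spike at 200 ppb.
series = [20.0] * 600 + [200.0] * 30 + [20.0] * 600
averaged = running_mean(series, SAMPLES_PER_15_MIN)

print(f"peak of 2-second readings:    {max(series):.1f} ppb")
print(f"peak of 15-min running mean:  {max(averaged):.1f} ppb")
```

The averaged series never comes close to the instantaneous peak, so an event of this kind would be visible only in the two-second data.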
Within the Discovery Net sensor grid, the data
integration and transformation tools provided
are critical for this kind of distributed analysis.
The InfoGrid [8] component provides a method
of querying and combining data from multiple
heterogeneous sources. The Discovery Net
service workflow model [9] allows end users to
construct analysis models as a composition of
the execution of mixed local and remote
analytical components. Such end users construct
their workflows using visual workflow
authoring tools allowing them to browse and
search for analytical components and then
connect icons representing them as a data flow
graph that represents the computation. These
workflows are then submitted to the Discovery
Net execution engine that handles the
scheduling of the execution of the components
on different machines. The Discovery Net
workflow model provides end users with
high-level tools that shield them from the
complexities of the underlying Grid computing
architecture and present them with an
easy-to-use interface.
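The notion of a workflow as a data flow graph of components can be sketched as follows. This is a generic illustration and not the Discovery Net workflow API: the `run_workflow` function, the component names and the input data are hypothetical, everything runs locally, and the scheduling of components on different machines is left out. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

def run_workflow(components, dependencies, inputs):
    """Execute components in dependency order.

    components:   name -> callable taking the upstream results as arguments
    dependencies: name -> ordered list of upstream node names
    inputs:       name -> value for source nodes that need no computation
    """
    results = dict(inputs)
    for name in TopologicalSorter(dependencies).static_order():
        if name in results:
            continue                      # source node, value already supplied
        upstream = [results[d] for d in dependencies.get(name, [])]
        results[name] = components[name](*upstream)
    return results

# Hypothetical two-step workflow: drop missing readings, then average.
components = {
    "clean": lambda xs: [x for x in xs if x is not None],
    "mean": lambda xs: sum(xs) / len(xs),
}
dependencies = {"clean": ["readings"], "mean": ["clean"]}

results = run_workflow(components, dependencies, {"readings": [20.0, None, 40.0]})
print(results["mean"])
```

Keeping the graph separate from the component implementations is what lets a local component be swapped for a remote one without changing the workflow description.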
4. Data Analysis Scenario
In this section we present a case study to
evaluate the effectiveness of Discovery Net’s data
preprocessing, data mining and data analysis
components in the analysis of air pollution data.
This evaluation is based on simulated data that
would be generated from a realistic scenario of
constructing a sensor grid over a typical urban
area. The chosen area, shown in the map (Fig.
4), lies around the Tower Hamlets and Bromley areas in
East London. Some landmarks in the map are worth
highlighting: the main road extending between
map locations A6 and L11, the hospital (B5), the
school (C6) and the gas works (E1 and D2).
The simulated scenario is based on a
distribution of 140 sensors in the area collecting
data over a typical day from 8:00 am to 6:00 pm
at two-second intervals to monitor for NOx and
SO2. The simulation of the required data has
taken into account known atmospheric trends
and the likely traffic impact.
The simulation data provides us with enough
information to develop visual and automated
data analysis components and composition
workflows that can be used in identifying
pollution trends. The relatively high spatial
density of sensors used also allows a detailed
map of pollution in both space and time to be
built up. Within a real-case scenario, we can
then use these same components and workflows
to help end users in assessing whether real
observed pollution trends could be related to
observed health effects.
Fig. 4: GUSTO sensors in an area of East London
Fig. 7: Scatter plots of pollution profile over 10 hours
4.1 Data Pre-processing
Once sensor data is collected, data cleaning and
pre-processing are necessary before further
analysis and visualisation can be performed.
Most importantly, missing data must be clearly
marked or interpolated. Interpolation can be
performed using bounding data from the sensor,
or also using data from nearby sensors at the
same time. Interpolated data may be stored back
to the original database, with provenance
information including the algorithm used. Such
pre-processing is standard, and has been
conducted using the available Discovery Net
components. We omit its details from this paper
due to space limitations.
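A minimal sketch of the gap-filling step is given below. It covers only the first option mentioned above (interpolating from the bounding readings of the same sensor) using simple linear interpolation, and records a provenance label for every value. The function name and the label strings are hypothetical, and this is not the Discovery Net pre-processing component.

```python
def interpolate_gaps(values):
    """Fill None gaps by linear interpolation between the bounding readings.

    Returns (filled, provenance). Gaps at the start or end of the series have
    no bounding reading on one side and are left as None, marked "missing".
    """
    filled = list(values)
    provenance = ["measured" if v is not None else "missing" for v in values]
    last = None                           # index of the previous measured value
    for i, value in enumerate(values):
        if value is None:
            continue
        if last is not None and i - last > 1:
            step = (value - values[last]) / (i - last)
            for j in range(last + 1, i):
                filled[j] = values[last] + step * (j - last)
                provenance[j] = "linear-interpolated"
        last = i
    return filled, provenance

filled, provenance = interpolate_gaps([None, 10.0, None, None, 16.0, None])
print(filled)
print(provenance)
```

Keeping the provenance alongside the data means a later analysis can choose to exclude interpolated values or treat them with lower weight.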
4.2 Visual Pollution Trend Analysis
The first step in analysing the collected data is
through the use of visual analysis tools. The aim
here is to provide end users with a tool allowing
them to monitor how pollution builds over time
and space, and also to provide them with tools
that enable them to interpret the reasons for
different pollution profiles.
To support such analysis, the Discovery Net
GIS map viewer was extensively redeveloped
for this project to support the different types of
visualisation requested by the applications team.
One of these visualisation techniques is shown
in Fig. 5 where a continuous colour map of
pollution can be used to dynamically interpolate
values between any number of arbitrarily placed
sensors. The viewer makes use of vector maps
allowing information such as building names
and types to be stored and queried by the user.
Fig. 5: Interpolated colour pollution representation over vector map layers
Fig. 6 shows how bar charts (or pie charts)
within the viewer can be used to represent the
amount and proportion of different pollutants
present for each sensor. For easier and more
flexible interpretation by end users, background
images (with proper positioning information)
may also be added as layers. In addition to
providing data snapshots, all visualizers have
been extended to support studying the temporal
aspect of the data sets using an animation
system, allowing the user to guide their
exploration of the data records. Averaged data
can be used to give a quick and overall view of
how the pollution changes with time, while still
permitting the user to drill down to see finer
detail pollution events.
Fig. 6: Colour interpolation and bar charts for each sensor, displayed on an aerial photo.
Fig. 7 shows how scatter charts are used to
examine the pollution profile produced by
single and multiple sensors over time. The
display can be used to identify visually groups
of sensors that show a similar pollution profile,
and also to study temporal correlations between
different pollutants. The scatter chart interacts
with the map viewer allowing the end user to
identify what map features may be causing the
similarity or correlation in pollution profiles.
The advantage of scatter charts is that they can
show features which may not be obvious from
examining the animation. For example, in
addition to the expected peaks at morning and
afternoon rush hours (Main Road), there is also
a peak in mid-afternoon which corresponds to
school closing time (location C6), and locating
the sensors contributing to this peak shows that
they are indeed near schools. Similarly the
pollution patterns around the Gas Works (E1 and
D2) build up at similar times.
4.3 Automatic Pollution Trend Analysis
Visualization techniques provide an effective
way for an end-user expert to monitor sensor
data in real-time, and also to explore limited
sets of historical data. However, the analysis of
trends within large data sets collected over
longer periods of time can benefit from the use
of automatic analysis methods and algorithms.
As an example of using and developing
automated analysis methods we have used a
method based on Hierarchical Data Clustering
to study correlations between the data measured
by different sensors. The aim was to evaluate
how such algorithms can be used to detect
groups of sensors that measure similar pollution
profiles automatically. The Discovery Net data
analysis workflow used for the operation is
shown in Fig. 8.
Fig. 8: Hierarchical Clustering Workflow
The generated clusters are shown in Fig. 9. The
red heat map shows the pollution level across
the time studied, from 8am on the left to 6pm on
the right. The topmost band, for example, shows
high pollution at morning and afternoon rush
hour and contains a cluster of sensors on main
roads. By mapping the clusters back to the GIS
viewer, one can see that three different groups
of clusters are clearly identified: The first group
represents the sensors along the main road (A6-L11). The second group represents those that
are near the Gas Works (E1 & D2) and the
third group are those near the school (C6).
Fig. 9: Dendrogram of the Clustering Model.
5. Related Work
Reisinger and Fraser [10] describe a differential
optical absorption spectroscopy (DOAS)-based
instrument for measuring the pollutants NO,
NO2, SO2, and O3. Their sensors have a longer
optical path (100 m - 20 km) than that currently used
by GUSTO sensors, with similar detection
accuracy but coarser time resolution (20-minute
readings).
The “Air Pollution in the Streets” project [11]
measures pollutant concentrations at ppm levels,
along with ambient conditions such as
temperature, humidity and wind speed, at a
spatial resolution permitting examination of
pollution in different streets. The described
configuration provides 6-minute averages of
one-minute samples, for each of 6 monitored
variables. The work emphasises that the
pollution concentration in nearby streets can
vary greatly due to spatial configuration of
buildings in the area as well as traffic levels,
confirming the value of high spatial resolution
in pollution monitoring. In [12] the authors
describe how grid technologies may be used for
collecting data from mobile sensors and using it to
study the effects of pollution build-up.
Other projects have investigated the knowledge
discovery aspects of analyzing data collected
from sensor networks. For example, Li et al [13]
investigate a method of analysing hourly
monitoring data produced by 71 sensors
distributed over Taiwan. Their analysis was
performed using multi-scale wavelet transforms
and self-organising map (SOM) neural
networks, examining the spatiotemporal data to
find sensor clusters. The aims of the analysis are
similar to ours, but the openness of the
Discovery Net system allows us to investigate
the use of a wider variety of data analysis
components. The Time Map project [14] has
developed data analysis software with many
similar features to those developed for the GIS
components of Discovery Net. The software
allows visualisation of distributed and
heterogeneous spatiotemporal data sets with
GIS integration, animation, and interactive
maps. Server-side data management, retrieval
and filtering allows for a lightweight client
applet interface.
In terms of data integration, the APPETISE
project [15] aims to produce a shared database
containing pollution and related data, such as
traffic statistics and weather records, and to
develop tools for analysing, mining and
visualising this data. Crabbe et al [16]
investigate the use of telemedicine methods to
study the correlation between urban pollution
and asthma. These aims are clearly very similar
to those of the GUSTO project, the main
difference being the lower resolution of data
collected compared to the GUSTO sensors.
6. Conclusions and Discussion
In this paper we have provided an overview of
the urban air pollution monitoring application
within Discovery Net, describing the GUSTO
sensor technology, the sensor grid technology
and the grid-based knowledge discovery
framework used.
Our work is enabled by the GUSTO sensor
technology itself, which measures pollutants
accurately at ppb (parts per billion) levels at very
short intervals (~2 seconds). Such throughput is
higher than that of other projects with similar
aims. The distribution of the sensors, the large
volumes of the data collected, the data
integration requirements and the requirements
for using different analytical components at
various stages have clearly made the use of Grid
technologies essential for our application.
We are currently extending the application case
studies to understand further the infrastructure
and knowledge discovery requirements of
constructing large-scale sensor grids and
gearing up towards deploying a real sensor grid.
We are also currently investigating further
knowledge discovery techniques for correlating
pollution trends with other data sources, such as
traffic data and health data.
Our experience has shown that strong
interdisciplinary collaboration with end-user
input from the outset can result in development
of high quality informatics tools. For this
project there is a clear path towards real-time
decision-making in an urban environment that
could ultimately improve the quality of life.
References
[1] V. Curcin et al. Discovery Net: Towards a Grid of
Knowledge Discovery. In Proceedings of KDD-2002.
The 8th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. 2002.
[2] UK, Department of Trade and Industry. “The
Energy Report 2000”. HMSO. 2000
[3] U.S. Environmental Protection Agency (2004).
www.epa.gov
[4] UK Environment Agency (2004).
http://www.environment-agency.gov.uk
[5] UK Department for Environment Food and Rural
Affairs. (2004) www.defra.gov.uk
[6] The World Bank. "Dose-Response Functions and
the Health Impacts of Air Pollution", Environment
Dissemination Notes, 1994.
[7] The Sensor Model Language. (2004)
vast.uah.edu/SensorML/
[8] N. Giannadakis et al. InfoGrid: Providing
Information Integration for Knowledge Discovery.
Information Science, 2003: 3:199-226.
[9] S. AlSairafi et al. The Design of Discovery Net:
Towards Open Grid Services for Knowledge
Discovery. International Journal on High
Performance Computing Applications, Vol. 17 Issue
3. 2003
[10] Reisinger et al. Slow-scanning DOAS system
for urban air pollution monitoring. Proceedings of the
XVIII Quadrennial Ozone Symposium, Sept. 1996.
[11] Croxford et al. Spatial Distribution of urban
pollution: civilizing urban traffic, 5th Symposium on
Highway and Urban Pollution. Copenhagen,
Denmark, 1995.
[12] A. Steed et al. e-Science in the Streets: Urban
Pollution Monitoring. Proceedings of the UK e-Science AHM Conference, Nottingham, UK, 2003.
[13] Sheng-Tun Li, Shih-Wei Chou, Jeng-Jong Pan.
Multi-Resolution Spatiotemporal Data Mining for the
Study of Air Pollutant Regionalization. Proceedings
of the 33rd Hawaiian International Conference on
System Sciences. 2000.
[14] Ian Johnson and Andrew Wilson. The TimeMap
Project: Developing Time-Based GIS Display for
Cultural Data. Journal of GIS in Archaeology Vol 1.
(2002) ESRI Inc., Redlands.
[15] APPETISE (2004). ww.uea.ac.uk/env/appetise/
[16] H. Crabbe et al. The Use of a European
Telemedicine System to Examine the Effects
of Pollutants and Allergens on Asthmatic Respiratory
Health. The 7th International Urban and Highway
Pollution Symposium. Barcelona, Spain. 2002.