e-Science from the Antarctic to the GRID

Steve Benford a, Neil Crout c, John Crowe b, Stefan Egglestone a, Malcom Foster a,b, Chris Greenhalgh a, Alastair Hampshire a, Barrie Hayes-Gill b, Jan Humble a, Alex Irune a, Johanna Laybourn-Parry c, Ben Palethorpe b, Timothy Reid c, Mark Sumner b

a. School of Computer Science & IT, University of Nottingham
b. School of Electrical and Electronic Engineering, University of Nottingham
c. School of Life and Environmental Sciences, University of Nottingham
Abstract
Monitoring life-processes in a frozen lake in the
Antarctic raises many practical challenges. To
supplement manual monitoring we have designed,
built and successfully deployed a remote
monitoring device on one of the lakes of interest.
This returns data to the Antarctic base over the
Iridium satellite phone network. This provides us
with a new and uniquely detailed view of the life-processes in that environment, and is allowing us
to understand that environment in new ways, for
example exploring diurnal effects, and detailed
energy flow models. We have integrated this
sensing device into a common Grid-based
software infrastructure; this makes the device and
its sensors visible on the Grid as services, and also
maintains an archive of sensor measurements. A
desktop user interface allows non-programmers to
work with this data in a flexible way. The
experience of creating and deploying this device
has given us a rich view of the many elements and
processes that must be brought together to make
possible this kind of e-Science.
1. Introduction

Professor Laybourn-Parry and her colleagues have been studying the ecology of freshwater Antarctic lakes for 12 years, and in particular the cycling of carbon through the ecosystem. These lakes are scientifically important for a number of reasons:
• They are isolated, pristine ecosystems with no direct human impact.
• They are ice-covered for much or all of the year, which reduces mixing with external materials.
• Few species of plant or animal are present in the lake, and the food web is consequently simpler to analyse and model.
• They are harsh environments that force the evolution of interesting survival adaptations in planktonic organisms.
• They are fragile ecosystems.
• They may be sensitive to global changes, perhaps acting as a kind of “early warning” of climate change and its impact.

Historically, the process of obtaining data has been highly labour-intensive and potentially hazardous, with scientists making journeys of tens of kilometres from the Antarctic base to collect a handful of readings at a location of interest (see figure 1).

Fig. 1. Typical images of the manual data-collection process
Consequently, the available data has been quite
limited; for example gathering data is subject to
availability of personnel, transport (such as
helicopters and quad-bikes), the stability of the ice
and suitable weather in which to travel and work,
and is restricted to the daytime only. As a result,
the existing data sets are very sparse, e.g. a set of
readings every one to two weeks. The sparseness
of the data in turn limits the detail and accuracy at
which colleagues back in Nottingham can model
and analyse the life-processes in those
environments; some of the phenomena of interest
occur at time-scales significantly shorter than the
available data can address.
This paper describes work that has been carried
out between March 2002 and August 2003. This
work combines wireless devices and sensors, Grid
technologies and desktop visualisation tools to
address the challenge of supporting – and
enhancing – the Antarctic science outlined above:
taking e-Science from the Antarctic, through the
Grid, and onto the desktop.
The remainder of this paper is structured as
follows. Section 2 describes the complete system
– hardware, networking and software – that we
have designed, built and deployed. Section 3
presents some of the new results that have already
been gathered using this system. Section 4
discusses some of the many e-Science-related
issues that have already been raised by this work.
Finally, section 5 identifies areas of ongoing and
possible future work.
2. System Design

2.1. Overview

Figure 2 gives an overview of the system that we have designed, built and deployed over the last 18 months. The total system comprises a number of interlinked components, with sensor measurement data flowing from left to right:
• At the left is the Antarctic sensing device itself, which is deployed on the icy surface of the lake to be monitored.
• This communicates using the Iridium [1] Low-Earth Orbit (LEO) satellite telephony network to a base computer, where its raw data is unpacked and scaled.
• An OGSA-compliant [2] Grid service, the Antarctic device Grid proxy, makes the device – and its data – available on the Grid.
• The data is archived in a Grid-accessible database.
• The Antarctic scientist can then work with the data in this Grid archive, visualising it, analysing it, and increasing their understanding of what is happening in the Antarctic lake and its ecosystem.

Further details of these components are given in the following sub-sections.

Fig. 2. Current Antarctic e-Science system overview (Grid port types: DF = DeviceProxyFactory, D = Device, S = Sensor, DB = Database)

2.2. The Antarctic Device

The Antarctic device is currently deployed on the ice of Crooked Lake, in the Vestfold Hills of the Antarctic, about 15 km from the Davis Base [3] of the Australian Antarctic Division (68° 35’ 31.9” S, 78° 21’ 32.7” E); figure 3 shows the device in position.

Fig. 3. The Antarctic device on Crooked Lake
There are many practical challenges when
deploying a device in such harsh conditions, with
temperatures dropping to –40ºC and wind speeds exceeding 60 mph. Suitable provision of power is particularly important due to the limited physical access, low temperatures (which drastically affect the performance of batteries) and the extreme latitude (the sun is below the horizon for 38 days
at mid-winter). The transport and assembly of the
device was also a challenge, occurring in stages
between Nottingham in the UK, Hobart in
Australia, Davis Base in the Antarctic and the
Crooked Lake site.
The device is based on a commercial scientific
data logger [4], with various sensor interface and
storage modules. This is wired to the various local
sensors which currently comprise:
• Wind monitor, reporting wind speed and
direction;
• Battery level sensor;
• Temperature sensors above the ice, inside the
device itself, and at depths of 3m and 5m in
the lake;
• A series of temperature sensors straight
through and immediately below the ice;
• Photosynthetically Active Radiation (PAR)
sensors above the ice, facing the ice (to
determine albedo), and at depths of 3m, 5m,
10m and 20m in the lake;
• Ultraviolet-B (UVB) sensors above the ice
and at a depth of 3m in the lake;
• Sonar range-finder, which measures the
thickness of the ice from below.
We had also planned to attach a GPS receiver to
the device, to monitor any change in ice position
or height; however, this has not been possible to
date due to coordination issues with the other
sensors.
We had originally planned a relatively slow
measurement cycle (two to four measurements per
day); however, with the device in place we have
actually been able to support a measurement cycle
of one reading every five minutes, continuously.
This is a dramatic change from the weekly or
fortnightly schedule that was possible with
manual measurement.
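As an illustration only (the production logger runs its own proprietary program, and the channel names below are invented for the example rather than reflecting the actual wiring), the sampling regime amounts to a loop of the following shape:

    import time

    # Hypothetical channel names, for illustration only; the real logger
    # multiplexes many more analogue and digital inputs.
    CHANNELS = ["wind_speed", "wind_direction", "battery_level", "air_temp",
                "par_surface", "par_3m", "uvb_surface", "sonar_ice_range"]

    SAMPLE_PERIOD_S = 5 * 60  # one complete set of readings every five minutes

    def read_channel(name):
        # Stand-in for the logger's analogue/digital read; returns a dummy value here.
        return 0.0

    def run_logger(store):
        while True:
            started = time.time()
            record = {"timestamp": started}
            for name in CHANNELS:
                record[name] = read_channel(name)
            store.append(record)  # held locally until the next Iridium transfer
            time.sleep(max(0.0, SAMPLE_PERIOD_S - (time.time() - started)))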
2.3. Remote Communication
The data logger is connected to an Iridium data
modem, by which it can transfer data to a
computer back at Davis Base (or anywhere else in
the world, for that matter). This connection is
relatively expensive and slow, at approximately
$2 per minute and a throughput in practical use of
around 1000 bits per second. However this was
the only viable method given the remoteness of
the site, the latitude and the intervening hills
between the site and Davis Base.
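As a rough illustration (the record size below is an assumption rather than a measured figure; only the per-minute rate and the practical throughput are taken from the text above), the daily transfer burden can be estimated as follows:

    # Back-of-envelope estimate of daily Iridium air time and cost.
    READINGS_PER_DAY = 24 * 60 // 5   # one reading every five minutes = 288
    BYTES_PER_READING = 100           # assumed packed record size
    THROUGHPUT_BPS = 1000             # practical throughput, bits per second
    COST_PER_MINUTE = 2.0             # approximate call cost, US dollars

    bits_per_day = READINGS_PER_DAY * BYTES_PER_READING * 8
    minutes_per_day = bits_per_day / THROUGHPUT_BPS / 60
    cost_per_day = minutes_per_day * COST_PER_MINUTE
    # With these assumptions: roughly 4 minutes and $8 of air time per day,
    # before call set-up overheads and retries.
    print(f"{minutes_per_day:.1f} min/day, ${cost_per_day:.2f}/day")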
As well as downloading data from the device, it can also be re-programmed over the satellite modem. However, if this fails part-way through then the device may well require manual re-programming (this has already happened once). The satellite connection also requires that the batteries be in a reasonably good state of charge; calls cannot be established at lower supply voltages, even when the rest of the device is still operational.
2.4. Grid Components
We have combined efforts with the other
EQUATOR IRC e-Science project to develop a
common Grid software infrastructure for devices
and sensors. This is described in more detail in
[5]. Briefly, we have defined new Grid service
port types (interfaces) to represent a generic
device and a generic sensor “on the Grid”. The
supporting tools and services (on the right-hand
side of figure 2) exploit these common port types
to handle varying devices and sensors in a
standard way.
The Device Proxy Management Client (at the top
of figure 2) allows the person responsible for a
device to create a suitable Grid service to
represent that device on the Grid (a device
“proxy”). The Data (or Trial) Manager and Sensor
Data Pumps can be used to archive data from
sensors in a common Sensor Database Service. At
this point the data from the sensors is made
available to interested parties via this Sensor
Database Service. This service is tailored to support the data formats and queries appropriate to sensors; however, the internal data model is relational, and we also plan to provide an OGSA-DAI [6] interface to this archive.
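In outline, and purely as an illustration (the method names below are ours; the actual port type definitions are given in [5]), the generic interfaces have roughly the following shape:

    from abc import ABC, abstractmethod
    from typing import Any, Dict, List

    class Sensor(ABC):
        """Illustrative sketch of a generic sensor interface, not the real port type."""

        @abstractmethod
        def describe(self) -> Dict[str, Any]:
            """Static description: name, units, calibration, deployment."""

        @abstractmethod
        def latest_reading(self) -> Dict[str, Any]:
            """Most recent timestamped measurement known to the service."""

    class Device(ABC):
        """Illustrative sketch of a generic device interface, not the real port type."""

        @abstractmethod
        def sensors(self) -> List[Sensor]:
            """The sensors that this device exposes."""

        @abstractmethod
        def request_reconfiguration(self, config: Dict[str, Any]) -> None:
            """Queue a configuration change for the next device connection."""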
2.5. Data Access and Analysis
We have created a simple desktop Graphical User
Interface (GUI) to allow scientists to download
and analyse the sensor data from the Sensor
Database Service. This interface uses a visual
data-flow paradigm [7] to allow non-programmers
to perform a range of data retrieval, processing
and visualisation functions. Figure 4 shows a
simple processing network. The first component is
a data-loader, which collects data for a particular
sensor and time-period from the Sensor Data
Archive Grid service. This is then routed through
a table viewer which allows the user to view the
data as a table and to select a subset of the data.
This subset is then routed to a 2D chart generator,
which can create a range of standard graphs.
Sample results are shown in section 3.
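In outline (the class and method names below are invented for this sketch; the real GUI wires equivalent components together graphically against the Sensor Database Service), the network in figure 4 composes three stages:

    class InMemoryArchive:
        """Stand-in for the Grid archive client, holding (sensor, timestamp, value) rows."""
        def __init__(self, rows):
            self.rows = rows

        def query(self, sensor_id, start, end):
            return [r for r in self.rows
                    if r["sensor"] == sensor_id and start <= r["timestamp"] <= end]

    def table_select(rows, predicate):
        # The table view lets the user pick out a subset of the loaded rows.
        return [r for r in rows if predicate(r)]

    def chart_series(rows):
        # The 2D chart component plots the selection; here we just return the series.
        return [r["timestamp"] for r in rows], [r["value"] for r in rows]

    # Usage with made-up values, mirroring the loader -> table -> chart pipeline:
    archive = InMemoryArchive([
        {"sensor": "PAR_3m", "timestamp": "2003-01-18T12:00", "value": 105.2},
        {"sensor": "PAR_3m", "timestamp": "2003-01-18T12:05", "value": -3.1},
    ])
    loaded = archive.query("PAR_3m", "2003-01-17T00:00", "2003-01-31T23:59")
    subset = table_select(loaded, lambda r: r["value"] >= 0)
    xs, ys = chart_series(subset)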
Our colleagues at the University of Glasgow have
also been analysing data from the Antarctic device
using a similar Hybrid Information Visualisation
Environment (HIVE) [19]; this currently supports
a range of multidimensional scaling algorithms,
but lacks a Grid client facility. We are actively
porting components between these tools, such as
the fish-eye table viewer from HIVE.
Using the device and sensor Grid services it is
also possible to monitor measurements as soon as
they reach the Grid; this is described further in
[5].
We have also been exploring the use of the Visualization Toolkit (VTK) [8]; figure 7 shows data comparable to figure 5(b) rendered using VTK. We plan to use these kinds of 3D visualisations in exploratory virtual reality and augmented reality interfaces.

Fig. 4. A simple processing network

Fig. 7. PAR readings visualisation

3. Results
In this section we show some of the data that was
obtained from the Antarctic device during its
summer deployment (17th – 31st January 2003).
Figure 5 shows the levels of Photosynthetically
Active Radiation (PAR) measured at the surface
and at various depths in the lake. Figure 5(a)
shows the smooth curve resulting from a clear
day, while figure 5(b) shows the effects of varying cloud cover on a partially cloudy day.
Fig. 5. PAR readings (a) clear day (b) cloudy
Figure 6 shows the thickness of the ice during the summer deployment, as measured by the up-looking sonar. The ice begins to melt rapidly
towards the end of the period, after which the
device was removed while it was possible to land
a helicopter on the ice.
Fig. 6. Ice thickness

4. Discussion
The work reported in this paper is very much
work in progress. However, it has already
highlighted a range of e-Science issues, ranging
from the environmental science being supported,
to the Grid technologies that are being applied.
These issues are explored in the sub-sections that
follow.
4.1. New scientific directions
Using the Antarctic device we have been able to
capture data regularly, irrespective of weather
conditions, approximately 2000 times more
frequently than with previous manual methods.
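(The factor follows directly from the sampling rates: one reading every five minutes yields 7 × 24 × 60 / 5 ≈ 2016 readings in the week previously covered by a single manual visit.)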
This level of temporal detail is providing new
insights into the minute-by-minute changes in the
lake environment, as seen in the cloudy day data
in figure 5(b). At slightly longer timescales, it
now becomes possible to observe and analyse
diurnal effects in the environment.
This level of detail also allows us to apply new modelling and analysis
methods. For example, it is possible to begin to
model the complete energy balance within the
environment, using the detailed light, temperature
and ice-thickness measurements. Such a model
can then be used to explore hypothetical changes
in environmental conditions much more precisely
than the current coarse-grained models based on
no more than weekly measurements.
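As a minimal illustration of the kind of calculation the new data supports (the conductivity value and the steady-state assumption are ours, for the example, and not part of the project's model), the conductive heat flux through the ice cover can be estimated directly from the through-ice temperature sensors and the sonar ice-thickness measurement:

    # Minimal sketch: conductive heat flux through the ice via Fourier's law,
    # q = k * (T_bottom - T_top) / thickness, under an assumed steady state.
    K_ICE = 2.2  # W m^-1 K^-1, a typical literature value for freshwater ice

    def conductive_flux(temp_top_c, temp_bottom_c, thickness_m):
        """Upward heat flux (W m^-2) conducted through the ice cover."""
        return K_ICE * (temp_bottom_c - temp_top_c) / thickness_m

    # Purely illustrative numbers, not measured values:
    q = conductive_flux(temp_top_c=-5.0, temp_bottom_c=0.0, thickness_m=1.5)
    # q is about 7.3 W m^-2 with these inputs.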
4.2. Getting and using the data
In order to get maximum utility from the Antarctic
device and its data in the short term (before the
Grid software components were fully developed
and available) we adopted an interim data encoding and exchange method that would fit directly into the scientists' current working practices, i.e.
simple textual data files, compatible with Excel,
sent by email from the researchers in the
Antarctic. This has allowed the environmental
scientists at Nottingham to make immediate use of the data with the tools and methods that they are familiar with.
We have now reached the point where the data
can also be published, archived and distributed via
the device and sensor Grid infrastructure that we
have been developing (as outlined in section 2).
However, there are still many practical issues and
choices to be resolved to determine how best to
make this data – and other Grid facilities such as
remote computation – available to the
environmental scientists within their everyday
work. This is one reason for our development of
the simple visual data-flow user interface
mentioned in section 2.5, since many of the
scientists who we are working with are not
programmers. Our hope is to develop this desktop
user interface to the point where it can be used by
the scientists with minimal additional effort
compared to their existing practices, and to use it
as a point of entry to their working environments
which we can then grow to make other Grid
services and facilities available to them. We have chosen to use a standalone desktop application rather than (say) a web portal because we wish to support richer and finer-grained interactivity.
4.3. Remote science issues
Working with a device – and colleagues – on the
other side of the planet raised many complications
compared to local working. The researchers have
had to overcome a huge variety of pragmatic
issues in the process of deploying and operating
the Antarctic device, ranging from the
coordination of deliveries of parts across the
globe, to on-site problems such as the fracturing
of fixing bolts in the extremely cold conditions
and the extremely short life of laptop batteries in
this climate.
One critical set of issues has revolved around
establishing and managing confidence in the
Antarctic device, and as part of this, the handling
of software and hardware failures. A fundamental
challenge here is that only certain things can be
done remotely; in some situations physical
attention is unavoidable. Of course, this is not the
complete show-stopper that it would typically be
in a satellite-based system, but equally the cost
profile is somewhat different (much cheaper devices, correspondingly more limited development effort), and the environmental pressures are also different.
Anderson and Lee [9] consider software fault
tolerance in terms of four phases, which are
directly relevant in this situation:
1. Error detection, i.e. determining that there is
a fault.
2. Damage confinement and assessment, i.e.
determining – and limiting – the scope of the
problem, e.g. what data is affected, and how
far incorrect data may have been distributed.
3. Error recovery, i.e. performing compensatory
actions, e.g. to correct or withdraw erroneous
data.
4. Fault treatment and continued service, i.e.
dealing with underlying cause of the fault,
e.g. replacing or recalibrating a physical
sensor.
For example (see also [19]; a minimal sketch of the check in step 1 follows this list):
1. It was observed at one point that the data
logger was reporting negative values for
PAR at depth 10m. This is clearly
impossible, since it would indicate a negative
amount of light: the error has been detected
(in this case by a bounds or reasonableness
check).
2. By inspection, it could be seen that only
certain values from this single sensor were
apparently in error (damage assessment); if
appropriate the publication of the data could
be delayed (damage confinement).
3. Only limited recovery is possible in this case
since the historical reading cannot be recaptured; error recovery is therefore limited
to the publication of anomaly metadata
which warns potential data users of values
which should be disregarded (with a suitably
flexible data format those values could be
individually excluded).
4. The field researcher then went out to the
device and determined by direct inspection
of the interface hardware, application of
synthetic stimuli and comparison with
similar reference devices that the gain for
this particular PAR sensor was too high, so
that in bright light it exceeded the working
range of the input, giving an overflow error.
The gain was turned down, and the sensor
recalibrated and redeployed (fault treatment
and continued service).
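A minimal sketch of the bounds check of step 1 and the anomaly metadata of step 3 follows; the field names and limits are assumptions for the example, not the archive's actual schema:

    # Illustrative bounds/reasonableness check for PAR readings and the anomaly
    # metadata it would publish. Field names and limits are assumptions.
    PAR_MIN = 0.0     # negative light is physically impossible
    PAR_MAX = 3000.0  # assumed generous upper bound

    def check_par(record):
        """Return None if the reading looks plausible, else an anomaly record."""
        value = record["value"]
        if PAR_MIN <= value <= PAR_MAX:
            return None
        return {
            "sensor": record["sensor"],
            "timestamp": record["timestamp"],
            "value": value,
            "reason": "outside physically plausible range",
            "action": "disregard this reading",
        }

    # Made-up example reading; real anomaly records are published alongside,
    # not instead of, the raw data.
    anomaly = check_par({"sensor": "PAR_10m",
                         "timestamp": "2003-01-20T14:05",
                         "value": -12.4})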
Another significant issue of remote – or more
specifically distributed – science has been the
effort required to coordinate the activities of
researchers in Nottingham and in the Antarctic.
As well as task-specific coordination and
collaboration, considerable work is also needed if the
distributed researchers are to feel a common
involvement in the research and the social
processes that support it on a day-to-day basis, for
example, keeping in touch, maintaining an
awareness of promising directions to explore,
developing a common agenda and mutual
understanding, and so on. These things are not
well addressed by emerging Grid technologies and
approaches. An open-ended Access Grid [10]
session would be about the best support on offer at present; however, this cannot be used from Davis Base because of the limited networking (a single shared satellite connection to the Internet) and the lack of a suitable installation on the base.
4.4. Grid software issues
The typical vision of the Grid [11] is of a
pervasive, i.e. universal, computing and
communication infrastructure, connecting – at
least potentially and subject to security policies –
everyone, everywhere. In principle, then, we
might imagine placing the Antarctic device
directly onto the Grid, allowing it to expose its
resources (in this case the sensors, data log and
logging program) through a standard service
interface. However, there are two major problems
in getting this device – and many other devices –
onto the Grid in this way:
• The device does not have the code storage or
computational power to run even a small
Grid software stack, so it cannot directly host
a Grid service; and
• The device is not – and cannot be –
permanently connected to the Grid network,
because it is (a) too expensive and (b)
subject in any case to periods of non-availability.
Some may argue that these are only temporary
problems, or ‘implementation details’, that will be
solved in a few years' time. We prefer (a) to do useful work in the meantime and (b) to wait and
see whether this technological future is actually as
perfectly uniform and free of practical problems
as this view might suggest.
Consequently, we have adopted a dual strategy of
defining common device and sensor service
interfaces (which could be supported directly by a
sufficiently capable networked device) and
creating a default implementation framework
which uses proxy services on the fixed Grid network to represent our current devices and sensors, with each actual device – in this case the Antarctic device – connected to its proxy as and
when it can, by whatever means are currently
available. When the Antarctic device is not
connected, the proxy can still provide data and
queue reconfiguration requests, allowing Grid-based clients to be written as if the device were a
first-class Grid citizen. This is described in more
detail in [5].
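In outline (class and method names are ours, for illustration; the actual proxy design is described in [5]), the proxy's behaviour when the device is offline is roughly the following:

    # Illustrative outline of a device proxy: serve archived data at any time,
    # and queue reconfiguration requests until the device next connects.
    class DeviceProxy:
        def __init__(self, archive):
            self.archive = archive   # data already uploaded from the device
            self.pending = []        # reconfiguration requests awaiting upload
            self.connected = False

        def get_data(self, sensor_id, start, end):
            # Works whether or not the device is currently reachable.
            return self.archive.query(sensor_id, start, end)

        def reconfigure(self, request):
            if self.connected:
                self._send(request)
            else:
                self.pending.append(request)  # applied on the next connection

        def on_connect(self):
            self.connected = True
            for request in self.pending:
                self._send(request)
            self.pending = []

        def on_disconnect(self):
            self.connected = False

        def _send(self, request):
            pass  # push the request over whatever link is currently available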
4.5. Sensor/device issues
The example of the problem with a PAR sensor in
section 4.3 also highlights the need to work with
kinds of data additional to the sensor measurements themselves. In that case, the
metadata required included:
• Calibration metadata, i.e. what measured
voltage corresponds to what actual level of
PAR (in µmol s⁻¹ m⁻²), which may change
from time to time due to adjustments or drift.
• Accuracy or fidelity metadata, i.e. how
accurate is the sensor, and with what
resolution does it provide its measurements.
• Data validity or availability metadata, i.e.
that some readings should be ignored (in that
case any readings less than zero), and
perhaps a reason for this.
• Structural metadata, i.e. which particular
reading from the data logger (e.g. which
column in the Excel file) corresponds to
which sensor.
• Deployment metadata, i.e. where (in the
world) is the device and/or sensor actually
deployed.
We have adopted and extended the eXtensible Scientific Interchange Language (XSIL) [12] to describe
the structure of tabular text-based data in a
standard way.
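As a small illustration of how such metadata is consumed (the gain, offset and field names below are invented for the example), applying calibration and validity metadata to a raw logger value looks like this:

    # Illustrative use of calibration and validity metadata: convert a raw
    # logger voltage to PAR and flag readings outside the valid range.
    calibration = {"gain": 500.0, "offset": 0.0}  # assumed scaling, per volt
    validity = {"min": 0.0}                       # readings below zero are disregarded

    def calibrated_par(raw_volts):
        value = calibration["gain"] * raw_volts + calibration["offset"]
        if value < validity["min"]:
            return None   # marked as invalid rather than silently dropped
        return value

    print(calibrated_par(0.21))   # 105.0 with the assumed gain
    print(calibrated_par(-0.01))  # None: anomalous, recorded as such in metadata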
We are also exploring the possible use of
SensorML [13] (which is being brought into the
OpenGIS consortium standardisation process) as
one standard way of documenting some of this
data (especially deployment and accuracy
metadata).
However, the choice of an XML format is only
part of the solution; we also seek to facilitate the
associated work with the device itself. For
example, even something as simple as unique
tagging of sensors and devices (using RFID tags
or barcodes) with suitable handheld support
devices would make it much easier to manage
collaboration data and link it back to the sensor at
issue. Choice of suitable hardware and bus
technologies (e.g. comparable to USB [14]) also
makes it possible for sensors and devices to (a) describe themselves to some extent and (b) self-discover at least some aspects of their own
deployment and structure. We are continuing to
explore these issues in various EQUATOR
projects.
5. Conclusion and Future Directions
The Grid is only one part of the total scope of e-Science. Through the design, construction and
deployment of an environmental sensing device in
the Antarctic we have been able to obtain
collective primary experience of the many – often
apparently mundane – activities and elements that
together make up this particular scientific
endeavour. This device is already providing data
that goes substantially beyond that previously
available to us. Making this device available on
the Grid – together with the Medical wearable
computer and phone-based devices described in
[5] – is also driving the design and development
of new Grid interfaces and supporting
technologies for these kinds of devices.
Our ambitions in the remainder of this project are
to:
• Continue to analyse and exploit the data that
we are obtaining from the device, to increase
our understanding of this Antarctic lake
environment.
• Begin to do this using the desktop Grid
interface that has been developed, and to use
this as a platform from which we can explore
other Grid possibilities, such as the more
stereotypically large computational analyses
on remote machines.
• Further develop the supporting software
and devices to explore support for
configuration, calibration, management and
trouble-shooting of physical devices such as
this.
• Bridge between the normal Grid software infrastructure that we have been working with to date and some of the other ‘experience-oriented’ infrastructures developed and used in other parts of EQUATOR (e.g. EQUIP [15]).
Links
For more information see the EQUATOR website pages for this project [16], more data from
and information about the Antarctic device [17],
or the web-site for the sister medical devices
project [18]. We anticipate an open-source release
of the software infrastructure before the end of the
project; email Chris Greenhalgh (cmg@cs.nott.ac.uk) or Tom Rodden (tar@cs.nott.ac.uk) in the first instance.
Acknowledgements
This work is supported by EPSRC Grant
GR/R81985/01 “Advanced Grid Interfaces for
Environmental e-science in the Lab and in the
Field”, the EQUATOR Interdisciplinary Research
Collaboration (EPSRC Grant GR/N15986/01), the
Australian Antarctic Division, EPSRC Grant
GR/R67743/01 “MYGRID: Directly Supporting
the E-Scientist” and the EPSRC DTA scheme. We
thank Greg Ross and Matthew Chalmers for
contributions to the data analysis tool.
References
[1] Iridium Satellite LLC, http://www.iridium.com/ (verified 2003-07-28).
[2] I. Foster, C. Kesselman, J. Nick, S. Tuecke, "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration", Open Grid Service Infrastructure WG, Global Grid Forum, June 22, 2002.
[3] Australian Antarctic Division, "Antarctic Families and Communities: Davis", http://www.antdiv.gov.au/default.asp?casid=404 (verified 2003-07-28).
[4] Campbell Scientific Canada Corp., "CR10X Measurement and Control Module and Accessories", http://www.campbellsci.ca/CampbellScientific/Products_CR10X.html (verified 2003-07-28).
[5] T. Rodden, C. Greenhalgh, D. DeRoure, A. Friday, L. Tarasenko, H. Muller et al., "Extending GT to Support Remote Medical Monitoring", Proceedings of the UK e-Science All Hands Meeting 2003, Nottingham, Sept. 2-4.
[6] Open Grid Services Architecture Data Access and Integration (OGSA-DAI), http://www.ogsadai.org.uk/ (verified 2003-07-28).
[7] B. A. Myers, "Taxonomies of visual programming and program visualization", Journal of Visual Languages and Computing, pp. 97-123, March 1990.
[8] W. J. Schroeder, K. M. Martin, W. E. Lorensen, "The Design and Implementation of an Object-Oriented Toolkit for 3D Graphics and Visualization", IEEE Visualization '96, pp. 93-100.
[9] P. A. Lee and T. Anderson, "Fault Tolerance: Principles and Practice (Second Revised Edition)", Springer-Verlag, 1990, ISBN 3-211-82077-9.
[10] The Access Grid Project, http://www.accessgrid.org/ (verified 2003-07-28).
[11] I. Foster and C. Kesselman (eds.), "The Grid: Blueprint for a New Computing Infrastructure", Morgan Kaufmann, 1998.
[12] R. Williams, "XSIL: Java/XML for Scientific Data", http://www.cacr.caltech.edu/projects/xsil/xsil_spec.pdf (verified 2003-07-28).
[13] Open GIS Consortium Inc., "Sensor Model Language (SensorML) for In-situ and Remote Sensors", OGC 02-026r4, 2002-12-20, http://www.opengis.org/techno/discussions/02-026r4.pdf (verified 2003-07-28).
[14] USB Implementers Forum, "USB 2.0 Specification", http://www.usb.org/developers/docs (verified 2003-07-28).
[15] University of Nottingham, "The Equator UnIversal Platform", http://www.equator.ac.uk/technology/equip/index.htm (verified 2003-07-28).
[16] EQUATOR, "Environmental e-Science Project", http://www.equator.ac.uk/projects/environmental/index.htm (verified 2003-07-28).
[17] Malcom Foster, "Data", http://www.mrl.nott.ac.uk/~mbf/antarctica/data.htm (verified 2003-07-28).
[18] "MIAS EQUATOR Devices", http://www.equator.ac.uk/mias (verified 2003-07-28).
[19] G. Ross and M. Chalmers, "A Visual Workspace for Hybrid Multidimensional Scaling Algorithms", to appear in Proc. IEEE Information Visualization (InfoVis) 2003, Seattle.