Abstract

The NCI National Environmental Research Data Interoperability Platform to support
High Performance access to Oceans- and Marine-related Interdisciplinary Research
Ben Evans1, Tim Pugh2, Edward King3,4, Jonathan Hodge3, Lesley Wyborn1
1 National Computational Infrastructure (NCI), Australian National University, Canberra, Australia
2 Bureau of Meteorology, Melbourne, Australia
3 CSIRO Marine and Atmospheric Research, Australia
4 Integrated Marine Observing System, Hobart, Tasmania
As data volumes from ocean observation and modelling activities grow exponentially, accessing and
analysing long-term data archives becomes increasingly challenging. But the oceans
community is not alone: all members of the Earth Systems and Environmental communities are facing the
same challenge. We therefore need a solution that will not only enable the oceans community to manage
its own data assets, but also facilitate seamless integration of these data sets with data from
other communities (e.g., climate, atmospheric, near-shore terrestrial and biological) to empower the next
generation of high-resolution, data-intensive interdisciplinary research.
To progress towards this goal, the National Computational Infrastructure (NCI) at the Australian National
University (ANU) has organised a priority set of large volume national environmental and earth systems
science data assets on a High Performance Data (HPD) Node within a High Performance Computing (HPC)
facility. The node was developed under the Research Data Storage Infrastructure (RDSI) program, which is
a component of the Australian Government’s National Collaborative Research Infrastructure Strategy. The
colocation of these large volume collections with a high performance and flexible computational
infrastructure is designed to support the emerging area of Data-Intensive Science, whereby HPC
analytics can be undertaken directly across all data content for interdisciplinary analysis. To achieve
this, formats need to be self-describing and all attributes need to conform to international standards for
vocabularies and ontologies. High Performance access to data is facilitated through direct access on NCI’s
supercomputer (Raijin) and cloud (Tenjin), as well as through OPeNDAP, OGC and other services, and fast
programmatically searchable catalogues.
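As an illustration of what remote, service-based access can look like, the sketch below builds an OPeNDAP (DAP2) constraint-expression URL that requests only a sub-array of one variable rather than a whole file. It is a minimal sketch under stated assumptions, not a description of NCI's services: the function name `dap2_subset_url`, the server address, the dataset path and the variable name `sst` are all hypothetical placeholders.

```python
def dap2_subset_url(base_url, variable, index_ranges, response="ascii"):
    """Build a DAP2 constraint-expression URL for a hyperslab of one variable.

    index_ranges holds one (start, stride, stop) tuple per dimension;
    indices are zero-based and the stop index is inclusive, as in DAP2.
    """
    for start, stride, stop in index_ranges:
        if start < 0 or stride < 1 or stop < start:
            raise ValueError(f"invalid index range: {(start, stride, stop)}")
    # Each dimension becomes a [start:stride:stop] term after the variable name.
    hyperslab = "".join(f"[{a}:{s}:{b}]" for a, s, b in index_ranges)
    return f"{base_url}.{response}?{variable}{hyperslab}"


# Hypothetical dataset URL and variable name, for illustration only.
url = dap2_subset_url(
    "https://example.org/thredds/dodsC/hypothetical/sst.nc",
    "sst",
    [(0, 1, 0), (100, 1, 199), (200, 1, 299)],  # time, lat, lon index ranges
)
print(url)
```

Requesting a hyperslab in this way means only the selected time step and spatial window cross the network, which is the main appeal of service-based access to very large collections.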
The initial ingestion at NCI comprises 31 data collections (and growing), requiring over 10 Petabytes
(PBytes) of storage. They are currently categorised into six major fields, all related to the
environmental sciences:
1) Earth system sciences, climate and weather model data assets and products;
2) Earth and marine observations and products;
3) Geosciences;
4) Terrestrial ecosystem;
5) Water management and hydrology; and
6) Astronomy, social science and biosciences.
Properly architected, the National Environmental Research Data Interoperability Platform will lead to:
• A dramatic improvement in the scale, resolution, reach and integration of Australian Oceans and
Marine research;
• Seamless high performance access to nationally significant data collections using new Data-Intensive
capabilities to support cutting-edge research methods; and
• The realisation of synergies with related international research infrastructure programs, particularly
those of the Oceans and Marine research domains.