MTPII-MATER Data Management Final Report

MTP II- MATER Data Management
1996-1999
(MAS3-CT96-0051)
Final Report
C. MAILLARD¹, M. FICHAUT¹, A. GIORGETTI³, E. BALOPOULOS², S. IONA², A.
LATROUITE¹, B. MANCA³, P. NICOLAS⁴ and J-A. SANCHEZ-CABEZA⁵
¹ IFREMER/SISMER, BP 70, 29280 Plouzané, France
² NCMR/HNODC, Hellinikon, 16604 Athens, Greece
³ OGS, PO Box 2011, 34016 Trieste, Italy
⁴ SAFEGE/CETIIS, 30 Av. Malacrida, 13100 Aix en Provence, France
⁵ Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
TMSI/IDM/SISMER/00-017 - February 2000
CONTENTS
1. Introduction
    1.1. Summary of the Objectives and Methodology
    1.2. Data Management Structure and role
    1.3. Data and Meta-data Circulation
2. Meta-data Management - WWW Catalogues
3. Data sets Archived
    3.1. Physics
    3.2. Biochemistry
    3.3. Other specific parameters
4. Quality Assurance
    4.1. Definition of Data Protocol
    4.2. Implementation : Data Formatting and Qualification
    4.3. Data Experts Review
5. MTPII-MATER Database on CD-ROM
6. EVENTS
    6.1. Data Management Meetings
    6.2. Participation in Scientific and Technical Meetings
7. Conclusion
8. References
9. Annexes - Regional Data Management Reports
1. Introduction
1.1. Summary of the Objectives and Methodology
The objective of the Data Management Workpackage was to ensure that the
valuable multidisciplinary data sets collected during the 105 sea cruises (about
1000 days at sea) and on the 126 mooring lines from 1996 to 1999 would be
easily exchanged among the participants, safeguarded for further use and
published on the best media available at the time of the project. As is usual
in major international projects, an operational data management structure was
defined to carry out the corresponding tasks. This structure was regionally
distributed.
A preliminary task was to define a common protocol for data formatting
and quality checking, so that data of the same type collected by different teams
would be coherent and comparable, whatever their source. This protocol was
improved throughout the project.
Quality assurance and safeguarding are improved by archiving the data
as soon as possible after collection and scientific validation. To track the
data and speed up the circulation of information and data, catalogues were
maintained through regular contacts with the scientists and published on the
internet in a standardised form at each regional data centre. The data have
been organised into "basic parameters", which are useful for all disciplines,
and other "specific parameters", for which no agreed standards exist. Both are
archived, but only the basic parameters are submitted to the full quality
assurance protocol.
To reach a wide range of potential users, a prototype CD-ROM has been
prepared and was demonstrated at the final workshop in Perpignan for a first
evaluation; it is attached to this report.
1.2. Data Management Structure and role
The data management structure, schematised in Fig. 1, includes three regional
archiving centres and two operational groups.
Fig. 1: MTPII-MATER Data Management Structure
[diagram elements: Project Co-ordinator; Data Manager; Regional Data Centres
(RDC) for the Western Basin, the Adriatic/Ionian and the Eastern Basin;
Animation Task; Quality Experts Committee]
The three regional centres are:
• National Center for Marine Research, HNODC, Greece (Eastern Basin)
• Osservatorio Geofisico Sperimentale, Italy (Adriatic/Ionian Basin)
• IFREMER/SISMER, France (Western Basin and co-ordinating data manager)
Their tasks were to:
1. Develop the common protocol
2. Compile the meta-data and disseminate them on the WWW
3. Request copies of the data from the source laboratories and process
them for archiving in conformity with the data management protocol.
The Animation Task (CETIIS, France) maintained the cruise schedule and an
on-line synthesis of the data status for the project management.
The role of the Data Quality Group was to supervise the validation methods
and to issue the appropriate qualification for each data set.
1.3. Data and Meta-data Circulation
The circulation of information (meta-data) and data within the data
management structure, illustrated in Fig. 2, was as follows:
1. Search for cruise schedules, both from the project and from national
authorities
2. Request meta-data: summary reports for cruises (ROSCOP), moorings,
instruments and data sets (EDMED), by sending standardised forms
3. Request the data listed in these reports, reformat them, safeguard them
and check their quality
4. Publish up-to-date catalogues of data and meta-data on WWW servers
5. Disseminate data and meta-data according to the project policy.
Fig 2: Circulation of Data and Information during MTP II-MATER
(rectangles: organisation or person; ellipses: deliverables, services, products)
2. Meta-data Management - WWW Catalogues
The cruise, mooring, instrument and data set summary reports have been
archived and made available without any restriction on the data management
WWW servers. It has thus been possible at any time to get a complete overview
of the fieldwork and of the data sets collected. These web servers are:
• Western Basin: http://www.ifremer.fr/sismer/program/mater/
• Adriatic/Ionian: http://doga.ogs.trieste.it/mater/
• Eastern Basin: http://hnodc.ncmr.ariadne-t.gr/programmes/mater/
and the synthesis made by the Animation Task: http://bali.cetiis.fr/mtp/mater/.

On each of these web sites, a common data management home page has been
developed, with access to all the standardised cruise, mooring, instrument and
data reports, and links to the other centres and project web sites (Fig. 3).
Fig 3: Data Management home page of a Regional Data Centre
Clicking on "Cruises Summaries" returns the cruises listed by year, and
clicking on any cruise returns the corresponding report, including ship
tracks, the list of collected data and the list of archived data (with
location). The 105 cruise reports can be downloaded.
Clicking on "Moorings Summaries" returns a list linked to reports similar to
the cruise reports (126 mooring reports).
Clicking on "Instruments Summaries" returns a list of the main instruments
used during the project, the laboratories and information on the sensor
calibrations.
Clicking on "Data sets Summaries" returns the list of the three main groups
into which the data sets have been organised, to overcome the difficulties due
to the complexity and overlapping of the various disciplines:
• Physics
• Bio-chemistry
• Specific parameters (miscellaneous non-standard parameters)
These groups are themselves divided into sub-groups (Fig. 4) depending
on the method or sensor type (e.g. CTD, Lagrangian floats, current meter time
series for physics) or on the compartment (e.g. dissolved, particulate, settling
particles, sediment, pore water for chemicals).
Fig 4: WWW page of the data sets catalogue for Central Basin
The first two groups correspond to the basic parameters. Clicking on any item
returns the list of cruises or moorings where these data have been collected,
and the data location.
3. Data sets Archived
The physical and biochemical basic parameter data have been reformatted to a
single common format (MEDATLAS), and the full quality assurance procedure
has been applied before archiving. The other specific parameters have
only been safeguarded. A brief synthesis of the archived data is presented
below, in the state available for the Perpignan meeting. Some more data are
still being archived, so this synthesis does not represent the final content of
the database.
3.1. Physics
VERTICAL PROFILES               TIME SERIES
3013 CTD                        CTD TIME SERIES (number not given)
572 XBT                         21 THERMISTOR STRING
461 ADCP VERTICAL PROFILES      110 CURRENT METER
                                LAGRANGIAN FLOAT (number not given)
The positions of the observations are shown in Fig. 5 for the vertical profiles,
Fig. 6 for the time series on fixed moorings and Fig. 7 for the Lagrangian time
series.
3.2. Biochemistry
VERTICAL PROFILES               TIME SERIES
1473 BOTTLE STATIONS            37 SEDIMENT TRAPS
269 BIOLOGICAL STATIONS
The positions are shown in Fig. 8.
3.3. Other specific parameters
Additional or specific data (meteorological, biological, etc.) have been
archived only marginally, without any reformatting or quality check. They are
available only in the original source format of the scientific file, and can be
extracted by cruise file only. Their positions are not reported.
Fig. 5: Vertical Profiles of Physical Observations - Stations
Fig. 6: Time series of Physical Observations - Moorings
Fig. 7: Lagrangian Time series of Current
Fig. 8: Biochemical Data at Stations and Sediment Traps
4. Quality Assurance
Quality assurance (QA) has been a strong point of the MTPII-MATER data
management. The data were first scientifically validated in the scientific
laboratories; copies were then transmitted to the archiving centre, where they
were reformatted to a single common format (extended MEDATLAS), checked for
quality (QC) and safeguarded. Although validation is the responsibility of the
scientists, these final QCs, made before archiving, made it possible to
crosscheck the data and to ensure that no errors had been introduced during
reformatting. This procedure follows the recommendations of international
organisations such as UNESCO/IOC and MAST, and the practices of other major
international projects such as WOCE and JGOFS.
Basically, the QA procedure included three tasks:
• definition of a common protocol for formatting and QC in accordance with
international standards
• implementation of the protocol for archiving the basic parameters
• validation of the procedures by a Data Quality Expert group.
In each regional data centre, it was very important to follow standardised
quality assurance procedures in order to prepare coherent data sets that can be
used by the whole community. Several hundred parameters were measured
within MTPII-MATER; among them, 105 basic parameters of physics and
biochemistry were to be shared by the participants and, later on, by other
users. The quality assurance procedure focused on these parameters. For the
other parameters, resulting from new sensors or methodologies, or for which
internationally agreed standards did not exist, the QA procedure was not
applicable.
4.1. Definition of Data Protocol
The MTPII-MATER protocol (2) for handling the information and the data is
based on the international recommendations of UNESCO/IOC and MAST (1)
and on the previous MAST/MEDATLAS protocol (3), which was developed
to process all the basic parameters, vertical profiles and time series.
This protocol includes:
1. A data dictionary in which the parameter names, units and codes have been
standardised. The International System of Units (SI) was used for units, and
the rules for derived units, based on ISO guidelines, were recalled.
2. The description of the format: self-describing ASCII, including a short
cruise/mooring header with reference to the author, a station header, and
columns of observations referred to pressure for vertical profiles (depth in
the sediments) and to time for the time series.
3. The description of the final quality checks (QC) performed on the basic
parameters before archiving. The QCs include automatic and visual checks:
• QC-0: check of the format, units, codes and overall completeness and
consistency of the information
• QC-1: check of the date and location
• QC-2: check of the data points: broad-range minimum/maximum values,
comparison with climatological statistics (when available), search for
spikes, stuck sensors, vertical instabilities, etc.
The visual checks assess the overall consistency of the data within the same
data set, identify the wrong value in case of vertical instability, validate the
climatological test in some areas, etc. The results are quality flags added to
each numerical value.
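As an illustration only, the automatic part of the QC-1 and QC-2 checks can be
sketched in Python as below. This is a minimal sketch and not the SCOOP
implementation: the flag values, the bounding box, the spike test and the
function names are all assumptions made for the example, and the
climatological comparison and the visual checks are left out.

```python
from datetime import date

# Hypothetical flag values, chosen for this sketch only; the actual codes
# are those defined in the project protocol.
FLAG_GOOD = 1
FLAG_SPIKE = 3
FLAG_OUT_OF_RANGE = 4

# Assumed rough bounding box for the Mediterranean Sea (decimal degrees).
LAT_RANGE = (30.0, 46.0)
LON_RANGE = (-6.0, 37.0)


def qc1_date_location_ok(obs_date, lat, lon, cruise_start, cruise_end):
    """QC-1 style test: date inside the cruise period, position inside the box."""
    date_ok = cruise_start <= obs_date <= cruise_end
    lat_ok = LAT_RANGE[0] <= lat <= LAT_RANGE[1]
    lon_ok = LON_RANGE[0] <= lon <= LON_RANGE[1]
    return date_ok and lat_ok and lon_ok


def qc2_flags(values, vmin, vmax, spike_threshold):
    """QC-2 style test: return one quality flag per value of a profile."""
    # Broad-range test: every value must lie between vmin and vmax.
    in_range = [vmin <= v <= vmax for v in values]
    flags = [FLAG_GOOD if ok else FLAG_OUT_OF_RANGE for ok in in_range]
    # Spike test on interior points whose neighbours passed the range test:
    # a value too far from the mean of its two neighbours is flagged.
    for i in range(1, len(values) - 1):
        if not (in_range[i - 1] and in_range[i] and in_range[i + 1]):
            continue
        neighbour_mean = (values[i - 1] + values[i + 1]) / 2.0
        if abs(values[i] - neighbour_mean) > spike_threshold:
            flags[i] = FLAG_SPIKE
    return flags


# Example: a temperature profile with one spike and one impossible value.
temperatures = [13.0, 13.1, 18.0, 13.2, 99.0]
flags = qc2_flags(temperatures, vmin=-2.0, vmax=40.0, spike_threshold=3.0)
```

In this sketch the flags are returned alongside the data and no value is
modified, in line with the principle that the results of the checks are flags
attached to each numerical value.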
4.2. Implementation : Data Formatting and Qualification
The data received at the data centres are reformatted to the common format
and, if necessary, the source scientist is contacted to complete the
information. In case of unit problems, the values were converted to SI units
when possible, to ensure comparability with the other data sets.
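A unit conversion step of this kind might look as follows. It is a sketch
under stated assumptions: the conversion table, the unit spellings and the
function name are hypothetical and are not taken from the MATER data
dictionary. An unknown unit pair raises an error, which corresponds to the
case where the source scientist has to be contacted.

```python
# Hypothetical conversion table: (source unit, target unit) -> factor.
# The entries are plain unit arithmetic, not values from the project protocol.
CONVERSION_FACTORS = {
    ("cm/s", "m/s"): 0.01,       # current speed
    ("umol/l", "mmol/m3"): 1.0,  # 1 micromole per litre = 1 millimole per m3
    ("mg/l", "kg/m3"): 0.001,    # mass concentration
}


def convert_units(values, source_unit, target_unit):
    """Return the values expressed in target_unit, or raise if not possible."""
    if source_unit == target_unit:
        return list(values)
    key = (source_unit, target_unit)
    if key not in CONVERSION_FACTORS:
        # No known conversion: the data centre cannot convert on its own.
        raise ValueError(f"no conversion from {source_unit} to {target_unit}")
    factor = CONVERSION_FACTORS[key]
    return [v * factor for v in values]


# Example: current speeds reported in cm/s, archived in m/s.
speeds_m_per_s = convert_units([25.0, 50.0], "cm/s", "m/s")
```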
The QCs were performed on the reformatted data files in the three
MATER data centres. The Hellenic and French data centres used an expert
software package (SCOOP), and the Italian data centre used local software.
Examples of QC1 and QC2 checks from SCOOP are displayed in Fig. 9 and 10.
After checking that the outliers are not artefacts due to formatting, the
results of the QC are communicated to the responsible scientists, who take
further action such as validation, correction or elimination if necessary.
Close co-operation with the scientists who collected the data has contributed
greatly to the quality assurance.
Fig. 9: Check of the Location and Date of the Observations (QC1)
Fig. 10: Check of the Data Points against Climatology (QC2)
(in green the current profile, in blue the corresponding climatological profile)
4.3. Data Experts Review
A Data Quality Group of Experts (DQE) was created by the Project
Co-ordinator, and its first meeting was held in Rome (CNR) on 17 November 1997,
during the International Conference "Progress in Oceanography of the
Mediterranean Sea". It was composed of scientists and data managers with the
following fields of expertise:
• Suspended Particulate Matter: Project Co-ordinator
• Inorganic chemistry (metals, CFC and radionuclides): Joan-Albert
Sanchez-Cabeza (DQE Leader)
• Physical oceanography, satellite data: MATER Data Managers
• Biology and biochemistry of the water column (nutrients, biogenic
compounds, primary production, microbiology, fauna): Paul Wassmann
• Biology and biochemistry of the benthic zone (nutrients, biogenic
compounds, primary production, microbiology, fauna): Roberto Danovaro
Three main tasks were assigned to the DQE:
1. The first and most basic role given to the DQE was to convey to all
scientists the need for data quality control.
• All laboratories should observe adequate internal data quality assurance,
which includes good sampling and analytical protocols, good laboratory
practice (blanks, replicates, proper recording and other aspects) and
adequate reporting to the Data Manager.
• All laboratories should participate to the best of their capabilities in
inter-comparison exercises, or even organise them, and the results of these
exercises should be made available to the MATER community.
2. Another important item was to select the list of "Basic Parameters" to be
used by most of the partners, and specifically requested by the modellers. It
was agreed that the Steering Committee should provide the DQE with a limited
list of basic parameters. For these parameters, it was desirable to include a
suggested methodology in the MTP Quality Assurance Manual, to organise the
inter-comparison exercise, and to implement the full data management protocol
before archiving and dissemination. For the other parameters, because of the
complexity and cost foreseen, in particular for some organic chemistry
and biological parameters, the archiving would be limited to safeguarding.
3. The DQE then intended to establish a reviewing procedure for data sets,
in collaboration with expert scientists. Depending on the data type, subsets
of the data sets were to be sent to one of the Data Quality Experts, who
would issue a simple Data Quality Statement such as:
• Data set is OK
• More information is needed (specify)
• Data set must be re-evaluated.
Owing to its relatively late establishment, the lack of funding, the belated
data submissions, etc., the DQE had no real opportunity either to conduct
inter-comparisons or to review data sets outside the operational QC
procedure applied in the data centres. However, its members were very useful
advisers on the crucial points:
1. Selection of the list of basic parameters
2. Validation of the data dictionary for micro-biology and radioisotopes
3. Checking of the control values for the QC protocol.
5. MTPII-MATER Database on CD-ROM
To facilitate access to the data, an MTP II-MATER CD-ROM will be published.
A beta-test CD-ROM (Fig. 11) was presented at the Perpignan workshop
(October 1999) with the complete inventories, subsets of the database,
extraction software and documentation.
The integrated, user-friendly SELMATER software allows selection,
extraction and visualisation of the observations. Data can be selected
according to several criteria: geographical area, data type (vertical
profiles or time series), cruise name or reference, time period (year, month,
date), source country, ship, measured parameters and quality flags. Two output
formats are available: MEDATLAS, and CSV tables (with limited information
on the data) for loading into spreadsheets or other scientific software. For
the vertical profiles, the software can also extract data interpolated at any
standard levels (pre-defined, or defined by the user).
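The extraction at standard levels can be pictured with the following Python
sketch. Linear interpolation between the two neighbouring observed pressures
is assumed here; the report does not state which method SELMATER uses, and the
function name and the sample values are hypothetical. Levels outside the
observed pressure range are left empty rather than extrapolated.

```python
from bisect import bisect_left


def interpolate_to_levels(pressures, values, levels):
    """Linearly interpolate a profile onto the requested levels.

    pressures must be strictly increasing and the same length as values.
    Returns one entry per level; None where the level is outside the profile.
    """
    if not pressures:
        return [None] * len(levels)
    result = []
    for level in levels:
        if level < pressures[0] or level > pressures[-1]:
            result.append(None)  # no extrapolation beyond the observations
            continue
        i = bisect_left(pressures, level)  # first index with pressure >= level
        if pressures[i] == level:
            result.append(values[i])  # level coincides with an observation
            continue
        p0, p1 = pressures[i - 1], pressures[i]
        v0, v1 = values[i - 1], values[i]
        weight = (level - p0) / (p1 - p0)
        result.append(v0 + weight * (v1 - v0))
    return result


# Example: temperature observed at 5, 15 and 45 dbar, requested every 10 dbar.
standard = interpolate_to_levels(
    pressures=[5.0, 15.0, 45.0],
    values=[20.0, 18.0, 15.0],
    levels=[0.0, 10.0, 20.0, 30.0, 50.0],
)
```

Returning None for out-of-range levels keeps the output aligned with the
requested levels, so that profiles of different depths can be placed side by
side in a table.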
The basic data are organised in different directories on the CD-ROM. These
directories correspond to the data types: Adcp, Bottle, Ctd, Current, Net,
Thermi, Trap, Xbt. Other specific data are archived in the directory Others
on the CD-ROM.
All the catalogues, format descriptions, codes and QC processing descriptions
are available in the documentation directory in HTML format.
Fig. 11: Home Page of the MTPII-MATER Database on CD-ROM
6. EVENTS
6.1. Data Management Meetings

• Paris, 3-4 July 1997, Institut Océanographique and IFREMER
• Rome, 17 November 1997, CNR
• Athens, 9-10 March 1998, NCMR/HNODC
• Rhodes, 13-14 October 1998, before the IIIrd MTPII Workshop
• Paris, 25-26 March 1999, Institut Océanographique
• Perpignan, 28 October 1999, during the IVth MTPII Workshop
6.2. Participation in Scientific and Technical Meetings
• Ocean Data Symposium, Dublin, Ireland, 15-18 October 1997
  Presentation: "Distributed Data Management Structures for Scientific
  Programmes: the MTPII-MATER Case" by Catherine MAILLARD, Beniamino B.
  MANCA, Efstathios BALOPOULOS and Jean-François RACAPE, and a poster
• Progress in Oceanography of the Mediterranean Sea, Rome, Italy,
  17-19 November 1997
  Poster presentation
• 3rd European Marine Science and Technology Conference, Lisbon,
  23-27 May 1998
  Poster presentation
• MEDAR/MEDATLAS Meeting, Paris, 22 March 1999
  Presentation: "MTPII/MATER a wealth of new data for the Mediterranean",
  Data Management Group (presented by C. Maillard)
7. Conclusion
The data collected during MTP II-MATER significantly increase the volume
of available data and will be used by scientists and non-scientists for many
years after the end of the project. The physical data are almost completely
archived; the biochemical data will continue to be archived during the next six
months. Even in its present state, this database represents a major result of
the project. The catalogues, protocols and user interfaces available on the web
contribute to the project's visibility and to the efforts made to improve
internal communication with the scientists.
The regional subdivision of the main data management task has facilitated
communication, enhanced the data flow between scientists and data managers
and contributed to the constitution of a high-quality data set. Even if the gap
between data collection and archiving, which exists in all projects,
could have been shorter, the percentage of data safely archived is quite high,
especially for the basic parameters, and the remaining bio-chemical parameters
are well on the way to reaching the same level. The data management has helped
to make these data available at a standardised, coherent level of quality,
organised in a way that is easy to handle for different kinds of users.
Quality assurance has been a strong point of the data management. It has
been ensured both by maintaining a close relationship with the source
scientists and by implementing standardised procedures for formatting and
qualifying the data, using inter-calibrated expert software tools available at
the Regional Data Centres. The recognised quality of the data collected in the
framework of MTPII-MATER should enhance their international impact.
It should also be underlined that the existing protocol for handling
Mediterranean data has been considerably extended and improved through the
data management workshops, and this represents another benefit of MTPII-MATER.
The protocol is also available for new experiments in the Mediterranean,
especially in the perspective of operational oceanography.
In conclusion, the data management of such a large project clearly faced
challenging difficulties, due to the number of source laboratories and to the
compatibility between the existing data management systems of each
regional data centre. It should be underlined that these difficulties have been
solved and that the standardisation of data handling procedures has been
accelerated. This would not have been achieved within smaller, distinct
projects. Moreover, the integration of the archiving considerably decreases
the risk of data loss, which is frequent when data remain dispersed in
different formats and units, either in the source laboratories or on various
media. The MTPII-MATER data management structure thus made it possible to
assemble a final product that is very good both qualitatively and
quantitatively, copies of which are safeguarded in three operational data
centres that maintain the data on up-to-date archiving media. In addition to
the data product on CD-ROM, these data will remain advertised in the on-line
catalogues of the data centres and disseminated for further exploitation in
accordance with the MAST policy.
ACKNOWLEDGEMENTS
This work was carried out within the framework of the MAST MTP II-MATER
project (MAS3-CT96-0051). We acknowledge with thanks the financial support of
the EC, IFREMER, NCMR and OGS.
8. References
(1) MAST & UNESCO/IOC, 1993. Manual of Quality Control Procedures for
Validation of Oceanographic Data. UNESCO Manuals and Guides 26, 436 pp.
(2) Maillard C., Balopoulos E., Fichaut M., Giorgetti A., Iona A., Latrouite A.,
Manca B., Nicolas P., 1999. Mater Data Manual V3.
Vol. 1: General Description and Inventories, SISMER/1997/IS002, 37 pp. + forms.
Vol. 2: Parameter Inventory, SISMER/1997/IS003, 51 pp.
(3) MEDATLAS Group. MEDATLAS 1997 Database and climatological atlas
of temperature and salinity. 3 CD-ROMs & WWW server, IFREMER Ed.
9. Annexes - Regional Data Management Reports
Eastern Mediterranean Data Archiving
Central Basin Data Management
Western Basin Data Management