MTP II-MATER Data Management 1996-1999 (MAS3-CT96-0051)
Final Report

C. MAILLARD1, M. FICHAUT1, A. GIORGETTI3, E. BALOPOULOS2, S. IONA2, A. LATROUITE1, B. MANCA3, P. NICOLAS4 and J-A. SANCHEZ-CABEZA5

1 IFREMER/SISMER, BP 70, 29280 Plouzané, France
2 NCMR/HNODC, Hellinikon, 16604 Athens, Greece
3 OGS, PO Box 2011, 34016 Trieste, Italy
4 SAFEGE/CETIIS, 30 Av. Malacrida, 13100 Aix en Provence, France
5 Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain

TMSI/IDM/SISMER/00-017 - February 2000

CONTENT

1. Introduction
1.1. Summary of the Objectives and Methodology
1.2. Data Management Structure and role
1.3. Data and Meta-data Circulation
2. Meta-data Management - WWW Catalogues
3. Data sets Archived
3.1. Physics
3.2. Biochemistry
3.3. Other specific parameters
4. Quality Assurance
4.1. Definition of Data Protocol
4.2. Implementation : Data Formatting and Qualification
4.3. Data Experts Review
5. MTPII-MATER Database on CD-ROM
6.
EVENTS
6.1. Data Management Meetings
6.2. Participation to Scientific and technical Meetings
7. Conclusion
8. References
9. Annexes - Regional Data Management Reports

1. Introduction

1.1. Summary of the Objectives and Methodology

The objective of the Data Management Workpackage was to ensure that the valuable multidisciplinary data sets collected during the 105 sea cruises (about 1000 days at sea) and on the 126 mooring lines from 1996 to 1999 would be easily exchanged among the participants, safeguarded for further use and published on the best media available at the time of the project.

As is common practice in major international projects, an operational, regionally distributed data management structure was defined to carry out the corresponding tasks. A preliminary task was to define a common protocol for data formatting and quality checking, so that data of the same type collected by different teams would be coherent and comparable, whatever their source. This protocol was improved throughout the implementation of the project. Quality assurance and safeguarding are improved by archiving the data as soon as possible after collection and scientific validation.

To track the data and to speed up the circulation of information and data, catalogues were maintained through regular contacts with the scientists and published on the internet in a standardised form at each regional data centre.
The data were organised into "basic parameters", which are useful for all disciplines, and other "specific parameters", for which no agreed standards exist. Both are archived, but only the basic parameters are submitted to the full quality assurance protocol.

To reach a wide range of potential users, a prototype CD-ROM was prepared and demonstrated at the final workshop in Perpignan for a first evaluation; it is attached to this report.

1.2. Data Management Structure and role

The data management structure, shown schematically in Fig. 1, includes three regional archiving centres and two operational groups.

Fig 1 : MTPII-MATER Data Management Structure (boxes: Project Co-ordinator, Data Manager, Animation Task, Quality Experts Committee, Regional DC W Basin, RDC Adriatic/Ionian, RDC E Basin)

The three regional centres are:
- National Center for Marine Research, HNODC, Greece (Eastern Basin)
- Osservatorio Geofisico Sperimentale, Italy (Adriatic/Ionian Basin)
- IFREMER/SISMER, France (Western Basin, and co-ordinating Data Manager)

Their tasks were to:
1. Develop the common protocol
2. Compile the meta-data and disseminate them on the WWW
3. Request copies of the data from the source laboratories and process them for archiving in conformity with the data management protocol.

The Animation Task (CETIIS, France) maintained the cruise schedule and an on-line synthesis of the data status for the project management. The Data Quality Group supervised the validation methods and issued the appropriate qualification for each data set.

1.3. Data and Meta-data Circulation

Information (meta-data) and data circulated within the data management structure as follows (illustrated in Fig. 2):
1. Search for cruise schedules, both from project and national authorities
2.
Request meta-data: summary reports for cruises (ROSCOP), moorings, instruments and data sets (EDMED), by sending standardised forms
3. Request the data listed in these reports, then reformat, safeguard and quality-check them
4. Publish up-to-date catalogues of data and meta-data on WWW servers
5. Disseminate data and meta-data according to the project policy.

Fig 2 : Circulation of Data & Information during MTP II-MATER (rectangles: organisation or person; ellipses: deliverables, services, products)

2. Meta-data Management - WWW Catalogues

The summary reports for cruises, moorings, instruments and data sets were archived and made available without any restriction on the data management WWW servers. It was therefore possible at any time to get a complete view of the fieldwork and of the data sets collected. These web servers are:
- Western Basin: http://www.ifremer.fr/sismer/program/mater/
- Adriatic/Ionian: http://doga.ogs.trieste.it/mater/
- Eastern Basin: http://hnodc.ncmr.ariadne-t.gr/programmes/mater/
- Synthesis made by the Animation Task: http://bali.cetiis.fr/mtp/mater/

On each of these web sites, a common data management home page was developed, with access to all the standardised cruise, mooring, instrument and data reports, and links to the other centres and project web sites (Fig. 3).

Fig 3: Data Management home page of a Regional Data Centre

Clicking on "Cruises Summaries" returns the cruises listed by year, and clicking on any cruise returns the corresponding report, including ship tracks, the list of collected data and the list of archived data (with location). The 105 sea cruise reports can be downloaded. Clicking on "Moorings Summaries" returns a list linked to similar reports for the moorings (126 mooring reports).
Clicking on "Instruments Summaries" returns a list of the main instruments used during the project, the laboratories and information on the sensor calibrations.

Clicking on "Data sets Summaries" returns the list of the three main groups into which the data sets were organised to overcome the difficulties due to the complexity and overlap of the various disciplines:
- Physics
- Bio-chemistry
- Specific parameters (miscellaneous non-standard parameters)

These groups are themselves divided into sub-groups (Fig. 4) depending on the method or sensor type (e.g. CTD, Lagrangian floats, current meter time series for physics) or the compartment (e.g. dissolved, particulate, settling particles, sediment, pore water for chemicals).

Fig 4: WWW page of the data sets catalogue for Central Basin

The first two groups correspond to basic parameters. Clicking on any item returns the list of cruises or moorings where these data were collected, and the data location.

3. Data sets Archived

The physical and biochemical basic parameter data were reformatted to a single common format (MEDATLAS) and the full quality assurance procedure was applied before archiving. The other, specific parameters were only safeguarded.

A brief synthesis of the archived data is presented below, in the state available for the Perpignan meeting. Some data are still being archived, so this does not represent the final content of the database.

3.1. Physics

VERTICAL PROFILES                 TIME SERIES
3013 CTD                          CTD time series (count not given)
572 XBT                           21 thermistor strings
461 ADCP vertical profiles        110 current meters
                                  Lagrangian floats (count not given)

The positions of the observations are shown in Fig. 5 for the vertical profiles, Fig. 6 for the time series on fixed moorings and Fig. 7 for the Lagrangian time series.

3.2.
Biochemistry

VERTICAL PROFILES                 TIME SERIES
1473 bottle stations              37 sediment traps
269 biological stations

The positions are shown in Fig. 8.

3.3. Other specific parameters

Additional or specific data (meteorological, biological, etc.) were only marginally archived, without any reformatting or quality check. They are available only in the original source format of the scientific file, and can be extracted by cruise file only. Their positions are not reported.

Fig. 5: Vertical Profiles of Physical Observations - Stations

Fig. 6: Time series of Physical Observations - Moorings

Fig. 7: Lagrangian Time series of Current

Fig. 8: Biochemical Data in station and sediment traps

4. Quality Assurance

Quality assurance (QA) was a high point of the MTPII-MATER data management. The data were first scientifically validated in the scientific laboratories; copies were then transmitted to the archiving centre, where they were reformatted to a single common format (extended MEDATLAS), checked for quality (QC) and safeguarded. Although validation is the responsibility of the scientists, these final QC checks, made before archiving, allowed the data to be cross-checked and ensured that no errors had been introduced during reformatting. This procedure follows the recommendations of international organisations such as UNESCO/IOC and MAST, and the practices of other major international projects such as WOCE and JGOFS.
The QA procedure comprised three tasks:
- definition of a common protocol for formatting and QC, in accordance with international standards
- implementation of the protocol for archiving the basic parameters
- validation of the procedures by a Data Quality Expert group.

In each regional data centre, it was very important to follow standardised quality assurance procedures in order to prepare coherent data sets that can be used by the whole community. Several hundred parameters were measured within MTPII-MATER; among them, 105 basic parameters of physics and biochemistry were to be shared by the participants and, later on, by other users. The quality assurance procedure focused on these parameters. For the other parameters, which result from new sensors or methodologies or for which internationally agreed standards did not exist, the QA procedure was not applicable.

4.1. Definition of Data Protocol

The MTPII-MATER protocol (2) for handling the information and the data is based on the international recommendations of UNESCO/IOC and MAST (1) and on the previous MAST/MEDATLAS protocol (3), which was developed to process all the basic parameters, vertical profiles and time series. This protocol includes:
1. A data dictionary in which the parameter names, units and codes are standardised. The International System of Units (SI) was used, and the rules for derived units were recalled, based on ISO guidelines.
2. The description of the format: self-describing ASCII, including a short cruise/mooring header with reference to the author, a station header, and columns of observations referenced to pressure for vertical profiles (depth in the sediments) and to time for the time series.
3. The description of the final quality checks (QC) performed on the basic parameters before archiving.
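
As an illustration of the kind of final checks named in item 3, the following minimal Python sketch applies a broad-range test and a three-point spike test to one column of a vertical profile and returns one quality flag per value. It is not the procedure of the data centres: the function names, the flag values, the range limits and the spike threshold are all assumptions made for this example, and the spike formula is simply one commonly used three-point test.

```python
from typing import List, Optional

# Illustrative flag values only (assumed for this sketch, not taken from the report).
FLAG_GOOD = 1
FLAG_DUBIOUS = 3
FLAG_BAD = 4
FLAG_MISSING = 9


def range_flags(values: List[Optional[float]], vmin: float, vmax: float) -> List[int]:
    """Flag each value against a broad minimum/maximum range."""
    flags = []
    for v in values:
        if v is None:
            flags.append(FLAG_MISSING)
        elif vmin <= v <= vmax:
            flags.append(FLAG_GOOD)
        else:
            flags.append(FLAG_BAD)
    return flags


def spike_flags(values: List[Optional[float]], flags: List[int],
                threshold: float) -> List[int]:
    """Mark interior points that deviate sharply from both neighbours."""
    out = list(flags)
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        # Skip points that are missing, already rejected, or lack a neighbour.
        if prev is None or cur is None or nxt is None or out[i] != FLAG_GOOD:
            continue
        # Distance from the neighbours' mean, minus half the neighbours' spread.
        spike = abs(cur - (prev + nxt) / 2.0) - abs(nxt - prev) / 2.0
        if spike > threshold:
            out[i] = FLAG_DUBIOUS
    return out


def qc_profile(values: List[Optional[float]], vmin: float, vmax: float,
               spike_threshold: float) -> List[int]:
    """Run the range test, then the spike test, and return one flag per value."""
    return spike_flags(values, range_flags(values, vmin, vmax), spike_threshold)


# Hypothetical temperature column (degrees Celsius), ordered by increasing pressure.
temperature = [14.2, 14.1, 25.0, 13.9, 13.8, 60.0, None]
print(qc_profile(temperature, vmin=0.0, vmax=40.0, spike_threshold=2.0))
# -> [1, 1, 3, 1, 1, 4, 9]
```

The flags are kept alongside the values rather than used to delete them, which matches the report's statement that the results of the checks are quality flags added to each numerical value.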
The QC includes automatic and visual checks:
- QC-0: check of the format, units, codes, and overall completeness and consistency of the information
- QC-1: check of the date and location
- QC-2: check of the data points: broad-range minimum/maximum values, comparison with climatological statistics (when available), search for spikes, stuck sensors, vertical instabilities, etc.

The visual checks assess the overall consistency of the data within a data set, identify the wrong value in case of vertical instability, validate the climatological test in some areas, etc. The results are quality flags added to each numerical value.

4.2. Implementation : Data Formatting and Qualification

The data received at the data centres are reformatted to the common format and, if necessary, the source scientist is contacted to complete the information. Where units differed, they were converted to SI units when possible, to ensure comparability with the other data sets.

The QC was performed on the reformatted data files in the three MATER data centres. The Hellenic and French data centres used an expert software package (SCOOP), and the Italian data centre used local software. Examples of QC1 and QC2 checks from SCOOP are displayed in Fig. 9 and 10. After checking that the outliers are not artefacts due to formatting, the QC results are communicated to the responsible scientists, who take further action such as validation, correction or elimination if necessary. Close co-operation with the scientists who collected the data contributed greatly to the quality assurance.

Fig. 9: Check of the Location and Date of the Observations (QC1)

Fig. 10: Check of the Location and Date of the Observations (QC2) (in green the current profile, in blue the corresponding climatological profile)

4.3.
Data Experts Review

A Data Quality Group of Experts (DQE) was created by the project co-ordinator, and its first meeting was held in Rome (CNR) on 17 November 1997, during the International Conference "Progress in Oceanography of the Mediterranean Sea". It was composed of scientists and data managers in the following fields of expertise:
- Suspended particulate matter: Project Co-ordinator
- Inorganic chemistry (metals, CFC and radionuclides): Joan-Albert Sanchez-Cabeza (DQE Leader)
- Physical oceanography, satellite data: MATER Data Managers
- Biology and biochemistry of the water column (nutrients, biogenic compounds, primary production, microbiology, fauna): Paul Wassmann
- Biology and biochemistry of the benthic zone (nutrients, biogenic compounds, primary production, microbiology, fauna): Roberto Danovaro

Three main tasks were assigned to the DQE:
1. The first and most basic role given to the DQE was to convey to all scientists the need for data quality control. All laboratories should observe adequate internal data quality assurance, which includes good sampling and analytical protocols, good laboratory practice (blanks, replicates, proper recording and other aspects) and adequate reporting to the Data Manager. All laboratories should participate in inter-comparison exercises to the best of their capabilities, or even organise them, and the results of these exercises should be made available to the MATER community.
2. Another important task was to select the list of "Basic Parameters" to be used by most of the partners, and specifically requested by the modellers. It was agreed that the Steering Committee should provide the DQE with a limited list of basic parameters. For these parameters, it was desirable to include a suggested methodology in the MTP Quality Assurance Manual, to organise the inter-comparison exercise, and to implement the full data management protocol before archiving and dissemination.
For the other parameters, because of the complexity and cost foreseen, in particular for some organic chemistry and biological parameters, archiving would be limited to safeguarding.
3. Finally, the Data Quality Group intended to establish a reviewing procedure for data sets, in collaboration with expert scientists. Depending on the data type, subsets of the data sets should be sent to one of the Data Quality Experts listed above, who would issue a simple Data Quality Statement such as:
- Data set is OK
- More information is needed (specify)
- Data set must be re-evaluated.

Because of its relatively late establishment, the lack of funding, the late data submissions, etc., the DQE had no real opportunity either to conduct inter-comparisons or to review data sets outside the operational QC procedure applied in the Data Centres. However, its members were very useful advisers on the following crucial points:
1. Selection of the list of basic parameters
2. Validation of the data dictionary for micro-biology and radioisotopes
3. Check of the control values for the QC protocol.

5. MTPII-MATER Database on CD-ROM

To facilitate access to the data, an MTP II-MATER CD-ROM will be published. A beta-test CD-ROM (Fig. 11) was presented at the Perpignan workshop (October 1999) with the complete inventories, subsets of the database, extraction software and documentation. The integrated, user-friendly SELMATER software allows selection, extraction and visualisation of the observations. The data can be selected according to several criteria: geographical area, data type (vertical profiles or time series), cruise name or reference, time period (year, month, date), source country, ship, measured parameters and quality flags. Two output formats are available: MEDATLAS, and CSV tables (with limited information on the data) for loading into spreadsheets or other scientific software.
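
The selection-then-export workflow described above can be pictured with a small Python sketch. It is not SELMATER code and does not reproduce its interface: the Station record, its field names, the cruise names and the convention that a higher flag means lower quality are all assumptions made for this example. Only a few of the listed criteria (geographical area, data type, time period, quality flag) are shown.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Station:
    """Hypothetical station summary; field names are illustrative only."""
    cruise: str
    data_type: str      # e.g. "Ctd", "Bottle", "Xbt"
    latitude: float     # decimal degrees, north positive
    longitude: float    # decimal degrees, east positive
    year: int
    worst_flag: int     # assumed convention: higher value means lower quality


def select(stations: List[Station],
           bbox: Optional[Tuple[float, float, float, float]] = None,
           data_type: Optional[str] = None,
           years: Optional[Tuple[int, int]] = None,
           max_flag: Optional[int] = None) -> List[Station]:
    """Keep the stations that satisfy every criterion that is not None."""
    selected = []
    for s in stations:
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= s.latitude <= north and west <= s.longitude <= east):
                continue
        if data_type is not None and s.data_type != data_type:
            continue
        if years is not None and not (years[0] <= s.year <= years[1]):
            continue
        if max_flag is not None and s.worst_flag > max_flag:
            continue
        selected.append(s)
    return selected


def to_csv(stations: List[Station]) -> str:
    """Write the selected stations as a CSV table with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["cruise", "data_type", "latitude", "longitude", "year", "worst_flag"])
    for s in stations:
        writer.writerow([s.cruise, s.data_type, s.latitude, s.longitude,
                         s.year, s.worst_flag])
    return buf.getvalue()


# Hypothetical inventory of three stations.
inventory = [
    Station("CRUISE_A", "Ctd", 42.5, 5.0, 1997, 1),
    Station("CRUISE_B", "Ctd", 36.0, 25.0, 1998, 1),
    Station("CRUISE_A", "Bottle", 42.6, 5.1, 1997, 4),
]
western_ctd = select(inventory, bbox=(35.0, 0.0, 45.0, 10.0),
                     data_type="Ctd", years=(1997, 1998), max_flag=2)
print(to_csv(western_ctd))
```

Treating each criterion as optional and combining them with a logical AND keeps the selection composable: any subset of criteria can be given, and omitted ones do not restrict the result.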
For vertical profiles, the software can also extract data interpolated at standard levels (pre-defined or defined by the user).

The basic data are organised in different directories on the CD-ROM. These directories correspond to the data types: Adcp, Bottle, Ctd, Current, Net, Thermi, Trap, Xbt. Other specific data are archived in the directory Others on the CD-ROM. All the catalogues, format descriptions, codes and QC processing descriptions are available in HTML format in the documentation directory.

Fig. 11: Home Page of the MTPII-MATER Database on CD-ROM

6. EVENTS

6.1. Data Management Meetings

- Paris, 3-4 July 1997, Institut Océanographique and IFREMER
- Rome, 17 November 1997, CNR
- Athens, 9-10 March 1998, NCMR/HNODC
- Rhodes, 13-14 October 1998, before the IIIrd MTPII Workshop
- Paris, 25-26 March 1999, Institut Océanographique
- Perpignan, 28 October 1999, during the IVth MTPII Workshop

6.2. Participation to Scientific and technical Meetings

- Ocean Data Symposium, Dublin, Ireland, 15-18 October 1997: presentation "Distributed Data Management Structures for Scientific Programmes: the MTPII-MATER Case" by Catherine MAILLARD, Beniamino B. MANCA, Efstathios BALOPOULOS and Jean-François RACAPE, and a poster
- Progress in Oceanography of the Mediterranean Sea, Rome, Italy, 17-19 November 1997: poster presentation
- 3rd European Marine Science and Technology Conference, Lisbon, 23-27 May 1998: poster presentation
- MEDAR/MEDATLAS Meeting, Paris, 22 March 1999: presentation "MTPII/MATER a wealth of new data for the Mediterranean", Data Management Group (presented by C. Maillard)

7.
Conclusion

The data collected during MTP II-MATER significantly increase the volume of available data and will be used by scientists and non-scientists for many years after the end of the project. The physical data are almost completely archived; the biochemical data will continue to be archived during the next six months. Even in its present state, this database represents a major result of the project. The catalogues, protocols and user interfaces available on the web contribute to the project's visibility and to the efforts made to improve internal communication with the scientists.

The regional subdivision of the main data management task facilitated communication, enhanced the data flow between scientists and data managers, and contributed to the constitution of a high-quality data set. Although the gap between data collection and archiving, which exists in all projects, could have been shorter, the percentage of data safely archived is quite high, especially for the basic parameters, and the remaining biochemical parameters are well on the way to the same level. The data management has helped to make these data available at a standardised, coherent level of quality, organised in a way that is easy to handle for different kinds of users.

Quality assurance was a high point of the data management. It was ensured both by maintaining a close relationship with the source scientists and by implementing standardised procedures for formatting and qualifying the data, using the inter-calibrated expert software tools available at the Regional Data Centres. The recognised quality of the data collected within the framework of MTPII-MATER should enhance their international impact.

It should also be underlined that the existing protocol for handling Mediterranean data was considerably extended and improved through the data management workshops, and this represents another benefit of MTPII-MATER.
The protocol is also available for new experiments in the Mediterranean, especially in the perspective of operational oceanography.

In conclusion, the data management of such a large project faced challenging difficulties, due to the number of source laboratories and to compatibility issues between the existing data management systems of the regional data centres. It should be underlined that these difficulties were solved and that the standardisation of data handling procedures was accelerated. This would not have been achieved within smaller, separate projects. Moreover, integrated archiving considerably decreases the danger of data loss, which is frequent when data remain dispersed in different formats and units, either in the source laboratories or on various media.

The MTPII-MATER data management structure thus made it possible to assemble a final product of very good quality and volume, copies of which are safeguarded in three operational data centres that maintain the data on up-to-date archiving media. In addition to the data product on CD-ROM, these data will remain advertised in the on-line catalogues of the data centres and will be disseminated for further exploitation in accordance with the MAST policy.

ACKNOWLEDGEMENTS

This work was carried out within the framework of the MAST MTP II-MATER project (MAS3-CT96-0051). We gratefully acknowledge the financial support of the EC, IFREMER, NCMR and OGS.

8. References

(1) MAST & UNESCO/IOC, 1993. Manual of Quality Control Procedures for Validation of Oceanographic Data. UNESCO Manuals and Guides 26, 436 p.
(2) Maillard C., Balopoulos E., Fichaut M., Giorgetti A., Iona A., Latrouite A., Manca B., Nicolas P., 1999. Mater Data Manual V3. Vol 1: General Description and inventories, SISMER/1997/IS002, 37 pp + forms. Vol 2: Parameter Inventory, SISMER/1997/IS003, 51 pp.
(3) MEDATLAS Group.
MEDATLAS 1997 Database and climatological atlas of temperature and salinity. 3 CD-ROMs & WWW server, IFREMER Ed.

9. Annexes - Regional Data Management Reports

- Eastern Mediterranean Data Archiving
- Central Basin Data Management
- Western Basin Data Management