BODC data set documentation template to accompany environmental data submissions for DOI attribution Data set title: Name of data file (or zip folder containing files in the case of 1 data series per file e.g. CTD or optics profiles – please include rational behind individual file naming): Data set creator(s) and institute(s): Data set period: yyyy-mm-dd to yyyy-mm-dd Data set abstract: Brief explanation of the data set. Sampling methodology and description of analytical techniques: Outline of sampling protocols and analytical techniques used with references if appropriate. Data quality comments: An over view of the data set quality. Can simply be a sentence “The author does not have any concerns over the quality of the data set and values are consistent with similar data reported in the wider literature” to something more in depth. What ever you feel is relevant to the data in the submission. Guidance Notes for providing data The key thing to aim for is that the data set and documentation should allow the data set to stand on its own and be useable without returning to the data set author(s). Data file header contents In order for all information to be captured and explained, so that data users should not necessarily need to return to the data author with simple queries, the following information should be included with the data. A simple way to do this is for it to be incorporated into a header in the data file (see BODC_data_submission_template.csv). The fields suggested below are a suggested minimum, if there are further information relevant to your data set that you feel are not covered then please add them to the header or the associated documentation. Descriptions are given for each entry suggested below: Author(s): Institute(s): To which the dataset will be attributed e.g. PML, MBA, SAHFOS Contact email: Data collected by: List of people involved in collection, analysis, production of the data set beyond those listed as authors. Project PI: Project: Funding Body: Dataset title: Parameter Names (if shortened versions used in column titles): Parameter Units: Description of parameter flags in column QF (if used): Absent data value: Dataset version: Submission date: Associated descriptive document filename: Description of any short hand data column titles e.g. Temp = Temperature of the water column measured by a CTD; Sal = Salinity of the water column measured by a CTD; … etc. If there are a large number of columns a separate table in the documentation may be more user friendly (see below for an example). Description of any short hand units e.g. Temp = degrees Celsius (ITS-90); Sal = dimensionless; … etc. If there are a large number of columns a separate table in the documentation may be more user friendly (see below for an example). If established data flags are used quote what source e.g. WOCE (http://woce.nodc.noaa.gov/woce_v3/wocedata_1/woceuot/document/qcflags.htm), BODC (http://www.bodc.ac.uk/data/codes_and_formats/request_format/), etc… . If using your own system provide a description here (e.g. < = less than detection limit; M = suspect data; T = interpolated data etc). Rather than leaving cells blank, which can leave doubt over whether this was intentional or an oversight, it is sensible for an absent data value to be used. Additionally a flag (e.g. "N") can be used in connection with an empty cell. Please take care when selecting an absent data value, it should be a value that is not likely for a parameter (-999 is a good example). If submitting for a DOI should be "FINAL". Nothing to stop use of this template for your own day to day data management and this field can help you keep track of the version e.g. "Preliminary", "Pre-QC" as you chose. Date sent to BODC Name of associated document containing abstract, sampling protocols and overall comment on data quality. Metadata columns For illustration the metadata columns are highlighted orange in the template spreadsheet. Already in the sheet is the suggested minimum there should be to accompany marine data: Date/time (in GMT/UT) - if no accurate time is available add an adjacent column and flag ‘time approx’, Latitude (decimalized +ve = North) - if no accurate lat is available add an adjacent column and flag ‘lat approx’, Longitude (decimalized +ve = East) - if no accurate lon is available add an adjacent column and flag ‘lon approx’, Sample depth (m), Vessel the samples were collected on (or cruise ID), Sampling gear used (e.g. CTD rosette, Niskin, Bucket, Surface Pump, Nets, Grab, Corer, etc), Site name, if a recognised site is being sampled. Depending on the sampling gear used to collect the samples the metadata columns may need to include more options (see below for additional fields for each sample gear). Where this information is the same for all the data represented in the data set it only needs to be included once in the documentation (e.g. “The data set is from the analysis of a repeated series of vertical hauls at 1 m/s from 50 m to the surface using a 200 um mesh bongo net, 0.5m diameter, opening area ~0.196 m^2, filtering ~9.82 m^3 of water on each haul”). Bucket, Pump, Niskin, Corer, Grab Minimum above CTD rosette Minimum above plus Rosette bottle Firing Sequence Nets/trawl Minimum above plus Mesh size Opening area Haul type if available if available e.g. Vertical, horizontal, oblique For a vertical haul For a horizontal trawl Minimum depth Maximum depth Start time, lat & lon End time, lat & lon Depth, ‘surface’ or ‘benthic’ Haul/trawl speed Volume filtered Where incubations have been carried out, it is suggested details of the sample collection that provided the basis for the incubation should be provided in addition to the following: Incubations Start & end time Treatment Incubation Volume Time Course Point Light, temp conditions and additions (tracers, nutrients, etc) Where a series of samples taking during the incubation at different time steps Data columns These have been highlighted in yellow for illustration in the template. Add additional columns to the right as required. Notes on taxonomic data: Since species nomenclature does change with time it is useful to include World Register of Marine Species IDs (WoRMS AphiaID) (http://marinespecies.org/) and/or Integrated Taxonomic Identification System IDs (ITIS TSN) (http://www.itis.gov/), e.g. Arenicola spp. AphiaID = 129206; ITIS TSN = 67507, along with the taxonomic name. This can either be included as additional columns in the parameter description table in the documentation or as additional rows in the data file: Lat Lon Date and Time AphiaID IT IS TSN Taxa Depth 129206 67507 Arenicola spp. Notes on chemical data: Some chemicals, particularly hyrdocarbon chemicals, can have a number of common names. In such situations the inclusion of the Chemical Abstracts Service Registry Numbers (CASRN) (http://www.cas.org/expertise/cascontent/registry/regsys.html) can be useful in removing ambiguity or confusion. This can either be included as an additional column in the parameter description table in the documentation or as additional rows in the data file: Lat Lon Date and Time CASRN Chemical Depth 7314-30-9 DMSP Example of parameter and unit description table If there are a large number of columns in the data file a separate table in the documentation may be more user friendly (see below for an example). Column title Temp ITIS ID Aphia ID CASRN Sal Density Xmiss PAR Arenicola spp. DMSP 67507 129206 7314-30-9 Units Description Degrees Celsius (ITS90) dimensionless Temperature of the water column measured by a CTD kilograms per cubic metre Percent Density of the water column measured by a CTD Watts per square metre abundance per square metre nanograms per cubic metre Salinity of the water column measured by a CTD Transmissance of the water column measured by a transmissometer Down-welling PAR irradiance in the water column measured by a PAR sensor Abundance of Arenicola spp. Concentration of Dimethylsulfoniopropionate (DMSP) in the water column File names: Make sure file names are descriptive (e.g. WCO_2011_nutrient_dataset.csv) not generic (e.g. Data_for_BODC.csv). Use matching filenames for the documentation and data file, unless the documentation refers to many data files (e.g. in the case of CTD or optics profiles). Other comments: When submitting Excel files please save the files as a comma separated values file (.csv) or tab delimited text file for submission. However take care as coloured text, filled cells and comments boxes are lost when saving from Excel to a csv file. For this reason these formatting methods should never be used for flagging data in a final data set submission. Instead each data column should have a data flag column adjacent to it, with flags applied in this column as text rather than formatting. Additionally data columns should only contain numbers not a mixture of text and numbers (see example below). Nutrient_x (umol/l) 0.04 0.85 0.11 <0.02 should be represented as Nutrient_x (umol/l) 0.04 0.85 0.11 -999 0.02 QF M M N < For data that are below the detection limit of the technique please include the detection limit in the data column with an appropriate flag in the flag column. Also confirm the detection limit in the documentation. Please select the most appropriate level of significant figures for the data you are submitting. Be aware that the formatting of numbers in an Excel spreadsheet will be taken on in the csv file as the absolute number (e.g. 2.456789 formatted to 2dp in an Excel worksheet will only be retained as 2.46 in the csv and if you want a higher level of accuracy reported you should set that appropriately before saving as a csv). Further tips and additional details can be found with the Best Practices for Preparing Environmental Data Sets to Share and Archive (https://daac.ornl.gov/cgi-bin/MDE/S2K/bestprac.html). Please contact me (Email room@bodc.ac.uk; Phone 0151 795 4882) if you have any further questions or if you have a data type that does not fit with the template.