BODC_WCO_submission_documentation_template

BODC data set documentation template to accompany environmental data
submissions for DOI attribution
Data set title:
Name of data file (or zip folder containing files in the case of 1 data series per file e.g. CTD or
optics profiles – please include the rationale behind individual file naming):
Data set creator(s) and institute(s):
Data set period:
yyyy-mm-dd to yyyy-mm-dd
Data set abstract:
Brief explanation of the data set.
Sampling methodology and description of analytical techniques:
Outline of sampling protocols and analytical techniques used with references if appropriate.
Data quality comments:
An overview of the data set quality. This can range from a single sentence, e.g. “The author does not
have any concerns over the quality of the data set and values are consistent with similar data reported
in the wider literature”, to something more in depth: whatever you feel is relevant to the data in the
submission.
Guidance Notes for providing data
The key thing to aim for is that the data set and documentation should allow the data set to stand on its
own and be useable without returning to the data set author(s).
Data file header contents
In order for all information to be captured and explained, so that data users should not necessarily need
to return to the data author with simple queries, the following information should be included with the
data. A simple way to do this is to incorporate it into a header in the data file (see
BODC_data_submission_template.csv). The fields below are a suggested minimum; if there is
further information relevant to your data set that you feel is not covered, please add it to
the header or the associated documentation. Descriptions are given for each suggested entry below:
Author(s):
To which the dataset will be attributed.
Institute(s):
e.g. PML, MBA, SAHFOS
Contact email:
Data collected by:
List of people involved in collection, analysis and production of the data set beyond those listed as
authors.
Project PI:
Project:
Funding Body:
Dataset title:
Parameter Names (if shortened versions used in column titles):
Description of any shorthand data column titles, e.g. Temp = Temperature of the water column measured
by a CTD; Sal = Salinity of the water column measured by a CTD; … etc.
If there are a large number of columns, a separate table in the documentation may be more user friendly
(see below for an example).
Parameter Units:
Description of any shorthand units, e.g. Temp = degrees Celsius (ITS-90); Sal = dimensionless; … etc.
If there are a large number of columns, a separate table in the documentation may be more user friendly
(see below for an example).
Description of parameter flags in column QF (if used):
If established data flags are used, quote the source, e.g. WOCE
(http://woce.nodc.noaa.gov/woce_v3/wocedata_1/woceuot/document/qcflags.htm), BODC
(http://www.bodc.ac.uk/data/codes_and_formats/request_format/), etc. If using your own system, provide a
description here (e.g. < = less than detection limit; M = suspect data; T = interpolated data, etc.).
Absent data value:
Rather than leaving cells blank, which can leave doubt over whether this was intentional or an
oversight, it is sensible to use an absent data value. Additionally, a flag (e.g. "N") can be used in
connection with an empty cell. Please take care when selecting an absent data value: it should be a value
that is not likely for a parameter (-999 is a good example).
Dataset version:
If submitting for a DOI, this should be "FINAL". There is nothing to stop you using this template for
your own day-to-day data management, and this field can help you keep track of the version, e.g.
"Preliminary" or "Pre-QC", as you choose.
Submission date:
Date sent to BODC.
Associated descriptive document filename:
Name of the associated document containing the abstract, sampling protocols and overall comment on data
quality.
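As a rough illustration of the header fields described above, the sketch below writes them as label/value rows that could sit at the top of a csv file. The one-field-per-row layout, the function name `write_header` and all example values are assumptions made for illustration only; the layout actually expected is the one in BODC_data_submission_template.csv.

```python
import csv
import io

# Suggested minimum header fields, in the order they are listed above.
HEADER_FIELDS = [
    "Author(s)",
    "Institute(s)",
    "Contact email",
    "Data collected by",
    "Project PI",
    "Project",
    "Funding Body",
    "Dataset title",
    "Parameter Names",
    "Parameter Units",
    "Description of parameter flags in column QF",
    "Absent data value",
    "Dataset version",
    "Submission date",
    "Associated descriptive document filename",
]


def write_header(stream, values):
    """Write one 'label:,value' row per header field (hypothetical layout).

    Fields missing from `values` are written with an empty value so that
    the gap stays visible rather than being silently dropped.
    """
    writer = csv.writer(stream, lineterminator="\n")
    for field in HEADER_FIELDS:
        writer.writerow([field + ":", values.get(field, "")])


# Example values are invented for illustration only.
example = {
    "Institute(s)": "PML",
    "Parameter Names": "Temp = Temperature of the water column measured by a CTD",
    "Parameter Units": "Temp = degrees Celsius (ITS-90)",
    "Absent data value": "-999",
    "Dataset version": "FINAL",
}

buffer = io.StringIO()
write_header(buffer, example)
header_text = buffer.getvalue()
print(header_text)
```

Writing empty fields explicitly makes it easy to spot which entries still need to be completed before submission.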
Metadata columns
For illustration, the metadata columns are highlighted orange in the template spreadsheet. The sheet
already contains the suggested minimum that should accompany marine data:
Date/time (in GMT/UT) - if no accurate time is available add an adjacent column and flag ‘time approx’,
Latitude (decimalized, +ve = North) - if no accurate lat is available add an adjacent column and flag ‘lat approx’,
Longitude (decimalized, +ve = East) - if no accurate lon is available add an adjacent column and flag ‘lon approx’,
Sample depth (m),
Vessel the samples were collected on (or cruise ID),
Sampling gear used (e.g. CTD rosette, Niskin, Bucket, Surface Pump, Nets, Grab, Corer, etc),
Site name, if a recognised site is being sampled.
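A minimal sketch of how one row could be checked against the minimum metadata listed above is given here. The column titles in `REQUIRED`, the timestamp pattern and the -180 to 180 longitude range are assumptions for illustration, not part of the template; the example row is invented.

```python
from datetime import datetime

# Hypothetical column titles for the suggested minimum metadata.
REQUIRED = ["DateTime_UT", "Lat", "Lon", "Depth_m", "Vessel", "Gear"]


def check_row(row):
    """Return a list of problems found in one row (a dict of strings)."""
    problems = []
    for name in REQUIRED:
        if not row.get(name, "").strip():
            problems.append("missing " + name)
    if problems:
        return problems

    # Assumes timestamps are written as yyyy-mm-dd HH:MM:SS in GMT/UT.
    try:
        datetime.strptime(row["DateTime_UT"], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        problems.append("DateTime_UT is not yyyy-mm-dd HH:MM:SS")

    # Decimal degrees: +ve = North for latitude, +ve = East for longitude.
    try:
        lat = float(row["Lat"])
        lon = float(row["Lon"])
    except ValueError:
        problems.append("Lat/Lon must be decimal numbers")
        return problems
    if not -90.0 <= lat <= 90.0:
        problems.append("Lat outside -90 to 90")
    if not -180.0 <= lon <= 180.0:
        problems.append("Lon outside -180 to 180")
    return problems


good = {"DateTime_UT": "2011-06-14 09:30:00", "Lat": "50.25", "Lon": "-4.217",
        "Depth_m": "10", "Vessel": "Example vessel", "Gear": "Niskin"}
bad = dict(good, Lat="50 15.0 N")  # degrees and minutes instead of decimal

print(check_row(good))
print(check_row(bad))
```

Positions given in degrees and minutes fail the decimal check, which is the most common reason a latitude or longitude column cannot be read as a number.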
Depending on the sampling gear used to collect the samples the metadata columns may need to include
more options (see below for additional fields for each sample gear). Where this information is the same
for all the data represented in the data set it only needs to be included once in the documentation (e.g.
“The data set is from the analysis of a repeated series of vertical hauls at 1 m/s from 50 m to the surface
using a 200 um mesh bongo net, 0.5m diameter, opening area ~0.196 m^2, filtering ~9.82 m^3 of water
on each haul”).
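The opening area and volume quoted in the example above follow from simple geometry, sketched below. The sketch assumes a circular net mouth and that all water in the towed cylinder passes through the net (no flowmeter or filtration-efficiency correction); the function names are illustrative.

```python
import math


def net_opening_area(diameter_m):
    """Opening area (m^2) of a circular net mouth."""
    return math.pi * (diameter_m / 2.0) ** 2


def volume_filtered(diameter_m, tow_distance_m):
    """Nominal volume filtered (m^3), assuming 100% filtration efficiency."""
    return net_opening_area(diameter_m) * tow_distance_m


# 0.5 m diameter net hauled vertically from 50 m to the surface.
area = net_opening_area(0.5)
volume = volume_filtered(0.5, 50.0)
print(round(area, 3), round(volume, 2))  # 0.196 9.82
```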
Bucket, Pump, Niskin, Corer, Grab:
Minimum above
CTD rosette:
Minimum above plus:
Rosette bottle
Firing Sequence
Nets/trawl:
Minimum above plus:
Mesh size (if available)
Opening area (if available)
Haul type (e.g. Vertical, horizontal, oblique)
For a vertical haul: Minimum depth; Maximum depth
For a horizontal trawl: Start time, lat & lon; End time, lat & lon; Depth, ‘surface’ or ‘benthic’
Haul/trawl speed
Volume filtered
Where incubations have been carried out, it is suggested details of the sample collection that provided
the basis for the incubation should be provided in addition to the following:
Incubations:
Start & end time
Treatment: light, temp conditions and additions (tracers, nutrients, etc)
Incubation Volume
Time Course Point: where a series of samples is taken during the incubation at different time steps
Data columns
These have been highlighted in yellow for illustration in the template. Add additional columns to the
right as required.
Notes on taxonomic data:
Since species nomenclature does change with time it is useful to include World Register of Marine
Species IDs (WoRMS AphiaID) (http://marinespecies.org/) and/or Integrated Taxonomic Identification
System IDs (ITIS TSN) (http://www.itis.gov/), e.g. Arenicola spp. AphiaID = 129206; ITIS TSN = 67507,
along with the taxonomic name. This can either be included as additional columns in the parameter
description table in the documentation or as additional rows in the data file:
 |  |  | AphiaID | 129206
 |  |  | ITIS TSN | 67507
 |  |  | Taxa | Arenicola spp.
Lat | Lon | Date and Time | Depth |
Notes on chemical data:
Some chemicals, particularly hydrocarbon chemicals, can have a number of common names. In such
situations the inclusion of the Chemical Abstracts Service Registry Numbers (CASRN)
(http://www.cas.org/expertise/cascontent/registry/regsys.html) can be useful in removing ambiguity or
confusion. This can either be included as an additional column in the parameter description table in the
documentation or as additional rows in the data file:
 |  |  | CASRN | 7314-30-9
 |  |  | Chemical | DMSP
Lat | Lon | Date and Time | Depth |
Example of parameter and unit description table
If there are a large number of columns in the data file a separate table in the documentation may be
more user friendly (see below for an example).
Column title | ITIS ID | Aphia ID | CASRN | Units | Description
Temp |  |  |  | Degrees Celsius (ITS90) | Temperature of the water column measured by a CTD
Sal |  |  |  | dimensionless | Salinity of the water column measured by a CTD
Density |  |  |  | kilograms per cubic metre | Density of the water column measured by a CTD
Xmiss |  |  |  | Percent | Transmissance of the water column measured by a transmissometer
PAR |  |  |  | Watts per square metre | Down-welling PAR irradiance in the water column measured by a PAR sensor
Arenicola spp. | 67507 | 129206 |  | abundance per square metre | Abundance of Arenicola spp.
DMSP |  |  | 7314-30-9 | nanograms per cubic metre | Concentration of Dimethylsulfoniopropionate (DMSP) in the water column
File names:
Make sure file names are descriptive (e.g. WCO_2011_nutrient_dataset.csv) not generic (e.g.
Data_for_BODC.csv). Use matching filenames for the documentation and data file, unless the
documentation refers to many data files (e.g. in the case of CTD or optics profiles).
Other comments:
When submitting Excel files please save the files as a comma separated values file (.csv) or tab delimited
text file for submission. However take care as coloured text, filled cells and comments boxes are lost
when saving from Excel to a csv file. For this reason these formatting methods should never be used for
flagging data in a final data set submission. Instead each data column should have a data flag column
adjacent to it, with flags applied in this column as text rather than formatting. Additionally data columns
should only contain numbers not a mixture of text and numbers (see example below).
Nutrient_x (umol/l): 0.04, 0.85, 0.11, <0.02
should be represented as
Nutrient_x (umol/l): 0.04, 0.85, 0.11, -999, 0.02
QF: M, M, N, <
For data that are below the detection limit of the technique please include the detection limit in the
data column with an appropriate flag in the flag column. Also confirm the detection limit in the
documentation.
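The separation of values and flags described above could be sketched as follows. This is only an illustration: the function name `split_value_and_flag` and the sample column are hypothetical, the flags "N" and "<" and the absent data value -999 are the examples used earlier in this document, and flags that need a judgement (such as "M" for suspect data) still have to be applied by the data author.

```python
ABSENT_VALUE = "-999"   # absent data value, as suggested earlier
ABSENT_FLAG = "N"       # flag for an absent value
BELOW_LIMIT_FLAG = "<"  # flag for a value below the detection limit


def split_value_and_flag(cell):
    """Turn one raw cell into a (numeric text, flag) pair.

    '<0.02' becomes ('0.02', '<'), an empty cell becomes ('-999', 'N'),
    and an ordinary number is returned with an empty flag.
    """
    text = cell.strip()
    if text == "":
        return ABSENT_VALUE, ABSENT_FLAG
    if text.startswith("<"):
        limit = text[1:].strip()
        float(limit)  # raises ValueError if the limit is not a number
        return limit, BELOW_LIMIT_FLAG
    float(text)  # raises ValueError for any other text in a data column
    return text, ""


# Invented sample column mixing numbers, a blank and a below-limit entry.
raw_column = ["0.04", "0.85", "", "<0.02"]
for value, flag in map(split_value_and_flag, raw_column):
    print(value, flag)
```

Any cell that is neither a number, a blank nor a "<" entry raises an error, so stray text in a data column is caught before the file is saved.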
Please select the most appropriate number of significant figures for the data you are submitting. Be aware
that the number format applied in an Excel spreadsheet is carried into the csv file as the absolute
number (e.g. 2.456789 formatted to 2dp in an Excel worksheet will be saved as 2.46 in the csv). If you
want a higher level of precision reported, set the cell format appropriately before saving as a csv.
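The effect of display formatting on the saved value can be illustrated outside Excel. The sketch below formats the example number in two ways and writes the deliberately chosen precision to a csv; the choice of four decimal places and the column title are assumptions for illustration only.

```python
import csv
import io

value = 2.456789

# What a cell formatted to 2 decimal places would contribute to the csv.
as_displayed = format(value, ".2f")

# The same number written with an explicit, deliberately chosen precision.
as_chosen = format(value, ".4f")

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Nutrient_x (umol/l)"])
writer.writerow([as_chosen])

print(as_displayed, as_chosen)  # 2.46 2.4568
```

Once only 2.46 has been written to the file, the remaining digits cannot be recovered from the csv, so the precision has to be chosen before saving.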
Further tips and additional details can be found with the Best Practices for Preparing Environmental
Data Sets to Share and Archive (https://daac.ornl.gov/cgi-bin/MDE/S2K/bestprac.html).
Please contact me (Email room@bodc.ac.uk; Phone 0151 795 4882) if you have any further questions or
if you have a data type that does not fit with the template.