mmx_dce_draft_2014-08-28 - Pacific Northwest Aquatic

Monitoring Metadata Exchange (MMX)
Proposed Data Exchange Standard
8/28/2014
Acknowledgements
Many thanks to the individuals on the project team who provided valuable knowledge and input that led to the
development of the data exchange template tables included in this draft: Brodie Cox (Washington Department
of Fish and Wildlife), Van Hare (Pacific States Marine Fisheries Commission), Jack Janisch (Washington
Department of Ecology), Won Kim (Oregon Department of Environmental Quality), Pete Ruhl (US Geological
Survey), Jacquelyn Schei (US Geological Survey/Pacific Northwest Aquatic Monitoring Partnership), Becca Scully
(US Geological Survey/Pacific Northwest Aquatic Monitoring Partnership), Keith Steele (Sitka Technology Group)
and Daniel Wieferich (Michigan State University, National Fish Habitat Partnership).
Introduction
Few organizations document location, site visit history, and protocol information in readily available,
shareable and standardized machine readable formats. The result is poor region-wide coordination
between programs. One solution to this issue is to standardize the format in which the ‘who’, ‘what’,
‘when’, ‘where’ and ‘how’ of data collection events is presented to support data sharing efforts. The
focus would not be on sharing raw measurement data, but on metadata about these events.
In support of improved data management and in an effort to collect and publish timely, accurate, and
relevant scientific data that properly inform key management questions, this document proposes a data
exchange standard for site level data collection event (DCE) metadata associated with research,
monitoring, and evaluation (RM&E) efforts. A site is the spatial location where one or more
measurements are taken and metrics derived. A more generic term for this is “spatial unit”.
Furthermore, establishing a standard for this metadata in an online exchange will make the data
more valuable, because funders and researchers can more easily identify where monitoring is occurring,
where data are located, and what designs and methodologies were used to collect them. Contained in the
document is the first version of a data exchange template for DCE elements. Data exchange templates
(DET), which describe the information participants would like to share and what format it should be
shared in, are a critical step in sharing information, but only one piece of the data sharing puzzle. After
an agreement is reached on what data to share and the format, technical discussions will be needed to
finalize a DET and processes for sharing data, which includes a data sharing agreement. All of these
pieces will make up the standard.
Site level DCE metadata includes the minimum information needed to render integrated maps of
project locations and perform analytical queries relating to location, method, organization, and time
across programs. Site level DCE metadata can help answer the ‘who’, ‘what’, ‘where’, ‘when’, and ‘how’
questions about specific projects. Over time, the expectation is that as more researchers provide
site level DCE metadata for their RM&E data in an online exchange, access to a complete and
up-to-date summary of the information available will broaden. This will enable managers to make
better informed decisions about key management questions and to reduce redundancy or duplication
when planning future efforts.
Monitoring Metadata Exchange (MMX)
There are three general classifications of potential information exchanges using aquatic resource
monitoring metadata and analysis products:
• Site level data collection event (DCE) metadata to answer the Who, What, Where, When, and How
questions
• Site level metrics
• Indicators at various scales
Below you will find the first draft of the Monitoring Metadata Exchange (MMX) data exchange template
(DET) for metadata pertaining to site level DCE. Its purpose is to define and describe data elements and
validation rules. A DCE represents the collection of measurements or observations via one or more
methods of a single protocol at a single site on a specific date or continuously over a date range.
Ultimately, the intent of this effort is the online exchange of this information. When data are exchanged,
the source of the data is normally the organization that implements the project and/or stores
measurements and metadata related to the projects. Consumers (sinks) of this information include
state or federal agencies, universities, non‐profits, and the general public. We propose that exchange of
site-level DCE metadata between sources and sinks occur initially through two mechanisms:
• Environmental Protection Agency (EPA) Exchange Network virtual node services
• RESTful web API (local client implementations)
Although the communications protocols and message encoding mechanisms may be different or may
change over time, it is our goal that the basic information model exchanged via these two mechanisms
be identical (Figure 1).
Figure 1. Conceptual data diagram for metadata exchange.
The Exchange Network (http://www.exchangenetwork.net) is an EPA funded standard for flowing
environmental data and for supporting technology implementations that enable organizations that
produce or consume environmental data to exchange information using a standard protocol. Our intent
is to apply for a grant from EPA to implement web services to support virtual sharing of these data on
the Exchange Network.
As important as the Exchange Network is, it does require significant IT infrastructure and skills to
implement and maintain. Many organizations, especially smaller ones, lack adequate funding or skills to
undertake this task and be successful. For this reason, we will also develop a secondary option more
limited in potential reach, but easier and less expensive to operate. This approach will be based on a
less complicated RESTful web API over HTTP as a means of exchanging site level DCE metadata between
willing participants.
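As a rough illustration of the simpler option, the sketch below round-trips one DCE record through JSON, the kind of payload a RESTful web API over HTTP might carry. The key names (providerId, eventId, programName, eventDate) and the sample values are hypothetical; this draft does not define a JSON encoding.

```python
import json


def encode_dce(record: dict) -> str:
    """Serialize a DCE record to a JSON string with a stable key order."""
    return json.dumps(record, sort_keys=True)


def decode_dce(payload: str) -> dict:
    """Parse a JSON payload back into a DCE record."""
    return json.loads(payload)


# Hypothetical record: key names and values are placeholders, not part of the draft.
example = {
    "providerId": "00000000-0000-0000-0000-000000000000",
    "eventId": "EXAMPLE-0001",
    "programName": "Example Monitoring Program",
    "eventDate": "2014-07-16",
}

payload = encode_dce(example)
restored = decode_dce(payload)
```

Because the record is a plain dictionary, the same information model could in principle be carried by either exchange mechanism with only the encoding step changed.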
Implementation
A formal exchange standard implemented on the EPA Exchange Network will benefit many, including
PNAMP partners. The proposed MMX will enable better integration of information from disparate
efforts across time and space by providing data seekers with more than just a location of monitoring
efforts. Any website with map services could use the exchange to improve its content. Bonneville
Power Administration is interested in using the MMX to support its needs in the Columbia River
basin. PNAMP plans to utilize this work to standardize the way information flows into the Monitoring
Explorer (https://www.monitoringresources.org/Sites/Explorer/Index).
Data Exchange Template Tables
MMX DET Version History
Date | Version | Description
7/16/2014 | 1.0 | Site level data collection event metadata
For the most recent version of the DET, go to: http://www.pnamp.org/document/4708.
DATA COLLECTION EVENT
The Data Collection Event Class represents a collection of measurements or observations via one or more methods of a single
protocol at a single site, at a specific time or continuously over a time range. The observations may be made by human observers or
collected via remote sensing devices.
Field Name | Field Description | Data Type | Notes, Codes/Conventions
Provider ID | A unique identifier assigned to a particular data provider or aggregator to identify the source of the event record. | GUID, Required | Assigned by PNAMP or other aggregator.
Collector ID | The organization primarily responsible for conducting the data collection. | Text(256), Required |
Source System ID | Name or identifier of the database or system where data originally came from. This should be the database or system used originally by the collector rather than the aggregator. | Text(256), Optional | Recommend use of Repository ID, which is a PNAMP generated unique identifier for commonly used data repositories throughout the region.
Event ID | Provider-generated, unique identifier for the data collection event. | Text(256), Required | Must be unique across space and time within an individual Provider; primary key of provider dataset.
Geographic Location | Representation of the geospatial extent over which the observations are collected. | Object, Required | Can either record the actual location of a measurement, or alternatively, describe an extent that the observation is assumed to represent. See "Geographic Location" tab.
Event Date | For non-continuous measurements, the date and time (if known) that the observation was made. Non-continuous measurements are made at distinct, or separate, points in time. | Text(32), Optional | ISO8601 format, required if Event Timespan is not present. Prohibited if Event Timespan is populated.
Event TimeSpan | For continuous or periodic measurements over a period of time, this element is a pair of date and times that bound the time period over which observations were made at this location. Continuous or periodic measurements can be made at an infinite number of points in time between any two points of time. | Text(64), Optional | Each date is ISO8601 format, required if Event Date is not present. Prohibited if Event Date is populated.
Periodicity | If Timespan is specified, the mean frequency of observations in the Timespan. | Decimal (7,2), Optional | Required if Event Timespan is present
Periodicity UOM | If Periodicity is specified, this is the unit of measurement for Periodicity, based on the fundamental unit of time (the second) as defined by the International System of Units. | Enumeration, Optional | Required if Periodicity is present. Allowed values: Seconds; Minutes; Hours; Days; Variable
Data Classifications | General classification of measurements made by this survey. More than one value may be supplied. | Object, Repeating Group (1:N), Required | See "Data Classification" tab
Program Name | Name of the Monitoring Program or Project, as defined by the data provider. | Text(256), Required |
Protocol URL | A link to the protocol describing how the data were collected, preferably documented in Monitoring Methods. | Text(1024), Optional | Consider some sort of url pointer, to minimize document-not-found errors (urls change)
Protocol Name | Name of the Protocol | Text(1024), Optional |
Methods | A list of data collection methods within the protocol that were performed within the scope of this event. | Text(1024), Optional |
Design URL | A link to the study design describing how sites were chosen, preferably documented in Monitoring Resources. | Text(1024), Optional |
Probabilistic Design | Designation whether or not a site is part of a probabilistic study design | Enumeration, Optional | Possible Values: Yes; No
Panel Name | Name of the panel if this site was surveyed as part of an annual or rotating panel design. | Text(256), Optional |
Contacts | The name(s) of responsible party/parties associated with this collection event. | Object, Repeating Group (0:N), Optional | See "Contact" tab
Download URL | A URL that provides direct download of the dataset containing measurement data for this DCE, if downloads are supported. | Text(1024), Optional | Consider some sort of url pointer, to minimize document-not-found errors (urls change); consider using DOIs
Notes | Any provider-specific notes regarding this DCE that may be useful for others searching for data. | Text(1024), Optional |
Attributes | Provider-defined named attribute pair(s) | Object, Repeating Group (0:N), Optional | Used to define 0:N provider-specific attribute values for searching and analysis. Can be used to store stream network covariates, crew information, etc. See "Attribute" tab.
Last Modified Date | The date and time that this DCE record was last modified in the provider system. | Text(32), Required | Along with the Event ID, this field is used by client nodes to determine if they have a current copy of this DCE. ISO8601 format.
Restrictions | A narrative of any limitations on the availability of data from this collection event. | Text(1024), Optional |
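Two rules in the table above lend themselves to a short sketch: the conditional requirements among Event Date, Event TimeSpan, Periodicity, and Periodicity UOM, and the use of Event ID with Last Modified Date to decide whether a client copy is current. The function names and argument shapes below are hypothetical and are not part of this draft. The date comparison assumes timestamps in a form that Python's datetime.fromisoformat accepts, which on older Python versions is only a subset of ISO 8601.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Allowed values for Periodicity UOM, copied from the table above.
PERIODICITY_UOMS = {"Seconds", "Minutes", "Hours", "Days", "Variable"}


def timing_errors(event_date: Optional[str],
                  event_timespan: Optional[Tuple[str, str]],
                  periodicity: Optional[float],
                  periodicity_uom: Optional[str]) -> List[str]:
    """Return rule violations; an empty list means the timing fields are consistent."""
    errors = []
    # Event Date and Event TimeSpan are mutually exclusive, and one is required.
    if event_date and event_timespan:
        errors.append("Event Date is prohibited when Event TimeSpan is populated")
    if not event_date and not event_timespan:
        errors.append("Event Date or Event TimeSpan is required")
    # Periodicity is required whenever a TimeSpan is present.
    if event_timespan and periodicity is None:
        errors.append("Periodicity is required when Event TimeSpan is present")
    # Periodicity UOM is required, and constrained, whenever Periodicity is present.
    if periodicity is not None:
        if periodicity_uom is None:
            errors.append("Periodicity UOM is required when Periodicity is present")
        elif periodicity_uom not in PERIODICITY_UOMS:
            errors.append("Periodicity UOM is not an allowed value")
    return errors


def needs_refresh(local_last_modified: Optional[str],
                  remote_last_modified: str) -> bool:
    """True if a client has no copy, or an older copy, of the DCE with a given Event ID."""
    if local_last_modified is None:
        return True
    local = datetime.fromisoformat(local_last_modified)
    remote = datetime.fromisoformat(remote_last_modified)
    return remote > local
```

A client node could call needs_refresh once per Event ID it receives, passing the Last Modified Date it has stored locally (or None if it has never seen that Event ID).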
CONTACT
The Contact object is used to supply information about persons who are responsible for the collection, maintenance, or
interpretation of data associated with the data collection event. Since contact information is such a widely used element in
metadata, it is anticipated that standard encodings will be used to represent this information in the schema.
Field Name | Field Description | Data Type | Notes, Codes/Conventions
Contact Name | The name of a responsible party. | Text(256), Required |
Contact Organization | The name of the organization the party is/was employed by. | Text(256), Optional |
Contact Email | The email address of the responsible party for this collection event. | Text(256), Required |
Contact Role | The role of the responsible party with respect to this data collection event. | Enumeration, Required: resourceProvider; custodian; owner; user; distributor; originator; pointOfContact; principalInvestigator; processor; publisher; author | From ISO 19115 CI_RoleCode
Contact Phone | The phone number of the responsible party for this collection event. | Text(16), Required | Includes area code
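The Contact rules above (required fields, text lengths, and the role enumeration) can be checked with a small amount of code. The following is a minimal sketch; the Contact class and its errors method are hypothetical names chosen for illustration, and the sample person and phone number are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional

# Role codes copied from the Contact Role row of the table above.
CONTACT_ROLES = {
    "resourceProvider", "custodian", "owner", "user", "distributor",
    "originator", "pointOfContact", "principalInvestigator", "processor",
    "publisher", "author",
}


@dataclass
class Contact:
    """Hypothetical in-memory form of one Contact object."""
    name: str
    email: str
    role: str
    phone: str
    organization: Optional[str] = None

    def errors(self) -> List[str]:
        """Return rule violations; an empty list means the contact passes."""
        problems = []
        # Required text fields with their maximum lengths from the table.
        for label, value, limit in (
            ("Contact Name", self.name, 256),
            ("Contact Email", self.email, 256),
            ("Contact Phone", self.phone, 16),
        ):
            if not value:
                problems.append(f"{label} is required")
            elif len(value) > limit:
                problems.append(f"{label} exceeds Text({limit})")
        # Contact Organization is optional but still length-limited.
        if self.organization is not None and len(self.organization) > 256:
            problems.append("Contact Organization exceeds Text(256)")
        if self.role not in CONTACT_ROLES:
            problems.append(f"Contact Role {self.role!r} is not an allowed value")
        return problems


ok = Contact(name="Jane Doe", email="jane@example.org",
             role="pointOfContact", phone="503-555-0100")
bad = Contact(name="Jane Doe", email="jane@example.org",
              role="reviewer", phone="503-555-0100")
```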
DATA CLASSIFICATION
The Data Classification object is used to indicate what general type or classification of measurement data was collected during the
data collection event.
Field Name | Field Description | Data Type | Notes, Codes/Conventions
Data Type | General classification of measurements made by this survey. More than one value may be supplied. | Required, Enumerated Value | Amphibians; Birds; Climate/Weather; Contaminants/Toxics/Pollutants; Fish; Hydrology/Water Quantity; Macroinvertebrates; Mammals; Disease/Pathogens/Parasites; Plankton; Sediment/Substrate/Soils; Multi-Species; Reptiles; Vegetation/Plants; Water Quality; Disturbance/Restoration; Landscape Form & Geomorphology; Upland Geomorphology; Classification of Ecological or Geological Attribute; Other
Data Subtype | Subclassification of measurements made by this survey. | Optional, Enumerated Value | Nested enumerated values within each Data Type (above). Need domain table; see subcategory list here: https://www.monitoringmethods.org/Metric/Index
GEOGRAPHIC LOCATION
The Geographic Location class describes a geospatial extent at which or in which observations are made. Can either represent a
series of measurements taken throughout the extent or, alternatively, can represent the area that a single measurement is intended
to represent. Unless otherwise specified, the horizontal datum is assumed to be NAD83.
Field Name | Field Description | Data Type | Notes, Codes/Conventions
Geographic Location ID | Provider or collector assigned name or identifier for this location. | Text(256), Required | Can be a primary key from the provider system or a manufactured name (e.g. LLID, Reach Code + measure, etc.)
Location Name | Collector or Provider assigned, human recognizable name of the geographic location | Text(256), Optional | Reach code, stream name, (LLID) from SN, etc.
Location Name Source | Information about the source of this Location information. Could be a publication (gazetteer), institution, or team of individuals. | Text(256), Optional | If the contents of the location name came from a hydrography layer, this would be the version of the layer (e.g. NHD Plus V2)
Location Accuracy | The estimated horizontal accuracy of the points or vertices comprising the points, lines, or polygons. | Required, Enumeration: Less than 1m; 1m - 5m; 6m - 10m; 10m - 20m; 21m - 50m; 51m - 100m; 101m - 500m; Greater than 500m; Unknown; Undisclosed |
Discrete Points | Points must be specified as either geometries or as a X/Y and Spatial Reference System Identifier | |
Point | Describes a single point at which observations were made | gml:Point, Optional | GML simple features profile, which includes a spatial reference identifier, is recommended; X/Y is an option if GML not available
X Coordinate | X coordinate for the point | decimal (19,6), Optional | If the point is not expressed as a gml geometry, the X, Y, and SRID fields are required
Spatial Reference ID | Well known spatial reference ID | Text(16) | e.g. EPSG:4326
Lines and Linestrings | Reaches must be specified as geometries | |
Line | Represents the linear extent over which observations were made | gml:LineString, Optional | GML simple features profile, which includes a spatial reference identifier. Could represent a reach, transect, thalweg, etc.
Polygons | Areas must be specified as geometries | |
Polygon | Represents an area over which observations were made | gml:Polygon, Optional | GML simple features profile, which includes a spatial reference identifier.
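The rule for discrete points (a GML geometry, or else X, Y, and a spatial reference ID together) can be expressed as a short check. This is only a sketch: the function name and parameters are hypothetical, the GML value is treated as an opaque string without parsing, and the coordinates in the examples are arbitrary.

```python
from typing import List, Optional


def point_errors(gml_point: Optional[str] = None,
                 x: Optional[float] = None,
                 y: Optional[float] = None,
                 srid: Optional[str] = None) -> List[str]:
    """Check that a discrete point is given as GML or as X, Y, and SRID together."""
    # A GML geometry carries its own spatial reference, so nothing else is needed.
    if gml_point:
        return []
    # Otherwise all three fallback fields must be present (0.0 is a valid coordinate).
    fallback = (
        ("X Coordinate", x),
        ("Y Coordinate", y),
        ("Spatial Reference ID", srid),
    )
    return [
        f"{label} is required when no gml:Point is supplied"
        for label, value in fallback
        if value is None
    ]


complete = point_errors(x=-122.5, y=45.5, srid="EPSG:4326")
partial = point_errors(x=-122.5)
```

Here complete is an empty list, while partial reports the missing Y Coordinate and Spatial Reference ID.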
ATTRIBUTE
A DCE attribute is a structure that allows providers to associate simple name value pairs to the subject Data Collection Event.
Attributes may be used as "tags" for events for reporting, filtering and sorting purposes, or to simply store attributes that do not
have a defined field in the base Data Collection Event object. The forms and format of name value pairs are provider defined and
may change at the will of the provider.
Field Name | Field Description | Data Type | Notes, Codes/Conventions
Attribute Name | The provider defined name of the attribute. | Text(64), Required | Within a collection of attributes in a DataCollectionEventObject, there can be multiple attributes with the same name to allow for multivalued structures like lists. Order is not guaranteed.
Attribute Value | The value of the attribute | Text(256), Optional | String-encoded value that corresponds to this attribute instance. A missing value indicates "nullness".
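Because attribute names may repeat and order is not guaranteed, a consumer would likely want to group values by name before using them. The sketch below shows one way to do that; the function name and the sample attribute names (crew, stream_order, flag) are hypothetical, and a missing value is represented as None.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def group_attributes(
    pairs: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, List[Optional[str]]]:
    """Collapse (name, value) pairs into {name: [values]}, keeping None for missing values."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name].append(value)
    return dict(grouped)


# Hypothetical attributes for one DCE; "crew" is multivalued, "flag" has no value.
pairs = [("crew", "A"), ("stream_order", "3"), ("crew", "B"), ("flag", None)]
grouped = group_attributes(pairs)

# Order within a name is not guaranteed by the exchange, so compare as a set.
crew = set(grouped["crew"])
```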