RDA_Project_Update_20150808

Adoption of RDA-DFT Terminology and Data Model to the Description and
Structuring of Atmospheric Data
Aaron Addison, Rudolf Husar, Cynthia Hudson-Vitale
Background
• DataFed & the Air Quality Community Catalog
Problem Addressed
• Facilitate data interoperability
• Extend data discovery to non-domain researchers
RDA Data Foundation and Terminology (DFT)
Data Foundation & Terminology WG
RDA Data Foundation and Terminology (DFT) Adoption plan
● Map DFT model to DataFed/AQComCat data model
● Assess potential RDA/DFT compliance
● Real-world evaluation of outcome
RDA Data Foundation and Terminology (DFT) Adoption Activities & Timeline:
Activity | Timeline
Training | Ongoing
Draft DataFed data model, inventory terms, and evaluate existence of PIDs | March
Virtual Server | March
Mirror site of AQComCat | March
User testing | March
Compare DFT model to DataFed data model | March/April
Create/assign PIDs to AQComCat | April
Reboot AQComCat | May
Add new datasource to AQComCat; test understandability of terms with data suppliers | June
Conduct post-DFT implementation usability testing of AQComCat | July
Publish paper/report on findings | August
RDA Data Foundation and Terminology (DFT) Adoption Outcomes
● Report on usability of RDA DFT model
● Assess fit of the RDA DFT model to DataFed data model
● Evaluate improved discoverability/reuse
● Engage with Data Foundation and Terminology Working Group
Thank you!
[Diagram labels: Current Catalog; RDA - Project: Adopt, Refine RDU products; DTR; RDA-Compliant Catalog]
Interaction with RDA Groups
DTR Data Type Registries WG (Register Types)
PID Information Types WG (Get PID ??)
DFT Data Foundation and Terminology WG (Data Model…?)
DF Data Fabric IG (DataFed Use Case?)
Data Type Registry for Sharing and Reuse
We will use the RDA Type registry (L. Lannom)
Registry needs to be federated… e.g. with GCMD registry
What is Data Typing?
Data ‘typing’ is the characterization of data structure, contexts, assumptions and other information needed to describe and understand the data.
The ‘types’ need to be:
•Defined and understood by data producers and consumers
•Types should have multiple levels of granularity, from single observations to data sets (how??)
•Each type is to have a PID
•Permanently associated with the data they describe
•Standardized (OGC, ISO), unique (PID), and discoverable (TypeRegistry?)
Data typing should aid the discovery, understanding, sharing and reuse of data across domains
•Automated processing of large data collections is a necessity
•This requires machine-readable types, i.e. a clear data model for typing (clarify ???)
•‘Composability’: lower level/base types can form more complex composite types (how???)
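The ‘composability’ idea above can be sketched in a few lines. This is only an illustration, not the RDA DTR data model: the class names `BaseType` and `CompositeType`, their fields, and the `example:` identifiers are hypothetical placeholders standing in for real registered PIDs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BaseType:
    """A lowest-level type, e.g. a single observed quantity."""
    pid: str          # placeholder identifier, not a real registered PID
    name: str
    description: str


@dataclass(frozen=True)
class CompositeType:
    """A type assembled from base types and/or other composite types."""
    pid: str
    name: str
    members: tuple = ()

    def flatten(self) -> List[BaseType]:
        """Return every base type reachable from this composite, depth first."""
        found: List[BaseType] = []
        for member in self.members:
            if isinstance(member, CompositeType):
                found.extend(member.flatten())
            else:
                found.append(member)
        return found


# Hypothetical example types; the PIDs are made up for illustration only.
date = BaseType("example:date", "Date", "Calendar date of the observation")
bbox = BaseType("example:bbox", "Spatial Bounding Box", "Geographic extent")
aod = BaseType("example:aod", "Aerosol Optical Depth", "Column aerosol extinction")

space_time = CompositeType("example:space-time", "Space/Time Domain", (date, bbox))
dataset = CompositeType("example:aod-dataset", "AOD Data Set", (space_time, aod))

print([t.name for t in dataset.flatten()])
```

Under this assumption a data-set-level type is simply a tree whose leaves are single-observation types, which is one possible answer to the "how???" above.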
Global Change Master Directory (GCMD)
Extensive collection of keywords and UUIDs; possible use for ‘Types’
Do we combine it with AQ ComCat Types? Any other registries to federate?
The GCMD/IDN has released Version 8.1 of the GCMD/IDN Science Keywords. A RESTful service (API) is also available. Keyword List:
Science and Services Keywords: Category, Topic, Term, Variable Level, Detailed Variable, UUID
Other ‘Types’ (some are useful; to be defined formally and given IDs in the RDU Type registry): Data Centers, Projects, Instruments, Platforms, Locations, Horizontal Resolution, Vertical Resolution, Temporal Resolution, URL Content Types
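A minimal sketch of how the Science Keywords list could be read into candidate ‘types’, assuming the comma-separated, quoted layout of the aerosol rows shown later in these slides. It treats the last column as the UUID and the non-empty leading columns as the hierarchy path; the function name `parse_keywords` is a hypothetical helper, not part of any GCMD tooling.

```python
import csv
import io
import uuid

# Three rows copied from the aerosol keyword list in these slides.
SAMPLE = '''"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","","","","","2e5a401b-1507-4f57-82b8-36557c13b154"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","AEROSOL EXTINCTION","","","","40633fe2-5b32-4bdc-a17b-b1cfebc01ae7"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","PARTICULATE MATTER","","","","548a3f85-bf22-473b-b641-45c32d9c6a0c"
'''


def parse_keywords(text):
    """Yield (path, uuid) pairs; path is the tuple of non-empty levels."""
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        *levels, raw_id = row
        path = tuple(level for level in levels if level)
        # uuid.UUID raises ValueError if the value is not 32 hex digits.
        yield path, uuid.UUID(raw_id)


keywords = list(parse_keywords(SAMPLE))
for path, key_id in keywords:
    print(" > ".join(path), key_id)
```

Note that `uuid.UUID` ignores hyphen placement, so it only confirms that 32 hex digits are present; it will not flag a UUID whose hyphens were lost in copying.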
Project Outputs and Outcomes, Next Steps
Outputs:
•Develop a data model suitable for describing atmospheric data
•Identify basic and composite types for atmospheric data
•Register these types in DTR
•Attach ‘types’ to data in DataFed
•Type-based search interface to DataFed data.
Outcomes
•Real-world testing of Typing concepts and Registry
•Understanding of domain-specific issues and approaches, lessons learned
•Interaction with multiple RDU Groups … contribution to Data Fabric
•Recommendations for next phase
Next steps outlined ???
ToDo’s
Combine AQComCat, GCMD, and other ‘keywords’/facets
Formally define ‘RDU Types – Names, descriptions’, Get PIDs
Check, reconcile types with concepts of DTR, PID, DFT WGs – is it OK?
Register Types in Type Registry
Incorporate Type-based metadata into AQComCat
Test catalog usability before, after
GCMD Science & Services
GCMD Temporal Resolutions
Climate & Forecast Conventions
GCMD Platform
GCMD Instruments
ToDo’s
• See what facets are available on RDA DTR
  • Space/time domains
• Map a few GCMD and CF Standard Names to RDA DTR – see how it would work
• Understand where/how the DTR fits into the XML schema
  • Is it schema based? How are people linking the DTR and the data type in the metadata?
  • Validator for DTR?
  • How is the DTR machine readable?
RDA: Data Type Registry
Potentially useful types for AQComCat:
• Date
• Sensor Data
• Spatial Bounding Box
• GPS Coordinate
GCMD Terms
Aerosol Terms:
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","","","","","2e5a401b-1507-4f57-82b8-36557c13b154"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","AEROSOL BACKSCATTER","","","","f795b88f-1aba-4548-97f6-7b587e8ba451"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","AEROSOL EXTINCTION","","","","40633fe2-5b32-4bdc-a17b-b1cfebc01ae7"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","AEROSOL FORWARD SCATTER","","","","449e2e03-8efd-42b6-8152-3602e4bab21d"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","AEROSOL OPTICAL DEPTH/THICKNESS","","","","61c3b720-abc8-4430-866c-f1da35d2cd0b"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","AEROSOL OPTICAL DEPTH/THICKNESS","ANGSTROM EXPONENT","","","6e7306a1-79a5-482e-b646-74b75a1eaa48"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","AEROSOL PARTICLE PROPERTIES","","","","02ea239e-4bca-4fda-ab87-be12c723c30a"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","AEROSOL RADIANCE","","","","7db9eab3-4c7a-4471-a826-a306f178ad3e"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","CARBONACEOUS AEROSOLS","","","","527f637c-aea5-4519-9293-d57e10a76bff"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","CLOUD CONDENSATION NUCLEI","","","","27478148-b4b6-4c89-8829-08d2ee7bfe10"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","DUST/ASH/SMOKE","","","","1b6342c6-315b-4f4f-b4e3-d6902aaa3e85"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","NITRATE PARTICLES","","","","768cfa32-003d-47bd-ab3a-3e27e4ec2699"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","ORGANIC PARTICLES","","","","8929113a-ded5-4c39-b20f-7968ed114317"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","PARTICULATE MATTER","","","","548a3f85-bf22-473b-b641-45c32d9c6a0c"
"EARTH SCIENCE","ATMOSPHERE","AEROSOLS","SULFATE PARTICLES","","","","ca71b02b-4446-414c-8697-0950d7382cc4"
RDA: DTR Fields
http://typeregistry.org
Name: Aerosol
Description: a branch of atmospheric earth science
Applicable Standard or recommendation: GCMD keyword & UUID
Provenance: who created the record
Expected use: aerosol datatypes
Representations and Semantics:
Expression: [Format, Character Set, Encoding, Measurement Unit]
Value: particles
Properties: Earth Science>Atmosphere
Experimental/Relationship
Questions:
Should this be built hierarchically, such that a record is created for “earth science”, then “atmosphere”, then “aerosol”?
How will we create definitions for the GCMD terms?
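One way to explore the first question is to generate a record per hierarchy level and see how many records result. The sketch below is an assumption-laden illustration: the function `hierarchical_records` and the field names `name`, `parent`, and `description` are hypothetical and do not reflect the actual DTR record schema.

```python
def hierarchical_records(paths):
    """Build one record per distinct level, each pointing at its parent."""
    records = {}
    for path in paths:
        for depth in range(1, len(path) + 1):
            key = path[:depth]
            if key not in records:
                records[key] = {
                    "name": key[-1].title(),
                    "parent": " > ".join(key[:-1]) or None,
                    "description": None,  # definition still to be written
                }
    return records


# Keyword paths taken from the GCMD aerosol terms in these slides.
paths = [
    ("EARTH SCIENCE", "ATMOSPHERE", "AEROSOLS"),
    ("EARTH SCIENCE", "ATMOSPHERE", "AEROSOLS", "AEROSOL EXTINCTION"),
]
records = hierarchical_records(paths)
for key, rec in records.items():
    print(rec["name"], "<-", rec["parent"])
```

With this layout each shared ancestor is created once, so adding further aerosol terms adds only one record each; the empty `description` field marks where the second question (definitions for GCMD terms) remains open.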
CF Definition for AQComCat:
Angstrom
angstrom_exponent_of_ambient_aerosol_in_air
alias: aerosol_angstrom_exponent
The "Angstrom exponent" appears in the formula relating aerosol optical thickness to the wavelength
of incident radiation: T(lambda) = T(lambda0) * [lambda/lambda0] ** (-1 * alpha) where alpha is
the Angstrom exponent, lambda is the wavelength of incident radiation, lambda0 is a reference
wavelength, T(lambda) and T(lambda0) are the values of aerosol optical thickness at
wavelengths lambda and lambda0, respectively. "Aerosol" means the system of suspended
liquid or solid particles in air (except cloud droplets) and their carrier gas, the air itself.
"Ambient_aerosol" means that the aerosol is measured or modelled at the ambient state of
pressure, temperature and relative humidity that exists in its immediate environment. "Ambient
aerosol particles" are aerosol particles that have taken up ambient water through hygroscopic
growth. The extent of hygroscopic growth depends on the relative humidity and the composition
of the particles. To specify the relative humidity and temperature at which the quantity described
by the standard name applies, provide scalar coordinate variables with standard names of
"relative_humidity" and "air_temperature".
RDA: DTR Fields
http://typeregistry.org
Name: Angstrom
Description: alias: aerosol_angstrom_exponent
The "Angstrom exponent" appears in the formula relating aerosol optical thickness to the wavelength of incident
radiation: T(lambda) = T(lambda0) * [lambda/lambda0] ** (-1 * alpha) where alpha is the Angstrom exponent,
lambda is the wavelength of incident radiation, lambda0 is a reference wavelength, T(lambda) and T(lambda0) are
the values of aerosol optical thickness at wavelengths lambda and lambda0, respectively. "Aerosol" means the
system of suspended liquid or solid particles in air (except cloud droplets) and their carrier gas, the air itself.
"Ambient_aerosol" means that the aerosol is measured or modelled at the ambient state of pressure, temperature
and relative humidity that exists in its immediate environment. "Ambient aerosol particles" are aerosol particles that
have taken up ambient water through hygroscopic growth. The extent of hygroscopic growth depends on the
relative humidity and the composition of the particles. To specify the relative humidity and temperature at which the
quantity described by the standard name applies, provide scalar coordinate variables with standard names of
"relative_humidity" and "air_temperature".
Applicable Standard or recommendation: CF Convention
Provenance: who created the record
Expected use: aerosol optical thickness
Representations and Semantics:
Expression: [Format, Character Set, Encoding, Measurement Unit]
Value: particles
Experimental/Relationship
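The Angstrom relation quoted in the description above, T(lambda) = T(lambda0) * [lambda/lambda0] ** (-1 * alpha), can be checked numerically. The following is a small sketch with illustrative numbers only (not measurements); the function names are hypothetical and wavelengths are assumed to share the same unit.

```python
import math


def aot_at_wavelength(aot_ref, wavelength, wavelength_ref, alpha):
    """T(lambda) = T(lambda0) * (lambda / lambda0) ** (-alpha)."""
    return aot_ref * (wavelength / wavelength_ref) ** (-alpha)


def angstrom_exponent(aot, aot_ref, wavelength, wavelength_ref):
    """Invert the relation above for alpha, given AOT at two wavelengths."""
    return -math.log(aot / aot_ref) / math.log(wavelength / wavelength_ref)


# Illustrative values: AOT of 0.30 at a 440 nm reference, alpha of 1.3.
aot_550 = aot_at_wavelength(0.30, 550.0, 440.0, alpha=1.3)
alpha = angstrom_exponent(aot_550, 0.30, 550.0, 440.0)
print(aot_550, alpha)
```

Because alpha is positive in this example, the computed optical thickness at the longer wavelength is smaller than at the reference wavelength, and inverting the relation recovers the alpha that was put in.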
CF & AQComCat
Aerosol Optical Depth = no term found
Aerosol Optical Thickness = no CF term found; but:
http://disc.sci.gsfc.nasa.gov/data-holdings/PIP/aerosol_optical_thickness_or_depth.shtml
Ozone = many CF terms found
Particulate matter = no term found