EarthChem - Data Systems for Sample

DATA SYSTEMS FOR SAMPLE-BASED OBSERVATIONS
Kerstin Lehnert
Data from Samples
• Distributed data acquisition
  - Different labs/researchers analyze the same sample or subsamples of it.
• Distributed data publication
  - Different data for the same sample are published in different papers.
• Distributed data archiving
  - Data for the same sample are kept in different data systems.
• Integrated data access is required to maximize utility.
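Integrated access only works if every system refers to a given sample in the same way. A minimal sketch, assuming each record carries a shared, unambiguous sample identifier; the system names, record fields, and values are hypothetical placeholders, not real analyses:

```python
from collections import defaultdict


def integrate_by_sample(*systems):
    """Merge records held in several data systems into one view per sample.

    Each system is a (name, records) pair; each record is a dict whose
    'sample' key holds a sample identifier shared by all systems.
    """
    view = defaultdict(list)
    for system_name, records in systems:
        for record in records:
            # Keep provenance so each merged value still shows its source system.
            view[record["sample"]].append({**record, "system": system_name})
    return dict(view)


# Hypothetical systems and placeholder values, not real analyses.
system_a = ("System A", [{"sample": "ABC0L653X", "parameter": "SiO2", "value": 1.0}])
system_b = ("System B", [
    {"sample": "ABC0L653X", "parameter": "Sr", "value": 2.0},
    {"sample": "ABC078HGB", "parameter": "Sr", "value": 3.0},
])
merged = integrate_by_sample(system_a, system_b)
print(sorted(r["parameter"] for r in merged["ABC0L653X"]))  # ['SiO2', 'Sr']
```

If two systems spell the same sample's name differently, this merge silently splits one sample into two entries, which is the naming problem a unique sample identifier is meant to remove.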
Geochemical Data
• Diverse
  - hundreds of parameters
  - thousands of materials
  - values vary with space and time over a range of more than ten orders of magnitude
• Complex
  - mostly sample-based, with complex relations among samples & subsamples
  - distributed data acquisition (one sample analyzed in different labs by different researchers at different times)
  - idiosyncratic data acquisition methods
Geoinformatics for Geochemistry
• DATABASES
  - thematic geochemical databases (PetDB, SedDB, VentDB)
• DATA REPOSITORY
  - Geochemical Resource Library
• REGISTRIES
  - System for Earth Sample Registration (SESAR)
  - IEDA Data Publication Agent of the STD-DOI system (DataCite®)
  - GeoPass: single sign-on authentication system
• DATA ACCESS & ANALYSIS TOOLS
  - GfG user interfaces
  - EarthChem Data Engine (Portal)
GfG Architecture
[Figure: GfG architecture diagram. Components shown: EarthChem Portal, EarthChem XML DB, Geochemical Resource Library (metadata catalog; datasets: original data & derived products), Topical Data Collections, External Databases, GCDM DB, GEOROC, NAVDAT, USGS, User Submission, GfG Data Entry]
GeoChemical Data Model
[Figure: entity diagram of the data model. Entities shown: publication, data source, feature of interest, sample, analysis, collection, geospatial, material, preparation, observation point, observed value, method/DQ]
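A hypothetical, heavily simplified rendering of some of the diagram's entities as Python dataclasses. The nesting (publication holds samples, sample holds analyses, analysis holds observed values and a method/DQ record) is an assumption for illustration; feature of interest, data source, preparation, and observation point are omitted, and all content is placeholder:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Method:
    """Method / data-quality information attached to an analysis."""
    technique: str
    laboratory: str
    precision: Optional[str] = None


@dataclass
class ObservedValue:
    parameter: str
    value: float
    unit: str


@dataclass
class Analysis:
    method: Method
    values: List[ObservedValue] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    material: str
    latitude: float
    longitude: float
    analyses: List[Analysis] = field(default_factory=list)


@dataclass
class Publication:
    citation: str
    samples: List[Sample] = field(default_factory=list)


def values_for(publication, parameter):
    """Collect (sample name, value, unit) for one parameter across a publication."""
    return [
        (sample.name, v.value, v.unit)
        for sample in publication.samples
        for analysis in sample.analyses
        for v in analysis.values
        if v.parameter == parameter
    ]


# Placeholder content for illustration only.
pub = Publication("Hypothetical et al.", [
    Sample("S1", "basalt glass", 0.0, 0.0, [
        Analysis(Method("XRF", "Lab X"), [ObservedValue("SiO2", 1.0, "wt%")]),
    ]),
])
print(values_for(pub, "SiO2"))  # [('S1', 1.0, 'wt%')]
```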
Metadata
• Geospatial
  - Geographical coordinates
  - Geographical names
• Collection
  - Sampling technique
  - Field program
• Description & Age
  - Classification
  - Texture
  - Alteration
  - Age
• Data Quality
  - Technique
  - Instrument
  - Laboratory
  - Precision
  - Reference material measurements
  - Correction procedures
Standards for Data Access & Integration
• WMS, WFS (OGC Web Map Service, Web Feature Service)
  - For visualization tools
• OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
  - For joint data inventories
• EarthChemML
  - For integration across geochemical data systems
  - For interoperability with other systems
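OAI-PMH harvesting is a plain HTTP GET with a `verb` parameter plus verb-specific arguments. A minimal sketch that only builds request URLs (no network access); the base URL is a hypothetical placeholder, and the verb list is the six verbs defined by OAI-PMH 2.0:

```python
from urllib.parse import urlencode

# The six request verbs defined by the OAI-PMH 2.0 specification.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH GET request URL; no request is sent."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    for key, value in params.items():
        # 'from' is a Python keyword, so callers pass from_ instead.
        query["from" if key == "from_" else key] = value
    return f"{base_url}?{urlencode(query)}"


# Hypothetical inventory endpoint.
url = oai_request_url("https://example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc", from_="2010-01-01")
print(url)
```

A joint inventory would repeat such `ListRecords` requests against each partner's endpoint and follow any `resumptionToken` the responses return.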
IEDA System-wide Inventory
[Figure: IEDA system-wide inventory. Elements shown: RSS feed; Inventory; Expedition Metadata; Reference Metadata; Dataset Metadata; Geospatial Metadata; DOI Registration; MGDS; EarthChem; Chemical Data; Cruise Info; GRL; Geochem DBs; Object Registration; Object Metadata; SESAR]
[Figure: EarthChem Portal. Partner databases (GEOROC, NAVDAT, PetDB, USGS, others) send XML to the EarthChem Data Engine database, which supports search & visualization]
Partner databases encode their data & metadata in XML and send them to the EarthChem Portal database in Kansas. Queries submitted at the EarthChem Portal search the contents of the EarthChem Portal Database.
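The flow just described (partners serialize to XML, the portal ingests the XML into its own database, queries hit only that copy) can be sketched as follows. The element and attribute names are hypothetical and are not the EarthChemML schema; the values are placeholders:

```python
import xml.etree.ElementTree as ET


def encode_record(source, sample, values):
    """Partner side: serialize one sample and its values as XML.

    Element and attribute names are hypothetical, not the EarthChemML schema.
    """
    root = ET.Element("sampleRecord", source=source)
    ET.SubElement(root, "sampleName").text = sample
    for parameter, value, unit in values:
        ET.SubElement(root, "value", parameter=parameter, unit=unit).text = str(value)
    return ET.tostring(root, encoding="unicode")


def ingest(portal_db, xml_text):
    """Portal side: parse a partner's XML and append flat rows to the portal's store."""
    root = ET.fromstring(xml_text)
    for value_el in root.iter("value"):
        portal_db.append({
            "source": root.get("source"),
            "sample": root.findtext("sampleName"),
            "parameter": value_el.get("parameter"),
            "value": float(value_el.text),
            "unit": value_el.get("unit"),
        })


def query(portal_db, parameter):
    """Searches run against the portal's own copy, not against the partner systems."""
    return [row for row in portal_db if row["parameter"] == parameter]


portal_db = []  # stands in for the portal database
ingest(portal_db, encode_record("PetDB", "D3-1", [("SiO2", 1.0, "wt%"), ("Sr", 3.0, "ppm")]))
ingest(portal_db, encode_record("GEOROC", "AMPH D-3", [("SiO2", 2.0, "wt%")]))
print([(row["source"], row["value"]) for row in query(portal_db, "SiO2")])
```

Because the portal queries its own copy, search results are only as current as the last XML delivery from each partner.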
Access Levels
EarthChemML
EarthChem Repository: user submission
• need tools that are easy to use and support the data flow from lab to publication
  - ideally, represent ‘pipelines’ for data capture early in the data acquisition process
• tools need to include data validation and DQC procedures
• offer citable data publication
• need data policies
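Submission-time validation could look like the following hypothetical row check. The required fields and the unit vocabulary are assumptions for illustration, not EarthChem's actual rules:

```python
REQUIRED_FIELDS = ("sample", "parameter", "value", "unit", "method")
ALLOWED_UNITS = {"wt%", "ppm", "ppb"}  # hypothetical controlled vocabulary


def validate_row(row):
    """Return a list of problems found in one submitted data row (empty if none)."""
    errors = []
    for name in REQUIRED_FIELDS:
        value = row.get(name)
        if value is None or not str(value).strip():
            errors.append(f"missing {name}")
    raw_value = row.get("value")
    if raw_value is not None and str(raw_value).strip():
        try:
            if float(raw_value) < 0:
                errors.append("negative value")
        except (TypeError, ValueError):
            errors.append("value is not numeric")
    unit = row.get("unit")
    if unit and unit not in ALLOWED_UNITS:
        errors.append(f"unknown unit {unit!r}")
    return errors


bad = {"sample": "S1", "parameter": "SiO2", "value": "abc", "unit": "mg", "method": ""}
print(validate_row(bad))  # ['missing method', 'value is not numeric', "unknown unit 'mg'"]
```

Returning a list of problems instead of raising on the first one lets a submission tool report every issue in a row at once.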
IEDA data publication service
STD-DOIs
• The STD-DOI metadata are mainly Dublin Core elements, plus data-specific elements.
• The metadata are transmitted to the National Library via a web service (HTTP/SOAP) and incorporated into the library catalogue.
• The metadata may contain references to other objects (DOI, IGSN, ...):
  - Element <relatedIdentifier>
  - isCited, isParent, isChild, isDuplicate, …
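A simplified sketch of a dataset metadata record carrying `relatedIdentifier` elements, built with the standard library. It assumes DataCite-style element names and omits the XML namespace and schema validation. The relation names used here (`IsCitedBy`, `IsDerivedFrom`) come from the current DataCite vocabulary and differ from the names listed above. The DOIs are hypothetical; the IGSN is the example from this deck:

```python
import xml.etree.ElementTree as ET


def dataset_record(doi, title, creator, related):
    """Build a simplified DataCite-style metadata record.

    related: iterable of (identifier, identifier_type, relation_type) tuples.
    No XML namespace or schema validation is applied in this sketch.
    """
    resource = ET.Element("resource")
    ET.SubElement(resource, "identifier", identifierType="DOI").text = doi
    creator_el = ET.SubElement(ET.SubElement(resource, "creators"), "creator")
    ET.SubElement(creator_el, "creatorName").text = creator
    ET.SubElement(ET.SubElement(resource, "titles"), "title").text = title
    related_el = ET.SubElement(resource, "relatedIdentifiers")
    for identifier, id_type, relation in related:
        ET.SubElement(related_el, "relatedIdentifier",
                      relatedIdentifierType=id_type,
                      relationType=relation).text = identifier
    return ET.tostring(resource, encoding="unicode")


xml_text = dataset_record(
    "10.0000/EXAMPLE-DATASET",  # hypothetical DOI
    "Hypothetical geochemical dataset",
    "Example, A.",
    [("10.0000/EXAMPLE-PAPER", "DOI", "IsCitedBy"),  # hypothetical article DOI
     ("SIO8JH3M4", "IGSN", "IsDerivedFrom")],
)
print(xml_text)
```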
STD-DOIs
• The element <relatedIdentifier> can be used to point to other electronic objects:
  - the literature in which the dataset is interpreted
  - the samples from which the data were derived
  - other datasets that belong to the same collection of datasets
• Machines (e.g., data portals) can use these links to make search suggestions, aiding the discovery of data, literature, and samples, or to provide other value-added services.
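On the consuming side, a portal could turn these links into search suggestions. A hypothetical sketch: it reads `relatedIdentifier` elements from a simplified record and groups them under reader-facing labels. The relation names follow the current DataCite vocabulary, and the DOIs are placeholders:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified record without namespaces; the DOIs are hypothetical placeholders.
RECORD = """<resource>
  <identifier identifierType="DOI">10.0000/EXAMPLE-DATASET</identifier>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsCitedBy">10.0000/EXAMPLE-PAPER</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="IGSN" relationType="IsDerivedFrom">SIO8JH3M4</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">10.0000/EXAMPLE-COLLECTION</relatedIdentifier>
  </relatedIdentifiers>
</resource>"""

# Hypothetical labels a portal might show next to a search result.
SUGGESTION_LABELS = {
    "IsCitedBy": "Literature interpreting this dataset",
    "IsDerivedFrom": "Samples this dataset was derived from",
    "IsPartOf": "Collection this dataset belongs to",
}


def suggestions(xml_text):
    """Group a record's related identifiers into labeled suggestion lists."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for rel in root.iter("relatedIdentifier"):
        label = SUGGESTION_LABELS.get(rel.get("relationType"), "Other related objects")
        grouped[label].append(f'{rel.get("relatedIdentifierType")}:{rel.text.strip()}')
    return dict(grouped)


for label, items in suggestions(RECORD).items():
    print(label, items)
```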
STD-DOI System Architecture
Data DOIs
Information Discovery
[Figure: the data DOI provides citation of the data and a link to the publication; the IGSN points to the sample]
The International GeoSample Number
Ambiguous Sample Naming
Examples from the PetDB Database

Sample names are duplicated:
Name   Location           Publication      Cruise
D3-1   SEIR               ANDERSON, 1980   VM3301 (Vema)
D3-1   North Fiji Basin   EISSEN 1994      Starmer 1 (Nadir)
D3-1   Shimada Smt        GRAHAM 1988      S1-79 (Sea Sounder)
D3-1   Gorda Ridge        CLAGUE 1984      KK2-83NP (Kana Keoki)
3-1    Lamont Smts        BATIZA 1982      RISE III (New Horizon)

Sample names are modified or changed:
Dredge sample 3, Amphitrite Cruise 1963/4
Name          Publication
D3            Engel 1964
D-3           Scheidegger 1981, Schilling 1971
PD3           Tatsumoto 1965, 1966
PD-3          Hedge 1970, Muehlenbach 1972
PV D-3        Engel 1965
AMPH3D        Pineau 1976
AMPH-D3       MacDougall 1986
AMPH D-3      Sun 1980, Schilling 1975
AMPH 3-PD-3   Hart 1971
S-10          Subbarao 1972
• Provides & manages unique identifiers for samples
  - IGSN: International GeoSample Number
  - Assigned upon registration of sample metadata
• Catalogs & archives sample metadata
  - Access to sample metadata via web site & web services
  - Long-term preservation of metadata
  - Link to sample archives
• Facilitates links to data
• IGSN will be incorporated into persistent resolvable GUIDs
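Assignment upon registration can be sketched as a toy in-memory registry. It assumes the syntax described on the next slide (a 3-character registrant code followed by 6 random letters or digits) and a uniform random draw with a retry on collision. It is hypothetical and is not SESAR's implementation:

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_uppercase  # 36 symbols


class SampleRegistry:
    """Toy in-memory registry (hypothetical; not SESAR's actual implementation)."""

    def __init__(self):
        self._metadata = {}

    def register(self, user_code, metadata):
        """Store sample metadata and return a newly assigned 9-character IGSN."""
        user_code = user_code.upper()
        if len(user_code) != 3 or any(c not in ALPHABET for c in user_code):
            raise ValueError("user code must be 3 alphanumeric characters")
        while True:
            igsn = user_code + "".join(secrets.choice(ALPHABET) for _ in range(6))
            if igsn not in self._metadata:  # retry on the rare collision
                break
        self._metadata[igsn] = dict(metadata)
        return igsn

    def lookup(self, igsn):
        return self._metadata[igsn]


registry = SampleRegistry()
igsn = registry.register("SIO", {"name": "D3-1", "sample_type": "Individual sample"})
print(igsn, registry.lookup(igsn))
```

The registrant's own name ("D3-1") stays in the metadata. The IGSN is added alongside it rather than replacing it.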
International GeoSample Number
A Globally Unique Identifier for Earth Samples
IGSN:SIO8JH3M4
• Name space
  - Strict syntax (9 characters, alphanumeric)
    - First three characters are a unique user code (registered with SESAR)
    - Last 6 characters are random numbers + letters
    - Allows 2,176,782,336 (36^6) sample identifiers per registrant
• Does not replace personal or institutional names
• Applied to samples & sub-samples
  - system tracks relations
www.geosamples.org
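The syntax rules above translate directly into a validator. A minimal sketch, assuming an optional `IGSN:` prefix and case-insensitive input (both assumptions). It also checks the per-registrant capacity: 6 positions, each drawn from 10 digits plus 26 letters, give 36^6 combinations:

```python
import re

# 3-character registrant code + 6 random alphanumeric characters.
IGSN_PATTERN = re.compile(r"^(?:IGSN:)?([A-Z0-9]{3})([A-Z0-9]{6})$")


def parse_igsn(text):
    """Split an IGSN into (registrant code, random part) or raise ValueError."""
    m = IGSN_PATTERN.match(text.strip().upper())
    if m is None:
        raise ValueError(f"not a 9-character IGSN: {text!r}")
    return m.group(1), m.group(2)


print(parse_igsn("IGSN:SIO8JH3M4"))  # ('SIO', '8JH3M4')
print(36 ** 6)                      # 2176782336
```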
[Figure: parent/child relations among IGSNs. Examples shown: a core (IGSN:XXX000120) as parent of Core Section 1 (IGSN:XXX0065B3), Core Section 2 (IGSN:XXX07ST4K) and Core Section 3 (IGSN:XXX9K23G6); samples with sub-samples such as a rock powder (IGSN:ABC0L653X), a mineral concentrate (IGSN:ABC078HGB), a leachate (IGSN:ABC0L53NW), a fossil separate and a microprobe mount; further IGSNs shown: IGSN:ABC0L98SW, IGSN:XYZ0G693M]
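"System tracks relations" can be sketched as a parent/child index over IGSNs. This is a hypothetical design, not SESAR's: a parent must be registered before its children, which rules out cycles. The core and section IGSNs come from the figure; `XXX000ABC` is an invented sub-sample IGSN used only for illustration:

```python
class SampleHierarchy:
    """Hypothetical sketch of how a registry could track parent/child IGSN links."""

    def __init__(self):
        self.description = {}
        self.parent = {}
        self.children = {}

    def add(self, igsn, description, parent=None):
        # A parent must be registered first, so links can never form a cycle.
        if igsn in self.description:
            raise ValueError(f"{igsn} is already registered")
        if parent is not None and parent not in self.description:
            raise KeyError(f"parent {parent} is not registered")
        self.description[igsn] = description
        if parent is not None:
            self.parent[igsn] = parent
            self.children.setdefault(parent, []).append(igsn)

    def lineage(self, igsn):
        """Chain from a (sub-)sample up to its root sampling event."""
        chain = [igsn]
        while chain[-1] in self.parent:
            chain.append(self.parent[chain[-1]])
        return chain

    def descendants(self, igsn):
        """All sub-samples below igsn, depth first."""
        result, stack = [], list(reversed(self.children.get(igsn, [])))
        while stack:
            node = stack.pop()
            result.append(node)
            stack.extend(reversed(self.children.get(node, [])))
        return result


h = SampleHierarchy()
h.add("XXX000120", "Core")
for igsn, name in [("XXX0065B3", "Core Section 1"),
                   ("XXX07ST4K", "Core Section 2"),
                   ("XXX9K23G6", "Core Section 3")]:
    h.add(igsn, name, parent="XXX000120")
h.add("XXX000ABC", "Rock powder (hypothetical IGSN)", parent="XXX0065B3")
print(h.lineage("XXX000ABC"))      # ['XXX000ABC', 'XXX0065B3', 'XXX000120']
print(h.descendants("XXX000120"))  # ['XXX0065B3', 'XXX000ABC', 'XXX07ST4K', 'XXX9K23G6']
```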
Sample Types
• “Sampling events” such as holes, cores, dredges, stratigraphic sections
• “Individual samples”: specimens such as rocks, minerals, fossils, fluid samples, precipitates, synthetic material, etc.
• “Sub-samples” of any of the above: processed samples such as mineral or fossil separates, leachates, thin sections, etc.
Sample Registration
[Figure: registration routes: SESAR Web Site; spreadsheet forms for batch loading; interoperability (web services)]
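Batch loading from a spreadsheet can be sketched as parsing a CSV export and separating accepted rows from rejected ones. The column names and checks are hypothetical; the actual SESAR spreadsheet template may differ:

```python
import csv
import io

# Hypothetical template columns; the actual SESAR spreadsheet layout may differ.
REQUIRED_COLUMNS = ("sample_name", "sample_type", "latitude", "longitude")


def load_batch(csv_text):
    """Parse a batch-registration sheet exported as CSV.

    Returns (accepted_rows, rejected) where rejected holds (line_number, reason).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sheet is missing columns: {missing}")
    accepted, rejected = [], []
    for row in reader:
        line = reader.line_num  # physical line where this row ended
        if not (row["sample_name"] or "").strip():
            rejected.append((line, "missing sample name"))
            continue
        try:
            lat, lon = float(row["latitude"]), float(row["longitude"])
        except (TypeError, ValueError):
            rejected.append((line, "non-numeric coordinates"))
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            rejected.append((line, "coordinates out of range"))
            continue
        accepted.append(row)
    return accepted, rejected


sheet = (
    "sample_name,sample_type,latitude,longitude\n"
    "S1,Individual sample,10.5,-20.25\n"
    "S2,Individual sample,abc,10\n"
    "S3,Sub-sample,95,10\n"
)
ok, bad = load_batch(sheet)
print(len(ok), bad)  # 1 [(3, 'non-numeric coordinates'), (4, 'coordinates out of range')]
```

Reporting the line number for each rejected row lets the submitter fix the spreadsheet and resubmit only those rows.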
Implementation Challenges
• Diversity of users
  - Large sampling campaigns (IODP, ICDP, ECS)
  - Repositories
  - Data systems
  - Individual investigators
• Diversity of sample types
• Integration into existing policies, procedures, data systems
• International scope
• Connectivity in the field
Solutions
• Schema improvements
• Web-service-based registration from client data systems
• Distributed system of registration nodes (Trusted Agents)
• Handle service for IGSNs (persistent, resolvable)
  - http://dx.doi.org/18.2539/IGSN.SIO001234
• Tools to facilitate registration
  - iSESAR (registration via iPhone)
  - eCollections (personal sample management)
  - webCollections (hosting services for repositories)
• IGSN International Consortium
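With a handle service, turning an IGSN into a resolvable link reduces to normalizing the identifier and appending it to a resolver prefix. A minimal sketch that copies the prefix from the example URL on this slide. Treat that prefix as illustrative: the deployed service may use a different one:

```python
import re

# Prefix copied from the example URL on this slide; treat it as illustrative only.
RESOLVER_PREFIX = "http://dx.doi.org/18.2539/IGSN."


def resolvable_url(igsn):
    """Turn 'IGSN:SIO001234' (or 'SIO001234') into a resolver URL."""
    code = igsn.strip().upper()
    if code.startswith("IGSN:"):
        code = code[len("IGSN:"):]
    if not re.fullmatch(r"[A-Z0-9]{9}", code):
        raise ValueError(f"not a 9-character IGSN: {igsn!r}")
    return RESOLVER_PREFIX + code


print(resolvable_url("IGSN:SIO001234"))  # http://dx.doi.org/18.2539/IGSN.SIO001234
```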