DATA SYSTEMS FOR SAMPLE-BASED OBSERVATIONS
Kerstin Lehnert

Data from Samples
- Distributed data acquisition: different labs/researchers analyze the same sample or subsamples of it.
- Distributed data publication: different data for the same sample are published in different papers.
- Distributed data archiving: data for the same sample are kept in different data systems.
Integrated data access is required to maximize utility.

Geochemical Data
- diverse
  - hundreds of parameters
  - thousands of materials
  - vary with space and time over a range of more than ten orders of magnitude
- complex
  - mostly sample-based, with complex relations among samples & subsamples
  - distributed data acquisition (one sample analyzed in different labs by different researchers at different times)
  - idiosyncratic data acquisition methods

Geoinformatics for Geochemistry
DATABASES
- thematic geochemical databases (PetDB, SedDB, VentDB)
DATA REPOSITORY
- Geochemical Resource Library
REGISTRIES
- System for Earth Sample Registration (SESAR)
- IEDA Data Publication Agent of the STD-DOI system (DataCite®)
- GeoPass: single sign-on authentication system
DATA ACCESS & ANALYSIS TOOLS
- GfG user interfaces
- EarthChem Data Engine (Portal)

GfG Architecture
[Diagram labels: EarthChem Portal; EarthChem XML DB; Geochemical Resource Library; Topical Data Collections; External Databases; Metadata catalog; datasets (original data & derived products); GCDM DB; GEOROC; NAVDAT; USGS; User Submission; GfG Data Entry]

GeoChemical Data Model
[Diagram labels: publication; feature of interest; data source; sample; analysis; collection; geospatial; material; preparation; obs. point; observed value; method/DQ]

Metadata
- Geospatial: geographical coordinates, geographical names
- Collection: sampling technique, field program
- Description & Age: classification, texture, alteration, age
- Data Quality: technique, instrument, laboratory, precision, reference material measurements, correction procedures

Standards for Data Access & Integration
- WMS, WFS: for visualization tools
- OAI-PMH: for joint data inventories
- EarthChemML: for integration across geochemical data systems and for interoperability with other systems

IEDA System-wide Inventory
[Diagram labels: RSS feed; Inventory; Expedition Metadata; Reference Metadata; Dataset Metadata; Geospatial Metadata; DOI Registration; MGDS; EarthChem; Chemical Data; Cruise Info; GRL; Geochem DBs; Object Registration; Object Metadata; SESAR]

EarthChem Portal
[Diagram labels: GEOROC, NAVDAT, PetDB, USGS, and others, each connected by XML to the EarthChem Data Engine Database; EarthChem Data Engine Search & Visualization]
Partner databases encode their data & metadata in XML and send them to the EarthChem Portal database in Kansas. Queries submitted at the EarthChem Portal search the contents of the EarthChem Portal database.

Access Levels

EarthChemML

EarthChem Repository: user submission
- need tools that are easy to use and support the data flow from lab to publication
- ideally, tools represent ‘pipelines’ for data capture early in the data acquisition process
- tools need to include data validation and DQC procedures
- offer citable data publication
- need data policies

IEDA data publication service

STD-DOIs
The STD-DOI metadata are mainly Dublin Core elements, plus data-specific elements. The metadata are transmitted to the National Library via web service (HTTP/SOAP) and incorporated into the library catalogue.
The metadata may contain references to other objects (DOI, IGSN, ...):
Element <RelatedIdentifier>: isCited, isParent, isChild, isDuplicate, …

STD-DOIs
The element <relatedIdentifier> can be used to point to other electronic objects:
- Point to the literature where the data set is interpreted.
- Point to samples from which the data were derived.
- Point to other datasets that belong to the same collection of datasets.
These links can be used by machines (e.g. data portals) to make search suggestions and thus aid discovery of data, literature and samples, or other added-value services.

STD-DOI System Architecture

Data DOIs

Information Discovery
[Diagram labels: Link to publication; Citation of data; IGSN points to sample]

The International GeoSample Number

Ambiguous Sample Naming
Examples from the PetDB Database

Sample names are duplicated.

Name   Location          Publication      Cruise
D3-1   SEIR              Anderson, 1980   VM3301 (Vema)
D3-1   North Fiji Basin  Eissen 1994      Starmer 1 (Nadir)
D3-1   Shimada Smt       Graham 1988      S1-79 (Sea Sounder)
D3-1   Gorda Ridge       Clague 1984      KK2-83NP (Kana Keoki)
3-1    Lamont Smts       Batiza 1982      RISE III (New Horizon)

Sample names are modified or changed.

Dredge sample 3, Amphitrite Cruise 1963/4

Name          Publication
D3            Engel 1964
D-3           Scheidegger 1981, Schilling 1971
PD3           Tatsumoto 1965, 1966
PD-3          Hedge 1970, Muehlenbach 1972
PV D-3        Engel 1965
AMPH3D        Pineau 1976
AMPH-D3       MacDougall 1986
AMPH D-3      Sun 1980, Schilling 1975
AMPH 3-PD-3   Hart 1971
S-10          Subbarao 1972

- Provides & manages unique identifiers for samples
- IGSN: International Geo Sample Number
- Assigned upon registration of sample metadata
- Catalogs & archives sample metadata
- Access to sample metadata via web site & web services
- Long-term preservation of metadata
- Link to sample archives
- Facilitates links to data
- IGSN will be incorporated into persistent resolvable GUIDs

International GeoSample Number
A Global Unique Identifier for Earth Samples
Example: IGSN:SIO8JH3M4
- Name space
- Strict syntax (9 alphanumeric characters)
  - First three characters are a unique user code (registered with SESAR)
  - Last 6 characters are random numbers + letters
  - Allows 2,176,782,336 sample identifiers per registrant
- Does not replace personal or institutional names.
- Applied to samples & sub-samples; the system tracks relations.
www.geosamples.org

[Diagram: parent/child relations among samples]
- Core (IGSN:XXX000120) is the parent of Core Section 1 (IGSN:XXX0065B3), Core Section 2 (IGSN:XXX07ST4K), and Core Section 3 (IGSN:XXX9K23G6).
- Sample 1: Rock powder (IGSN:ABC0L653X); Sample 2: Mineral conc. (IGSN:ABC078HGB); Sample 3: Leachate (IGSN:ABC0L53NW)
- Other labels: Fossil separate; Microprobe mount; IGSN:XYZ0G693M; IGSN:ABC0L98SW

Sample Types
- “Sampling events” such as holes, cores, dredges, stratigraphic sections
- “Individual samples”: specimens (rocks, minerals, fossils, fluid samples, precipitates, synthetic material, etc.)
- “Sub-samples” of any of the above: processed samples such as mineral or fossil separates, leachates, thin sections, etc.
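
The IGSN syntax rules above (nine alphanumeric characters, a three-character user code, a six-character suffix) can be illustrated with a short check. This is only a minimal sketch based on the slide's description: the function names are hypothetical, and the assumption that the alphabet is the 36 symbols 0-9 and A-Z is inferred from the stated capacity of 2,176,782,336 identifiers per registrant (36 to the power of 6), not taken from a SESAR specification.

```python
import re
import string

# Assumed alphabet: digits plus uppercase letters (36 symbols).
ALPHABET = string.digits + string.ascii_uppercase

# Assumed pattern: 3-character user code followed by a 6-character suffix.
IGSN_PATTERN = re.compile(r"[0-9A-Z]{3}[0-9A-Z]{6}")


def parse_igsn(text):
    """Split an identifier such as 'IGSN:SIO8JH3M4' into (user_code, suffix).

    Returns None when the text does not match the assumed syntax.
    """
    code = text.strip().upper()
    if code.startswith("IGSN:"):
        code = code[len("IGSN:"):]
    if IGSN_PATTERN.fullmatch(code) is None:
        return None
    return code[:3], code[3:]


def identifiers_per_registrant(suffix_length=6):
    """Number of distinct suffixes available under one user code."""
    return len(ALPHABET) ** suffix_length


print(parse_igsn("IGSN:SIO8JH3M4"))   # ('SIO', '8JH3M4')
print(parse_igsn("D3-1"))             # None: a field name, not an IGSN
print(identifiers_per_registrant())   # 2176782336
```

A field name such as D3-1 fails the check, which is the point of the scheme: the IGSN sits alongside personal or institutional names rather than replacing them.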
Sample Registration
- Spreadsheet forms for batch loading
- SESAR Web Site
- Interoperability (web services)

Implementation Challenges
- Diversity of users
  - Large sampling campaigns (IODP, ICDP, ECS)
  - Repositories
  - Data systems
  - Individual investigators
- Diversity of sample types
- Integration into existing policies, procedures, data systems
- International scope
- Connectivity in the field

Solutions
- Schema improvements
- Web-service based registration from client data systems
- Distributed system of registration nodes (Trusted Agents)
- Handle service for IGSNs (persistent, resolvable)
  http://dx.doi.org/18.2539/IGSN.SIO001234
- Tools to facilitate registration
  - iSESAR (registration via iPhone)
  - eCollections (personal sample management)
  - webCollections (hosting services for repositories)
- IGSN International Consortium
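
As an illustration of how a client data system might prepare a batch sheet before registration, the sketch below reads a small table of samples and groups sub-samples under their parents. It is a hedged example only: the column names (igsn, name, sample_type, parent_igsn) and the helper functions are hypothetical and do not reproduce the SESAR template or web-service interface. The IGSN values are the core and core-section examples from the parent/child slide, with the core assumed to be the parent of the three sections.

```python
import csv
import io
from collections import defaultdict

# Hypothetical batch sheet; column names are assumptions for this sketch.
BATCH = """igsn,name,sample_type,parent_igsn
XXX000120,Core,Core,
XXX0065B3,Core Section 1,Core section,XXX000120
XXX07ST4K,Core Section 2,Core section,XXX000120
XXX9K23G6,Core Section 3,Core section,XXX000120
"""


def load_batch(text):
    """Read the batch sheet into a dict keyed by IGSN."""
    return {row["igsn"]: row for row in csv.DictReader(io.StringIO(text))}


def children_by_parent(records):
    """Group child IGSNs under their parent IGSN."""
    children = defaultdict(list)
    for igsn, row in records.items():
        if row["parent_igsn"]:
            children[row["parent_igsn"]].append(igsn)
    return dict(children)


def unknown_parents(records):
    """Return parent IGSNs that are referenced but not defined in the batch."""
    return sorted({
        row["parent_igsn"]
        for row in records.values()
        if row["parent_igsn"] and row["parent_igsn"] not in records
    })


records = load_batch(BATCH)
print(children_by_parent(records))
print(unknown_parents(records))
```

Checking for unknown parents before submission is one way a client could catch broken sample/sub-sample links locally; whether the registry performs a similar check is not stated in the slides.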