Community Earth Science Informatics Initiatives & Their Impacts
Lee Allison, Arizona Geological Survey
Association of American State Geologists

200 million+ websites – if you don’t have a website, you don’t exist.

Prediction
In 5-10 years, if your data are not online in an integrated, interoperable network, you won’t exist.

1000’s of National and Regional Databases
topographic, orthoimagery, hydrography mineral resources water geochemistry geophysics (aeromag, gravity, aerorad) earthquake catalogs biological surveys vegetation/speciation maps

Conclusions: Growing Consensus for an NGS
Goals – interoperable, distributed, Web-service based, synoptic 4-D system
Challenges
• Technical – adapting-adopting existing capabilities
• Cultural – organizational – controls, recognition
How do we get there?
• Agreement on standards, protocols, architecture
• Geological Surveys as data archives, providers
• Parallel community efforts are linking
• Implementation is underway
• Sustainability is an issue

Current electronic delivery

The Goal
Most of the technology exists
Challenges are cultural and organizational

With apologies to JRR Tolkien
One system to rule them all, one system to find them, one system to bring them all, and in the darkness bind them.

How do we get there?
NSF to the Solid Earth Sciences: how do you build a sustainable community system?
- 2-year community engagement process underway

Earth science cyberinfrastructure
Early paradigm: Central databases for each topic
Distributed
Web-based
Interoperable

Goal is making data interoperable
Ian Jackson, BGS

Interoperability
"The capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units."
ISO/IEC 2382-01 (SC36 Secretariat, 2003)

Example: the electrical utility
Simple interface – put plug in wall, get electricity

Country           Voltage   Frequency
Afghanistan       220 V     50 Hz
Andorra           230 V     50 Hz
Anguilla          110 V     60 Hz
Antigua           230 V*    60 Hz
Cayman Islands    120 V     60 Hz
Cyprus            240 V     50 Hz
Czech Republic    230 V     50 Hz
……

Complexity

Other complex things

National Geoinformatics System
• “Killer applications”
• Use cases & best practices in meeting stakeholder needs
• Data discovery, catalogs, inventories, metadata profiles, metadata aggregation service(s) – 4D search engines
• Informatics specifications, data model, interoperability, & standards
• Web portal & Registry development and implementation
• Accessing & licensing protocols, recognition & credit
• Community of practice
• Communication, dissemination, & awareness
• Ontologies, vocabularies
• Access to high-resolution spatial geological & applied datasets
• “Big Iron” – high performance computing
• Digitization of legacy data
• Liaison and integration with related groups & initiatives
• Sustainability

Computer printer services
Old days: each application has a driver for each printer
[Diagram labels: Word Processor, Spreadsheet; HP Driver1, HP Driver2, Calcomp Driver1, Calcomp Driver2, Brothers Driver1, Brothers Driver2; HP printer, Calcomp plotter, Brothers printer]
Now: printing service, uses Metafile = interchange format
[Diagram labels: Word Processor, Spreadsheet; printer driver (wrapper); Metafile; printing service; Metafile interpreter (wrapper); Laserwriter, Large format inkjet, Film writer]
Advantages
• one driver (wrapper) per application
• Application need know nothing about printer: separation of concerns

Interoperability via web service
[Diagram labels: GSC, NGMDB, USGS, BGS, GA; GSC schema, USGS schema, BGS schema, GA schema; a wrapper on each; Web Services; Client with its own wrapper]
Communication between service providers and clients takes place using XML mark up.
Use of standard markup language means schema mapping only needs to be done once
Wrapper implements interface to service: formulate requests,
interpret results
Participants implement one interface for each service
Applications focus on application logic, not data access.

Mark-up language “wrapper” translates your data

GeoSciML developers
• Cocoon – Ottawa, Canada
• Mapserver – Arizona
• GeoServer – Keyworth, UK
• Cocoon – Virginia, USA
• Cocoon – Uppsala, Sweden
• Ionic – Orleans, France
• Tsukuba, Japan
• GeoServer – Canberra
• GeoServer – Melbourne, Australia

Using a web service – step 1
GeoSciML Web Services: Request

Web service request – step 2
GeoSciML Web Services: Request

Web service response – part 1
GeoSciML Web Services: Response

Web service response – part 2
GeoSciML Web Services: Response

ORGANIZATION:
Unique missions of geological surveys - collect, archive, disseminate data

Geoscience Information Network (GIN)
Distributed
Web-based
Interoperable
2,000 – 3,000 databases
1000’s of collections
80,000+ geologic maps

We agree on a data network that:
• is distributed (vs centralized)
• is interoperable
• uses open source standards and common protocols (OGC, GeoSciML)
• respects and acknowledges data ownership
• fosters communities of practice to grow
• facilitates development of new web services and clients

System overview

GIN Geologic map service scenario
Catalog: NGMDB? OneGeology? NDC? GEON? NGDS?
Registration; Survey map servers; OGC CSW; OGC WMS; ArcMap; ArcGIS

National Geologic & Geophysical Data Preservation Program
- $1M per year
- National inventory
- Metadata catalogue
- National Digital Catalogue

Data discovery - 79,000+ maps, images, data, and products from 350+ publishers
Lexicon of Geologic Names of the United States

Defining GIN
• collections of service definitions, interchange formats, and vocabularies
• independent of hardware, operating system, or lower-level network protocols
• new technology will only require implementation of network elements in a new environment
• architecture allows for the use of multiple conventions for different user groups

GIN
• Service definitions
• Interchange format standards
• Discovery tools
• Community engagement
• Vocabularies

WWW
• http – hypertext transfer protocol (& ftp, etc)
• html – hypertext mark-up language
• url – uniform resource locator
• browser – built by others
GIN
• Open source standards – Open Geospatial Consortium
• data interchange tool – GeoSciML
• distributed data catalogues (National Geologic Map DB; National Data Catalogue, etc)
• Web services & applications – built by others

Challenges to building community
Who sets the standards?
Who controls the system?
Who makes the decisions?
• The network is voluntary, not imposed from above
• We won’t take your data away – they stay with you
• Your participation is voluntary
• Keep your formats, system, servers
Will 3,000 interoperable databases become an 800-lb gorilla?

GIN is partnering with the global Earth science community
• AASG & USGS National Geoinformatics System
• OneGeology-Europe – 21 nations
• Marine Metadata Interoperability Initiative
• US DOE National Geothermal Data System (NGDS)
• US DOE Geothermal Technologies Program
• Energy Industry Metadata Standards Working Group - Energistics

PARTNERS & COLLABORATORS:
MS SciScope – geospatial data discovery

Welcome to SciScope!
SciScope is a tool by Microsoft Research to help geoscientists discover data from numerous data repositories with ease through a single, intuitive interface. Users can display multiple map layers related to the scope of their study and interact with geographical features on the map including dams, rivers, water bodies, geology, aquifer systems, ecological regions and river basins.

GIN DEMO PROGRAM
NSF INTEROP
• GIN 3 year development of standards, services
• Demos in ~6 SGSs; ~$80K subcontracts
“Circuit Riders”
• Part trainer, part management consultant, part computer expert
• Write GeoSciML “wrappers”
• Guide server configurations
• Training, short courses
$80K for demos across AASG

ADOPTION & DEPLOYMENT
US Dept. of Energy (May, 2009)
• National Geothermal Data System (NGDS)
• GIN architecture, standards
• $5M, 5 years
• Adopted by US Geothermal Technologies Program

National Geothermal Data System
[Diagram labels: Distributed data sources; NGDS Legacy data repository; Desktop applications (GeoSciNet); Ontologies, vocabularies; Discovery, access, exchange (GIN); Portals (GeoSciNet, SciScope)]

National Geothermal Data System
• Data discovery, access, exchange: GIN
• Distributed content: geothermal community
• Legacy data repository: NGDS
• Desktop applications (economic modeling tool, etc): GeoSciNet
• Portals: GeoSciNet, SciScope

NATIONAL DEPLOYMENT
US DOE “Geothermal Data Development, Collection, and Maintenance”
$20M, 1-5 awards
AASG proposal submitted

106 nations

29 countries and European organizations are committed to creating a geological map at 1:1,000,000 scale, integrated with metadata initially available in the following languages: English, French, Italian, Spanish, Swedish, Czech and Norwegian.

Network sustainability
• tipping point at which users and providers will see the network as critical to their basic functions
• populating and using the network becomes a necessary cost of doing business
• how do we maintain network functions?

How do we get there?
NSF to the Solid Earth Sciences: how do you build a community system?
- 2-year community engagement process underway
Geological Surveys as drivers?
- USGS, 51 state surveys, 21 European surveys, 106+ nations
Linkage with other communities and natural science domains
- MMI, OOS, CUAHSI-HIS, Geoscience Australia, iPlant, GBIF, ESIP, Energistics, …

‘TIPPING POINT’

Energy Industry Metadata Standards Working Group
• End-to-end discovery, access, and exchange of upstream petroleum data
97 members
• American Geological Institute (AGI)
• Baker Hughes
• BP
• British Geological Survey (BGS)
• Chevron
• ConocoPhillips
• Department of Interior (U.S. DOI-BLM-MMS)
• Directorate General of Hydrocarbons (India) (DGH)
• ExxonMobil
• Ground Water Protection Council
• Halliburton
• IBM Corporation
• IFP - Institut Francais du Petrole
• Norwegian Petroleum Directorate (NPD)
• Open Geospatial Consortium (OGC)
• Pioneer Natural Resources
• SAIC-Science Applications Intl. Corp.
• Saudi Aramco
• Schlumberger
• Shell
• Smith International
• StatoilHydro ASA
• TOTAL
• Woodside Energy Inc.

Geoscience Information Network
http://usgin.org
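
Appendix sketch: the wrapper idea
The interoperability slides describe each survey keeping its own schema while a wrapper translates its records into a shared interchange format, so the schema mapping is done once per provider and clients never handle native formats. A rough Python sketch of that pattern might look like the following. Everything named here is an assumption made for illustration: the two native record layouts, the field names, the `wrap_survey_a` / `wrap_survey_b` / `query_all` helpers, and the `CommonUnit` record, which is a simplified stand-in and not actual GeoSciML.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class CommonUnit:
    """Simplified stand-in for a shared interchange record (NOT real GeoSciML)."""
    name: str
    lithology: str
    source: str


# Hypothetical native records: each provider keeps its own field names.
SURVEY_A_RECORDS = [{"unit_name": "Example Formation A", "rock_type": "sandstone"}]
SURVEY_B_RECORDS = [{"UNIT": "Example Formation B", "LITH": "BASALT"}]


def wrap_survey_a(record: Dict[str, str]) -> CommonUnit:
    """Wrapper for hypothetical provider A: map native fields to the common record."""
    return CommonUnit(record["unit_name"], record["rock_type"], "survey_a")


def wrap_survey_b(record: Dict[str, str]) -> CommonUnit:
    """Wrapper for hypothetical provider B: different field names and casing."""
    return CommonUnit(record["UNIT"], record["LITH"].lower(), "survey_b")


Wrapper = Callable[[Dict[str, str]], CommonUnit]

# One wrapper per provider; the mapping to the common record is written once.
PROVIDERS: Dict[str, Tuple[List[Dict[str, str]], Wrapper]] = {
    "survey_a": (SURVEY_A_RECORDS, wrap_survey_a),
    "survey_b": (SURVEY_B_RECORDS, wrap_survey_b),
}


def query_all() -> List[CommonUnit]:
    """Client-side view: collect records from every provider in the common format."""
    results: List[CommonUnit] = []
    for records, wrapper in PROVIDERS.values():
        results.extend(wrapper(r) for r in records)
    return results


if __name__ == "__main__":
    for unit in query_all():
        print(unit)
```

In this sketch, adding a new provider means writing one more wrapper function and registering it; the client code in `query_all` does not change, which is the separation of concerns the printer-driver analogy points at.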