Community earth science informatics initiatives & their impact

Community Earth Science
Informatics Initiatives
& Their Impacts
Lee Allison, Arizona Geological Survey
Association of American State Geologists
200 million+ websites –
if you don’t have a website, you don’t exist.
Prediction
In 5-10 years, if your data
are not online in an
integrated, interoperable
network, you won’t exist.
1000’s of National and Regional Databases








topographic, orthoimagery,
hydrography
mineral resources
water
geochemistry
geophysics (aeromag,
gravity, aerorad)
earthquake catalogs
biological surveys
vegetation/speciation maps
Conclusions: Growing Consensus
for an NGS


Goals – interoperable, distributed, Web-service
based, synoptic 4-D system
Challenges
• Technical – adapting-adopting existing capabilities
• Cultural/organizational – controls, recognition

How do we get there?
• Agreement on standards, protocols, architecture
• Geological Surveys as data archives, providers
• Parallel community efforts are linking
• Implementation is underway
• Sustainability is an issue
Current electronic delivery
The Goal
Most of the
technology
exists
Challenges are
cultural and
organizational
With apologies to JRR Tolkien
One system to rule them all,
one system to find them,
one system to bring them all,
and in the darkness bind them.
How do we get there?
NSF to the Solid Earth Sciences: how do you build a sustainable
community system?
- 2-year community engagement process underway
Earth science cyberinfrastructure
Early paradigm:
Central databases for each topic
Distributed
Web-based
Interoperable
Goal is making data interoperable
Ian Jackson, BGS
interoperability
"The capability to communicate, execute
programs, or transfer data among various
functional units in a manner that requires
the user to have little or no
knowledge of the unique
characteristics of those units."
ISO/IEC 2382-01 (SC36 Secretariat, 2003)
Example: the electrical utility

Simple interface – put plug in wall,
get electricity
Afghanistan 220 V 50 Hz
Andorra 230 V 50 Hz
Anguilla 110 V 60 Hz
Antigua 230 V* 60 Hz
Cayman Islands 120 V 60 Hz
Cyprus 240 V 50 Hz
Czech Republic 230 V 50 Hz
……
Complexity
Other complex things
National Geoinformatics System













“Killer applications”
User cases & best practices in meeting stakeholder needs
Data discovery, catalogs, inventories, metadata profiles,
metadata aggregation service(s) – 4D search engines,
Informatics specifications, data model, interoperability, &
standards
Web portal & Registry development and implementation
Accessing & licensing protocols, recognition & credit
Community of practice
Communication, dissemination, & awareness
Ontologies, vocabularies
Access to high-resolution spatial geological & applied datasets
“Big Iron” – high performance computing
Digitization of legacy data
Liaison and integration with related groups & initiatives
Sustainability
Computer printer services
Old days: each application has a driver for each printer
Word Processor: HP Driver1, Calcomp Driver1, Brothers Driver1
Spreadsheet: HP Driver2, Calcomp Driver2, Brothers Driver2
Printers: HP printer, Calcomp plotter, Brothers printer
Now: a printing service uses a metafile (= interchange format)
Word Processor, Spreadsheet: one printer driver (wrapper) each, writing the metafile
Laserwriter, large-format inkjet, film writer: one metafile interpreter each
•Advantages
•one driver (wrapper) per application
•Application needs to know nothing about the printer: separation of concerns
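The advantage can be made concrete with a simple count. The following is a minimal sketch, assuming the application and printer names shown on the slide; the function names are illustrative only and are not part of any real printing system.

```python
def drivers_point_to_point(apps, printers):
    """Old days: every application ships its own driver for every printer."""
    return len(apps) * len(printers)


def drivers_with_interchange(apps, printers):
    """Now: one metafile writer per application, one interpreter per printer."""
    return len(apps) + len(printers)


# Names taken from the slide; any other lists would work the same way.
apps = ["Word Processor", "Spreadsheet"]
printers = ["HP printer", "Calcomp plotter", "Brothers printer"]

old = drivers_point_to_point(apps, printers)
new = drivers_with_interchange(apps, printers)
print(f"point-to-point drivers: {old}, with metafile: {new}")
```

With two applications and three printers the saving is small (6 versus 5), but the first count grows as a product and the second as a sum, so ten applications and ten printers would need 100 drivers versus 20.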
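Interoperability via web service
Data providers: GSC, USGS (NGMDB), BGS, GA, each with its own schema (GSC schema, USGS schema, BGS schema, GA schema)
Each provider connects to Web Services through a wrapper
Client connects to Web Services through its own wrapper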
Communication between service providers and clients takes place using XML mark-up language
Use of standard mark-up means schema mapping only needs to be done once
Wrapper implements interface to service — formulate requests, interpret results
Participants implement one interface for each service
Applications focus on application logic, not data access.
Mark-up language “wrapper” translates your data
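The wrapper idea above can be sketched in a few lines. This is an illustration under stated assumptions: the two survey records, their field names, and the element names (`GeologicUnit`, `unitName`, `lithology`) are hypothetical simplifications and are not the actual GeoSciML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical local records: each survey keeps its own field names.
SURVEY_A_ROW = {"UNIT_NAME": "Example Formation", "ROCK": "granite"}
SURVEY_B_ROW = {"name": "Sample Group", "lithology_code": "sandstone"}

# Each wrapper maps its local schema to the shared one, and does so once.
FIELD_MAPS = {
    "survey_a": {"UNIT_NAME": "unitName", "ROCK": "lithology"},
    "survey_b": {"name": "unitName", "lithology_code": "lithology"},
}


def wrap(source, row):
    """Translate one local record into the common interchange element."""
    unit = ET.Element("GeologicUnit", attrib={"source": source})
    for local_field, common_field in FIELD_MAPS[source].items():
        ET.SubElement(unit, common_field).text = row[local_field]
    return unit


def read_unit(xml_text):
    """A client parses the common format only; it never sees local schemas."""
    unit = ET.fromstring(xml_text)
    return {child.tag: child.text for child in unit}


for source, row in (("survey_a", SURVEY_A_ROW), ("survey_b", SURVEY_B_ROW)):
    xml_text = ET.tostring(wrap(source, row), encoding="unicode")
    print(xml_text)
    print(read_unit(xml_text))
```

The client function is written once against the common format, so adding a third survey only requires one more entry in the mapping table.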
Cocoon
Ottawa, Canada
Mapserver
Arizona
GeoServer
Keyworth, UK
Cocoon
Virginia, USA
Cocoon
Uppsala, Sweden
Ionic
Orleans, France
Tsukuba, Japan
GeoServer
Canberra
GeoServer
Melbourne, Australia
GeoSciML developers
Using a web service – step 1
GeoSciML Web Services: Request
Web service request – step 2
GeoSciML Web Services: Request
Web service response – part 1
GeoSciML Web Services: Response
Web service response - part 2
GeoSciML Web Services: Response
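The request slides are screenshots, so the exact call is not recoverable here. As a hedged sketch, assuming the request is an OGC Web Feature Service (WFS) 1.1.0 GetFeature call expressed as key-value parameters, it might be assembled as follows. The endpoint URL is a placeholder, and the feature type name `gsml:MappedFeature` is an assumption about what a given server exposes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; a real survey would publish its own service URL.
ENDPOINT = "https://example.org/geoserver/wfs"


def build_getfeature_url(endpoint, type_name, max_features=10):
    """Build a WFS 1.1.0 GetFeature request using key-value parameters."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
        "maxFeatures": str(max_features),
    }
    return f"{endpoint}?{urlencode(params)}"


url = build_getfeature_url(ENDPOINT, "gsml:MappedFeature")
print(url)

# Decode the query string again to confirm what the server would receive.
print(parse_qs(urlsplit(url).query))
```

No network call is made; the sketch only shows the shape of the request. The response would be an XML document in the interchange format, which the client parses without knowing how the provider stores its data.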
ORGANIZATION: Unique missions of geological
surveys - collect, archive, disseminate data
Geoscience Information Network (GIN)
Distributed
Web-based
Interoperable
2,000 – 3,000 databases
1000’s of collections
80,000+ geologic maps
We agree on a data network that:
•is distributed (vs centralized)
•is interoperable
•uses open source standards and
common protocols (OGC, GeoSciML)
•respects and acknowledges data
ownership
•fosters communities of practice to
grow
•facilitates development of new web
services and clients
System overview
GIN
Geologic map service scenario
Catalog:
NGMDB?
OneGeology?
NDC?
GEON?
NGDS?
Registration
Survey map
servers
OGC CSW
OGC WMS
ArcMap
ArcGIS
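The scenario above (a survey registers its map server with a catalog, and a client finds it there and requests a map) can be sketched as follows. This is a sketch under stated assumptions: the in-memory list stands in for a real OGC CSW catalog, and the server URLs, record titles, and layer names are all hypothetical. Only the WMS 1.1.1 GetMap parameter names come from the OGC standard.

```python
from urllib.parse import urlencode

# Hypothetical catalog contents standing in for a CSW catalog of registered
# survey map servers.
CATALOG = [
    {"title": "State geologic map",
     "wms": "https://example.org/survey-a/wms", "layer": "geology"},
    {"title": "Regional geologic map",
     "wms": "https://example.org/survey-b/wms", "layer": "bedrock"},
]


def find_services(catalog, keyword):
    """Stand-in for a catalog search: match a keyword against record titles."""
    return [rec for rec in catalog if keyword.lower() in rec["title"].lower()]


def build_getmap_url(record, bbox, width=800, height=600):
    """Build a WMS 1.1.1 GetMap request for one registered layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": record["layer"],
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),  # minx, miny, maxx, maxy
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    return f"{record['wms']}?{urlencode(params)}"


hits = find_services(CATALOG, "state")
url = build_getmap_url(hits[0], bbox=(-115.0, 31.0, -109.0, 37.0))
print(url)
```

A desktop client such as ArcMap performs the same two steps behind its interface: discovery through the catalog, then a map request to whichever server the catalog returns.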
National Geologic &
Geophysical Data
Preservation Program
-$1M per year
-National inventory
-Metadata catalogue
-National Digital Catalogue
Data discovery -


79,000+ maps,
images, data, and
products from 350+
publishers
Lexicon of Geologic
Names of the United
States
Defining GIN




collections of service
definitions, interchange
formats, and vocabularies
independent of hardware,
operating system, or lower-level network protocols
new technology will only
require implementation of
network elements in a new
environment
architecture allows for the
use of multiple conventions
for different user groups
Service
definitions
Interchange
format
standards
Discovery
tools
GIN
Community
engagement
Vocabularies
WWW
http – hypertext transfer protocol (& ftp, etc.)
html – hypertext mark-up language
url – uniform resource locator
browser – built by others
GIN
Open source standards – Open Geospatial Consortium
data interchange tool – GeoSciML
distributed data catalogues (National Geologic Map DB; National Data Catalogue, etc.)
Web services & applications – built by others
Challenges to building community
Who sets the standards?
Who controls the system?
Who makes the decisions?
The network is voluntary, not imposed from above



We won’t take your data away –
they stay with you
Your participation is voluntary
Keep your formats, system,
servers
Will 3,000 interoperable databases
become an 800-lb gorilla?
GIN is partnering with the global Earth
science community
AASG & USGS
National Geoinformatics System
OneGeology-Europe – 21 nations
Marine Metadata Interoperability Initiative
US DOE National Geothermal Data System (NGDS)
US DOE Geothermal Technologies Program
Energy Industry Metadata Standards Working Group - Energistics
PARTNERS & COLLABORATORS:
MS SciScope – geospatial data discovery
Welcome to SciScope!
SciScope is a tool by Microsoft
Research to help geoscientists
discover data from numerous data
repositories with ease through a single,
intuitive interface.
Users can display multiple map layers
related to the scope of their study and
interact with geographical features on
the map including dams, rivers, water
bodies, geology, aquifer systems,
ecological regions and river basins.
GIN DEMO PROGRAM
NSF INTEROP GIN

3 year development of standards,
services

Demos in ~6 SGSs; ~$80K
subcontracts
“Circuit Riders”





Part trainer, part management
consultant, part computer expert
Write GeoSciML “wrappers”
Guide server configurations
Training, short courses
$80K for demos across AASG
ADOPTION & DEPLOYMENT

US Dept. of Energy
(May, 2009)
• National Geothermal Data System
(NGDS)
• GIN architecture, standards
• $5M, 5 years
• Adopted by US Geothermal Technologies
Program
National Geothermal Data System
Distributed data sources
NGDS
Legacy data
repository
Desktop applications (GeoSciNet)
Ontologies, vocabularies
Discovery, access,
exchange (GIN)
Portals
(GeoSciNet, SciScope)
National Geothermal Data System





Data discovery, access, exchange:
GIN
Distributed content: geothermal
community
Legacy data repository: NGDS
Desktop applications (economic
modeling tool, etc): GeoSciNet
Portals: GeoSciNet, SciScope
NATIONAL DEPLOYMENT



US DOE “Geothermal Data Development,
Collection, and Maintenance”
$20M, 1-5 awards
AASG proposal submitted
106 nations
29 countries and
European organizations
are committed to create a
geological map at
1:1.000.000 scale,
integrated with metadata
initially available in the
following languages:
English, French, Italian,
Spanish, Swedish, Czech
and Norwegian.
Network sustainability



tipping point at which users and
providers will see the network as
critical to their basic functions
populating and using the network
becomes a necessary cost of doing
business
how do we maintain network
functions?
How do we get there?
NSF to the Solid Earth Sciences:
how do you build a community system?
- 2-year community engagement
process underway
Geological Surveys as drivers?
- USGS, 51 state surveys, 21 European surveys, 106+ nations
Linkage with other communities and natural science domains
- MMI, OOS, CUAHSI-HIS, Geoscience Australia, iPlant, GBIF, ESIP, Energistics,…..
‘TIPPING POINT’

Energy Industry Metadata Standards
Working Group
• End-to-end discovery, access, and
exchange of upstream petroleum data

97 members
• American Geological Institute (AGI)
• Baker Hughes
• BP
• British Geological Survey (BGS)
• Chevron
• ConocoPhillips
• Department of Interior (U.S. DOI-BLM-MMS)
• Directorate General of Hydrocarbons (India) (DGH)
• ExxonMobil
• Ground Water Protection Council
• Halliburton
• IBM Corporation
• IFP - Institut Français du Pétrole
• Norwegian Petroleum Directorate (NPD)
• Open Geospatial Consortium (OGC)
• Pioneer Natural Resources
• SAIC-Science Applications Intl. Corp.
• Saudi Aramco
• Schlumberger
• Shell
• Smith International
• StatoilHydro ASA
• TOTAL
• Woodside Energy Inc.
Geoscience Information Network
http://usgin.org