CSIG 08
Cyberinfrastructure Summer Institute for Geoscientists
August 11-15, 2008, San Diego

WELCOME!

Acknowledgements
CSIG’08 Speakers/Instructors
• Margaret Smeekens
• Prof. Randy Keller, University of Oklahoma
• Dr. Don Middleton, NCAR
• Dr. Peter Fox, NCAR
• Dr. Deborah Kilb, SIO/UCSD
• Prof. Ann Gates, UT El Paso
• Dr. Sriram Krishnan, SDSC
• Dr. Kai Lin, SDSC
• Ashraf Memon, SDSC
• Dr. Ilya Zaslavsky, SDSC
• Tom Whitenack, SDSC
• Stuart Weir, UNAVCO
• Prof. Falko Kuester, UCSD
• Dr. David Nadeau, SDSC/UCSD
• John Moreland, SDSC/UCSD
• Dr. Phil Maechling, SCEC/USC
• Prof. Ramon Arrowsmith, ASU
• Chris Crosby, SDSC
• Ilkay Altintas, SDSC

Acknowledgements
GEON Team
• Margaret Banton
• Sandeep Chandra
• Chris Crosby
• Kai Lin
• Ashraf Memon
• John Moreland
• David Nadeau
• Viswanath Nandigam
• Choonhan Youn
• Randy Keller
• Brad Wallet
• Ramon Arrowsmith
• Charles Meertens
• Ann Gates

Acknowledgements
• National Science Foundation
• CSIG has been funded each year since 2004 as a supplement to GEON (and has already been funded for 2009)

Schedule
• Monday – Introduction, Scientific Challenges, Examples of Cyberinfrastructure from other projects/sciences
• Tuesday – Visualization Frameworks (and data issues)
• Wednesday – Service-Oriented Architecture and Web Services
• Thursday – GEON LiDAR Workflow (GLW)
• Friday – Related CI Resources, Tools, and Technologies

LOGISTICS
• Webcasting and video archives

INTRODUCTIONS

Enabling Discoveries in the Earth Sciences Through GEON
Chaitan Baru, SDSC/UCSD

A Vision for Geoinformatics
From the NSF Workshop on Envisioning a National Geoinformatics System for the United States, Denver, March 2007:
“…a future in which someone can sit at a terminal and have easy access to vast stores of data of almost any kind, with the easy ability to visualize, analyze and model those data.”

An Example: The GEON Portal
portal.geongrid.org
• GEON Search
• Workbench
• Dynamic map services, map integration

Geologic Map Integration
+/- a few hundred million years

GEON Portal
• Paleo database integration
• LiDAR data access and data processing
• SYNSEIS: Online access to computational modeling system
• Gravity and Magnetic database for US

Other Data Portal Examples
• EarthScope Data Portal
• Hydrologic Information System (HIS)
  – Data Access System for Hydrology (DASH)

Integrated Cyberinfrastructure System
Source: Dr. Deborah Crawford, Chair, NSF CI Working Committee
Diagram labels:
• Education and Training
• Discovery & Innovation
• Application Domains: Geosciences, Engineering, Environmental Sciences, Physics, Astronomy, Archaeology, Neurosciences, Biomedicine, …
• Development Tools & Libraries
• Domain-specific Cybertools (software)
• Shared Cybertools (software)
• Middleware Services
• Hardware
• Distributed Resources (computation, storage, communication, etc.)

Cyberinfrastructure
• From NSF’s Cyberinfrastructure Vision for 21st Century Discovery, www.nsf.gov/od/oci/ci-v7.pdf, July 20, 2006:
“The comprehensive infrastructure needed to capitalize on dramatic advances in information technology has been termed cyberinfrastructure. Cyberinfrastructure integrates hardware for computing, data and networks, digitally-enabled sensors, observatories and experimental facilities, and an interoperable suite of software and middleware services and tools.
Investments in interdisciplinary teams and cyberinfrastructure professionals with expertise in algorithm development, system operations, and applications development are also essential to exploit the full power of cyberinfrastructure to create, disseminate, and preserve scientific data, information, and knowledge…”
• p. 40 of the report: “In 1999, the PITAC released the seminal report ITR-Investing in our Future, prompting new and complementary NSF investments in CI projects, such as the Grid Physics Network (GriPhyN) and international Virtual Data Grid Laboratory (iVDGL) and the Geosciences Network, known as GEON.”

Community Cyberinfrastructure Projects
Source: Prof. Mark Ellisman, UC San Diego
• Ocean Observing (ORION)
• Ecological Observatories (NEON)
• Earthquake Engineering (NEES)
• Geosciences (GEON)
• Biomedical Informatics (BIRN)
• High Energy Physics (GriPhyN)
Diagram labels:
Friendly Work-Facilitating Portals
Authentication - Authorization - Auditing - Resource Discovery - Workflows
Visualization - Analysis
Hardware
Middleware Services
Development Tools & Libraries
Your Specific Tools & User Apps.
Shared Tools
Science Domains
Distributed Computing, Instruments and Data Resources

Portal-based Science Environments
Support for resource sharing and collaborations

Virtual Organizations
• Multiple participants
• Distributed sites
• Participants are from different “administrative domains”
• Policies, rules, and systems of the VO may differ from those of the participating organizations

GEON Background
• Initiated in 2002 as a 5-year NSF ITR (IT Research) project
• Collaboration among 12 PI institutions and a number of other organizations
• Distributed network of GEON “nodes”
  – Provides a standardized software platform
  – Provides a machine outside the local environment (for hosting data, software tools, and applications for remote access)
  – Can be centrally administered
• Funded now under the NSF Earth Sciences (EAR) Geoinformatics program

Geoinformatics
From David Lambert, NSF EAR/GEO, presentation at GEON Annual Meeting, 2005

Other Geoinformatics Efforts
• OneGeology.org
  – International initiative of geological surveys to create dynamic geological map data and make it available via the web.
• USGS initiative
  – Presentation by Dr. Linda Gundersen, at Geoinformatics 2007, San Diego.

USGS Role in Geoinformatics
Fundamental: develop, maintain, make accessible:
• Long-term national and regional geologic, hydrologic, biologic, and geographic databases
• Earth and planetary imagery
• Open-source models of the complex natural systems and human interaction with those systems
• Physical collections of earth materials, biologic materials, reference standards, geophysical recordings, paper records
• National geologic, biologic, hydrologic, and geographic monitoring systems
• Standards of practice for the geologic, hydrologic, biologic, and geographic sciences
Source: Presentation by Dr. Linda Gundersen, USGS, at Geoinformatics 2007, San Diego, CA.

USGS Role in Geoinformatics
All activities (data creation, modeling, monitoring, collections, standards, etc.) must be done in cooperation and collaboration with the public and with governmental, academic, and private sector partners and stakeholders.
A critical USGS role: facilitate bringing communities together!
Source: Presentation by Dr. Linda Gundersen, USGS, at Geoinformatics 2007, San Diego, CA.

Data Collections versus Communities of Practice
Geoinformatics must evolve beyond the accumulation of data, models, and standards to become the framework for a community of practice in the natural sciences.
Etienne Wenger and Jean Lave coined the term and developed the learning theory of communities of practice: that we learn not only as individuals but as communities.
By engaging in communities of practice we increase our capacity and innovation as well as leverage our support for areas of interest.
Source: Presentation by Dr. Linda Gundersen, USGS, at Geoinformatics 2007, San Diego, CA.

Creativity, Learning, and Innovation
A community of practice is not merely a community with a common interest; its members are practitioners who share experiences and learn from each other.
They develop a shared repertoire of resources: experiences, stories, tools, vocabularies, ways of addressing recurring problems.
This takes time and sustained interaction.
Standards of practice and reference materials will grow out of this, but the critical benefits include creating and sustaining knowledge, leveraging of resources, and rapid learning and innovation.
Source: Presentation by Dr. Linda Gundersen, USGS, at Geoinformatics 2007, San Diego, CA.

1000’s of National and Regional Databases
• The National Map – topographic, elevation, orthoimagery, transportation, hydrography, etc.
• Geospatial One Stop – portal
• MRDATA – Mineral Resources and Related Data
• The National Geologic Map Database – standardized community collection of geologic mapping
• National Water Information System – NWISWeb
• National Geochemical Survey Database (PLUTO, NURE)
• National Geophysical Database (aeromag, gravity, aerorad)
• Earthquake Catalogs
• North American Breeding Bird Survey
• National Vegetation/speciation maps
• National Oil and Gas Assessment
• National Coal Quality Inventory
Source: Presentation by Dr. Linda Gundersen, USGS, at Geoinformatics 2007, San Diego, CA.

Geoscience Information Network

A Use Case for GEON
• A user request of the form: “For a given region (i.e., lat/long extent, plus depth), return a 3D structural model with accompanying physical parameters of density, seismic velocities, geochemistry, and geologic ages, using a cell size of 10 km”

Interoperability
• Data Interoperability
  – Ability to discover, access, integrate data sets
  – “Third-party”, heterogeneous, remote data
• Software Interoperability
  – “Software as a service”
  – Ability to discover services
  – Ability to link data with services, and “orchestrate” services

Data interoperability onion
Onion layers (figure): Social Networks, Semantics, Syntax, Systems
• System Interop
  – Approaches: e.g., ODBC, JDBC, Java, Web services, …
  – Purview of: Computer Science
• Syntactic
  – Approaches: Schema standards
  – Purview of: Standards organizations, domain science repositories, data archives
• Semantic
  – Approaches: Controlled vocabularies, thesauri, domain ontologies
  – Purview of: Domain scientists
• Social Networks
  – Approaches: recommendation systems
  – Purview of: social networking software (CS and domain science, data driven)

Software interoperability onion
Onion layers (figure): Social Networks, Semantics, Syntax, Systems
• System Interop
  – Approaches: e.g., REST, Web services
• Syntactic
  – Approaches: e.g., SOAP, WSDL
• Semantic
  – Approaches: Controlled vocabularies, thesauri, domain ontologies
  – Purview of: Domain scientists
• Social Networks
  – Approaches: recommendation systems
  – Purview of: social networking software
• Service orchestration via workflow systems
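
Illustration: the GEON use case request as code
The use case quoted earlier (a region given by a lat/long extent plus depth, gridded at a 10 km cell size) could be represented roughly as below. This is only a sketch: the class name `ModelRequest`, its fields, and the flat 111 km-per-degree conversion are assumptions made for illustration and are not part of any GEON interface.

```python
import math
from dataclasses import dataclass

# Rough average length of one degree of latitude (an assumption for this sketch).
KM_PER_DEG_LAT = 111.0


@dataclass(frozen=True)
class ModelRequest:
    """Hypothetical request for a gridded 3D structural model of a region."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    depth_max_km: float
    cell_size_km: float = 10.0
    # Physical parameters named in the use case.
    parameters: tuple = (
        "density",
        "seismic_velocity",
        "geochemistry",
        "geologic_age",
    )

    def grid_shape(self):
        """Return (n_lat, n_lon, n_depth) cell counts, rounding up."""
        mid_lat = math.radians((self.lat_min + self.lat_max) / 2.0)
        north_south_km = (self.lat_max - self.lat_min) * KM_PER_DEG_LAT
        # A degree of longitude shrinks with the cosine of the latitude.
        east_west_km = (
            (self.lon_max - self.lon_min) * KM_PER_DEG_LAT * math.cos(mid_lat)
        )
        return (
            math.ceil(north_south_km / self.cell_size_km),
            math.ceil(east_west_km / self.cell_size_km),
            math.ceil(self.depth_max_km / self.cell_size_km),
        )


# Example: a 2-degree by 2-degree box down to 50 km depth (made-up values).
request = ModelRequest(32.0, 34.0, -118.0, -116.0, depth_max_km=50.0)
print(request.grid_shape(), request.parameters)
```

A real service would also need to say which data sets feed each parameter, which is where the semantic layer of the interoperability onion comes in.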