1 CSIG 10
Survey of Emerging IT Trends and Technologies
Chaitan Baru
SDSC

2 Cyberinfrastructure
• The “cyberinfrastructure” initiative is an attempt to provide explicit investments in IT for science & engineering research and education
• From NSF’s Cyberinfrastructure Vision for 21st Century Discovery, www.nsf.gov/od/oci/ci-v7.pdf, July 20, 2006
  – “The comprehensive infrastructure needed to capitalize on dramatic advances in information technology has been termed cyberinfrastructure.”
  – “…integrates hardware for computing, data and networks, digitally-enabled sensors, observatories and experimental facilities…”
  – “…an interoperable suite of software and middleware services and tools...”
  – “Investments in interdisciplinary teams and cyberinfrastructure professionals with expertise in algorithm development, system operations, and applications development are also essential…”
  – “In 1999, the PITAC released the seminal report ITR-Investing in our Future, prompting new and complementary NSF investments in CI projects, such as the Grid Physics Network (GriPhyN) and international Virtual Data Grid Laboratory (iVDGL) and the Geosciences Network, known as GEON.”

3 Geoinformatics
• A vision for Geoinformatics, from the NSF Workshop on Envisioning a National Geoinformatics System for the United States, Denver, March 2007
  – “…a future in which someone can sit at a terminal and have easy access to vast stores of data of almost any kind, with the easy ability to visualize, analyze and model those data.”

4 Geoinformatics
From David Lambert, NSF EAR/GEO
Presentation at GEON Annual Meeting, 2005

5 Geoinformatics
Cyberinfrastructure for the Solid Earth Sciences: Objectives
• Make data, tools, applications …and communities… easily accessible online
• Provide an integration environment for 3D and 4D geoscience data integration
Book to be published this year by Cambridge University Press.
Co-editors: Randy Keller and Chaitan Baru

6 A Use Case for Geoinformatics
• A user request of the form: “For a given region (i.e. lat/long extent, plus depth), return a 3D structural model with accompanying physical parameters of density, seismic velocities, geochemistry, and geologic ages, using a cell size of 10km”

Portal-based Science Environments
Support for resource sharing and collaborations

EarthScope Data Portal
- SDSC San Diego
- IRIS Seattle
- UNAVCO Boulder
- ICDP Potsdam
portal.earthscope.org

9 CUAHSI Hydrologic Information System, HIS (http://his.cuahsi.org)
  – Data Discovery, Data Access, Data Publication

10 GEON: Geosciences Network*
• Funded by NSF IT Research program
• Multi-institution collaboration between IT and Earth Science researchers
• GEON Cyberinfrastructure provides:
  – Authenticated access to data and Web services
  – Registration of data sets, tools, and services with metadata
  – Search for data, tools, and services, using ontologies
  – Scientific workflow environment and access to HPC
  – Data and map integration capability
  – Scientific data visualization and GIS mapping
* The network / grid concept has been evolving over the past several years

GEON: The Geosciences Network
www.geongrid.org
GEON is a coalition among IT and Earth Science researchers with the goal of developing advanced information technologies to enable new modes of geosciences research
GEON is developing technologies for information integration and knowledge discovery
Project participants: 14 PI institutions, and partners including other projects, agencies, and industry
GEON has deployed a Web services-based, distributed computing infrastructure, called the GEONgrid, across PI and partner sites
GEONgrid provides access to data collections, tools, and applications that support geosciences research
Project funding: $11.25M, 2002-2007
RESEARCH AND EDUCATION PRODUCTS AND RESULTS
Technologies for Ontology-Based Data Registration, GIS Map Integration, Distributed Portals, and 4D
Visualization
Research on 3D Lithospheric structure
Gravity Modeling
Remote Sensing Data Integration
Cyberinfrastructure Summer Institute for Geoscientists and graduate courses in Geoinformatics

GEON Partners
• 14 PI institutions
• Over 20 other partners, including universities, industry, government agencies/labs

PI Institutions
• Arizona State University
• Bryn Mawr College
• Penn State University
• Rice University
• San Diego State University
• San Diego Supercomputer Center/UCSD
• University of Arizona
• University of Idaho
• University of Missouri, Columbia
• University of Texas at El Paso
• University of Utah
• Virginia Tech
• UNAVCO
• Digital Library for Earth System Education (DLESE)

Partners
• Chronos
• CUAHSI-HIS
• ESRI
• Calit2
• Georgia State University
• Geological Survey of Canada
• Georeference Online
• HP
• IBM
• Lawrence Livermore Natl Laboratory
• NASA Goddard, Earth System Division
• SCEC
• U.S. Geological Survey (USGS)
• Purdue University

Affiliated Projects
• EarthScope, IRIS

Key Informatics Areas
• Portals
  – Authenticated, role-based access to cyber resources: data, tools, models, model outputs, collaboration spaces, …
• Data Integration
  – Search, discovery and integration of data from heterogeneous information sources (“mediation” and “semantic integration”)
• Use of workflow systems, and access to HPC
  – Ability to “program” at a higher level of abstraction
  – Sharing of models, along with “provenance” information
  – Gateways to HPC environments
• Management of Geospatial Information
  – Using GIS capabilities, map services, geospatial data integration
• Visualization of 3D, 4D geospatial data and information

14 GEON Portal
portal.geongrid.org
• Generic Capabilities:
  – Search
  – Workbench
  – Dynamic map services, map integration
• Applications:
  – Paleo database integration
  – LiDAR data access and data processing
  – SYNSEIS: Online access to computational modeling system
  – Gravity and Magnetic database for US

GEON and Related Portals
• Chesapeake Bay
  Environmental Observatory
• National Ecological Observatory Network Prototype
• CUAHSI Hydrologic Information System
• Tropical Ecology Assessment and Monitoring Network
• EarthScope

Data Search and Integration

GEON LiDAR Workflow (GLW) Portlet

18 GEON Project and Funding Structure
[Diagram labels: GEON; NSF EAR/IF Facility (GEO, OCI, CISE); NSF ITR; OCI Software Development for Cyberinfrastructure (SDCI); OpenTopography; OpenEarth Framework; NSF Geoinformatics; GEON Portal; NSF CluE (GEO, CISE); CluE]

19 Integrated Cyberinfrastructure System
Education and Training
Discovery & Innovation
Application Domains
• Geosciences, Engineering, Environmental Sciences, Physics, Astronomy, Archaeology, Neurosciences, Biomedicine, …
Development Tools & Libraries
Domain-specific Cybertools (software)
Shared Cybertools (software)
Middleware Services
Hardware
Distributed Resources (computation, storage, communication, etc.)
Source: Dr. Deborah Crawford, Chair, NSF CI Working Committee

20 Community Cyberinfrastructure Projects
Friendly Work-Facilitating Portals
Science Domains:
• Ocean Observing (ORION)
• Ecological Observatories (NEON)
• Earthquake Engineering (NEES)
• Geosciences (GEON)
• Biomedical Informatics (BIRN)
• High Energy Physics (GriPhyN)
Your Specific Tools & User Apps.
Shared Tools
Development Tools & Libraries
Middleware Services
Authentication - Authorization - Auditing - Resource Discovery - Workflows
Visualization - Analysis
Hardware
Distributed Computing, Instruments and Data Resources
Source: Prof. Mark Ellisman, UC San Diego

21 Services implied by the Geoinformatics use case
“For a given region (i.e. lat/long extent, plus depth), return a 3D structural model with accompanying physical parameters of density, seismic velocities, geochemistry, and geologic ages, using a cell size of 10km”

22 Services implied by the use case
1. Search and discovery
2. Data access
3. Data integration, including transformations, model execution, and visualization
  – Some scientific visualization
4. Result publication (and preservation, so that results can be searched and discovered)
  – Digital libraries and archives
All in a distributed environment

23 Data “integration”
• A priori integration
  – Consistent metadata and data standards, data “schema”/structure, and semantics are predefined across a set of data resources
  – User simply issues a query and receives a result
versus
• Ad hoc integration
  – Consistent standards for discovery and data access, but retrieved data are visualized in a common environment and the user interactively integrates the data

24 Evolution of distributed environments
• Mainframes
  – with distributed “synchronous” terminals
• Networked minicomputers
  – with proprietary computer networking protocols
• The Web
  – Engineering workstations with open communications protocols

25 Evolution of distributed environments
• The Grid
  – Distributed computational and storage resources owned by organizations, orchestrated together to form “metacomputers”
• The Cloud
  – On-demand computational and storage resources provided as a service over the Internet, with incremental cost models

26 Clients in a distributed environment
• “Dumb” terminals
  – IBM 3270, vt100
• “Thick” clients
  – Workstations as clients in a client-server system
• “Thin” clients
  – Original PC desktops
• Thick clients
  – Modern PCs with powerful capabilities (64-bit, multicore, large memory)
• Thin clients
  – Mobile devices

27 Distributed environments…contd.
• Service-oriented architecture, SOA
  – A programming style for distributed computing
  – Services may be distributed in wide area (Internet scale)
  – or local area (within a datacenter)
• Data inertia
  – Moving data to computation vs.
  – Computation to data

28 Virtual Organizations (VOs)
• A socio-technical concept
• A distributed collection of entities and resources that come together to solve a specific problem
  – Multiple participants
  – Distributed sites
  – Participants are from different “administrative domains”
  – Policies, rules, systems of the VO may differ from those of the participating organizations
• Requires agreement on basic standards and protocols to enable resource and data sharing

29 Other Geoinformatics Efforts
• OneGeology.org
  – International initiative of geological surveys to create dynamic geological map data available via the web.
• USGS initiative
  – Presentation by Dr. Linda Gundersen, at Geoinformatics 2007, San Diego.

USGS: 1000s of National and Regional Databases
• The National Map – topographic, elevation, orthoimagery, transportation, hydrography, etc.
• Geospatial One Stop portal
• MRDATA – Mineral Resources and Related Data
• The National Geologic Map Database – standardized community collection of geologic mapping
• National Water Information System NWISWeb
• National Geochemical Survey Database (PLUTO, NURE)
• National Geophysical Database (aeromag, gravity, aerorad)
• Earthquake Catalogs
• North American Breeding Bird Survey
• National Vegetation/speciation maps
• National Oil and Gas Assessment
• National Coal Quality Inventory
Source: Presentation by Dr. Linda Gundersen, USGS, at Geoinformatics 2007, San Diego, CA.

31 USGIN: Geoscience Information Network