GEON IT Update
PI Meeting, Blacksburg, VA
March 21-23, 2004

GEON PI Meeting, March 21-23, 2004, Blacksburg, VA
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES

Outline
• Update on state of IT activities
  • GEON Software Architecture and project high-level goals
  • Update on activities at SDSC (since last meeting)
  • The GEON Portal
• Knowledge representation
  • Development of knowledge structures
    • Schemas (metadata is implicit in this), Controlled vocabularies, Ontology structures…

Components of the GEONgrid Architecture
• GEONgrid Physical Implementation
• Core Grid Services
  • Registry, authentication, access control, monitoring, replication, distributed filesystem, collection management (SRB), job submission, e.g. launch job to TeraGrid
• “Higher-Order” Services
  • Registration: data and metadata, schema, ontology, services
  • Data Integration: spatial data integration, data systems integration, schema integration
  • 2D Visualization, including GIS
  • Workflow
  • 3D Viz, Augmented Reality
• Portal
  • Portlet-based design. User space, GeonSearch, GeoWorkbench.

GEON Software Architecture activities
• Architecture “Retreat” at SDSC, Jan 27-28
  • Architecture document in preparation
• Established GEON software development areas with Coordinator and Chief Programmers for each
  • Each group meets once a week.
  • Chief programmers meet once a week, on Monday
• Would like to develop a schedule of visits of GEON PIs to SDSC
  • To attend Monday meetings
• GEONgrid software
  • Plans for 6 months, 1 year, 2 years
  • Release 1 by Dec. 2004

GEON Software Development Areas
   Development Area       Coordinator          Chief Programmer
1. Core Grid Services     Karan Bhatia         Sandeep Chandra
2. Portal                 Dogan Seber          Choonhan Youn
3. Data Registration      Bertram Ludaescher   Kai Lin
4. Mapping                Ilya Zaslavsky       Ashraf Memon
5. Mediation              Pavel Velikhov       Pavel Velikhov
6. Workflow               Bertram Ludaescher   Efrat Jaeger
Also, Jane Park, Doug Greer, LJ Ding, and others …

GEONgrid Physical Implementation
• PoP Nodes only
  • VaTech, Bryn Mawr, Penn State, Rice, Utah EGI, Utah, DLESE, UNAVCO
• PoP nodes + Data Nodes
  • Idaho, Arizona State, SDSC
• PoP nodes + Compute Nodes
  • Missouri, UTEP, SDSC

GEON Nodes
• Compute nodes
  • Want to create at least a few nodes as a TeraGrid “sandbox”
    • GEONgrid is currently based on Red Hat Linux, OGSI and Globus Toolkit Version 3 (GT3)
    • TeraGrid is currently based on SuSE Linux, GT2.4
    • Sandbox allows GEON PIs to develop and debug software in GEONgrid prior to sending jobs to TeraGrid
    • GEON has a TeraGrid allocation (30,000 hours)
  • Need to keep in mind GEONgrid heterogeneity
    • Windows and other platforms

GEON Services
• “Hosted” vs “non-hosted” services
  • Hosted: service is implemented within the physical GEONgrid environment (i.e. on one of the systems).
    • The implementation can benefit from core capabilities provided in GEONgrid, e.g. replication, load-balancing
    • Need at least a PoP node to host a service
    • Hosted databases will be stored at Data Nodes, but may be replicated at one or more PoP nodes
• Data nodes
  • Require Internet2 connectivity
  • Will be backed up to SDSC (details still being worked out)
  • Will be replicated among themselves (details still to be worked out)

Core Grid Services
• Registry:
  • a place to register and find basic Web services. But also, all services (e.g.
    PGAP, Gravity Database, Seismic Simulation Tool, …)
• Authentication:
  • using GEON Certificate Authority and Grid certificates
  • Initially (i.e. in 2004), use certificates only at the portal. Very few, if any, services may actually validate Grid certificates
• Access control:
  • investigating various systems for policy-based access to services
• Data replication:
  • initial target is IBM GMR software for replicating files as well as databases
• Support for various data systems:
  • e.g., SDSC Storage Resource Broker (SRB) and OpenDAP
  • Perhaps implement servers at Data Nodes
• Job submission, e.g. launch job to TeraGrid.
• Leverage NMI funding. New proposal under NMI.

Higher-Order Grid Services
• Registration
  • Data and metadata, schema, ontology, services
  • Important in order to support search functionality
• Data Integration
  • Defining “views” across multiple sources
    • Multiple database schemas, e.g. in Chronos (Paleostrat, Neptune, Paleobiology), PAST?, Geochemistry (Navdat, PetDB, …)
    • Multiple maps and map layers
• GIS and 2D Viz
  • Integrating map layers. “Simple” mapping service.
  • SVG-based data access and visualization tools
• Workflow
  • Iconic representation of databases and tools
  • Ability to link together tools and data to specify computations
  • Based on Kepler system

GEON Portal
• Exact “look and feel” and core functionality are a work in progress
• Portal components for:
  • GeonSearch, GeoWorkbench, Rocky Mountain Testbed, Mid-Atlantic Testbed, GEONSystems, GEON Docs, EOT
• Portlet-based design is meant to make it easier to create customized portals (“building blocks” approach)
  • E.g. Rocky Mountain and Mid-Atlantic are examples of customized portlets
• Portal software is distributed to each PoP node. Can be customized at each PoP node.
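
The “building blocks” idea on the GEON Portal slide (a common set of portlets distributed to every PoP node, then customized locally) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Portal` class, the `customize` method, and the node name `pop-node-a` are hypothetical and are not the GEON portal code or any real portlet API. Only the portlet names come from the slide.

```python
from dataclasses import dataclass, field

# All names here are hypothetical; this is not the GEON portal code
# or a real portlet API.

# Portal components assumed common to every node (names from the slide).
BASE_PORTLETS = ("GeonSearch", "GeoWorkbench", "GEONSystems", "GEON Docs", "EOT")


@dataclass
class Portal:
    """A portal instance at one PoP node: an ordered list of portlet names."""
    node: str
    portlets: list = field(default_factory=lambda: list(BASE_PORTLETS))

    def customize(self, add=(), remove=()):
        """Return a new Portal for the same node with portlets removed and added."""
        dropped = set(remove)
        names = [p for p in self.portlets if p not in dropped]
        for name in add:
            if name not in names:  # avoid duplicate building blocks
                names.append(name)
        return Portal(self.node, names)


# The distributed default, then a node-specific variant (node name is made up).
default = Portal("pop-node-a")
custom = default.customize(add=["Rocky Mountain Testbed"], remove=["EOT"])
print(custom.portlets)
```

Returning a new `Portal` rather than mutating the default keeps the distributed baseline unchanged, which is one simple way to model “distributed to each PoP node, customized at each PoP node”.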

GeonSearch
• Ad hoc search versus querying of preestablished “views”
• Ad hoc Search
  • Search/discover information on data, services, experiments, “other” (e.g., people, organizations)
  • Display results via map interfaces, semantic graphs
• View-based querying
  • E.g., use ad hoc search to find a set of databases, map layers of interest; define a specific way of combining data across these various sources
• Need good use cases

GeoWorkbench
• Workbench “capabilities”
  • Data and service registration
    • Create spatial, temporal, concept-based indexes as part of registration process
  • Ability to define views, e.g. using GeonSearch to find data, services, etc.
  • Run analysis routines, e.g. via workflow specifications, using Kepler
  • Visualize output, save output, feed output to other services
• Need good use cases
• Current portal is very much a work in progress
  • Figuring out functional components
  • Which function goes under what part of the portal, etc.

Knowledge Representation