GEON: The User Perspective

Choonhan Youn
Dogan Seber, Chaitan Baru, Ashraf Memon
San Diego Supercomputer Center, University of California at San Diego

GEON (GEOscience Network)
• A cyberinfrastructure project for the geosciences, funded by NSF ITR.
• Creating an IT infrastructure to “enable” interdisciplinary geoscience research: not just one group of researchers, but the entire community will benefit.
• Vision: Enable new discoveries in the geosciences by building an easy-to-use and “comprehensive” network of data, software, tools, and information, using state-of-the-art information technology resources.

Current GEON member institutions

Members
• Arizona State University
• Bryn Mawr College
• Penn State University
• Rice University
• San Diego State University
• San Diego Supercomputer Center / University of California, San Diego
• University of Arizona
• University of Idaho
• University of Missouri, Columbia
• University of Texas at El Paso
• University of Utah
• Virginia Tech
• UNAVCO, Inc.
• Digital Library for Earth System Education (DLESE)

Partners
• California Institute for Telecommunications and Information Technology (Cal-IT2)
• Chronos
• CUAHSI
• ESRI
• Geological Survey of Canada
• Georeference Online
• IBM
• Kansas Geological Survey
• Lawrence Livermore National Laboratory
• U.S. Geological Survey (USGS)

Other Affiliates
• Southern California Earthquake Center (SCEC), EarthScope, IRIS, NASA

GEOSCIENCE CHALLENGES
• Exponential Increase in Data Volume
  – How to manage vast amounts of data so that all scientists can use them in an easy-to-use environment
• Data Storage, Access and Preservation
  – How to build a framework to exchange data and help preserve collected data sets
• Data Integration (semantic and syntactic)
  – How to merge multiple geology maps into a seamless (“integrated”) map
• Computational Challenges
  – How to build a system that lets scientists run advanced software without having access to significant resources (computers and technical staff), so they can focus on the science problem
• Advanced Visualization (3D/4D)
  – How to build a visualization system that helps scientists analyze large and complex data sets dynamically
• Archiving and publication of results with reusable components (reusability)
  – How to preserve scientific results and help others repeat the analysis as efficiently as possible

GEON Cyberinfrastructure (CI) Principles
• CI: Support the “day to day” conduct of science (e-science), in addition to “hero” computations
• An equal partnership
  – IT works in close conjunction with science
• Create shared “science infrastructure”
  – Integrated online databases, with advanced search and query engines
  – Online models, robust tools and applications
• Leverage other intersecting projects
  – Much commonality in the technologies, regardless of science discipline, e.g. BIRN, SEEK, and many others

Main e-Research facilities I
• A Resource Registration System for Data Providers
  – Register ontologies (domain knowledge) and ontology articulations
  – Register datasets with metadata, including data access information
  – Optionally register datasets to ontologies (crucial for data integration and smart search): ontology-enabled semantic integration
  – Resource types include Shapefile, ASCII, Excel, GMT Raster, GeoTIFF, Relational Database, PDF, tool, WMS service, Web service, etc.
• A Search Engine for Data Users
  – Metadata-based search
  – Spatial-coverage-based search
  – Temporal-coverage-based search
  – Concept-based search
  – Ontology-based data discovery

Main e-Research facilities II
• The user workspace, called the myGEON area.
  – Users can search for and collect data sets from the GEON search engine and integrate them.
  – For example, users can review and analyze the “SYNSEIS” outputs generated by their job runs.
• Computational HPC
  – SYNSEIS (Synthetic Seismogram toolkit)
• Workflow
  – LiDAR: an end-to-end solution for the distribution, interpolation and analysis of LiDAR / ALSM point data.
  – A-type workflow: generates a map of all plutonic bodies in Virginia from the VA Igneous rocks database, based on certain inputs.

Constraints for main e-Research facilities
• Dynamic workflow issues due to GEON's web-based system
• Large computational clusters are needed for simulating GEON applications
  – GEON has three small cluster nodes on partner sites

GEON Portal Usability
• Ease of use
  – GEON Search, SYNSEIS, and many others
• Make complex tasks easy to specify
  – LiDAR
• Highly interactive
  – SYNSEIS
• Integrated access to tools and resources
  – myGEON, Mapping Integration

Computational HPC for SYNSEIS

Lessons Learnt
• Its main strengths
  – Standards-compliant approaches
  – Use of open source libraries and tools for most of the implementation
• Its main weaknesses
  – Difficulty building highly interactive, user-friendly interfaces within the portlet framework
• Would you consider alternatives to a portal solution?
  – Currently, no

Future Plan
• Will add and develop new functionality based on requests from GEON PIs and the geoscience community.
• Will keep improving the portal usability.
  – For example, in the case of SYNSEIS, add more user capabilities to the user interface for complex earthquake simulations.
• Will expand its use within the geoscience community internationally
  – Focus on GEON PIs first

GEON: The Developer Perspective

Choonhan Youn
Dogan Seber, Chaitan Baru, Ashraf Memon
San Diego Supercomputer Center, University of California at San Diego

Methods of GEON’s Design
• Several workshops were held with scientists from different disciplines, such as geochemistry and geophysics.
• Principal Investigators (PIs) also visit SDSC for focused discussions of their requirements.
• Prototypes are built from the gathered requirements, and the spiral model of software development is then followed to enhance the prototypes.

Service-Oriented Approach

Priority of Functional and non-functional requirements
• Start with functional requirements from the principal investigators or a local geoscience PI
• Prototypes are built and functional requirements are tested
• Then focus on non-functional requirements such as usability

Technical Strategy
• The “two-tier” approach
  – Use best practices, including use of commercial tools and open standards, where applicable…
    • start with development using the technology available now
  – …while developing advanced technology, and doing CS research
    • push for open source and best practices as much as possible

GEONSearch, Registration, myGEON Portlet
[Architecture diagram. Labels: Client Access (via web services); User Access (via Portal); ResourceRegistration (myOntology.owl, metadata; myDataset.foo, metadata); GEON Catalog; GEONsearch (search conditions: spatial, temporal, concept); myGEON (user actions: add, delete, manipulate); GEON Workspace (user); SRB; Log; GEONmiddleware; other distributed apps (Kepler, DLESE, …); external services (Gazetteer, DLESE, …; Geologic Age, Chronos, …).]

SYNSEIS toolkit
[Architecture diagram. Labels: User Access (via Web Browser); GEONGrid Portal; HTTP; SYNSEIS Portlet; myGEON Portlet; Web Services; Flash Application; SOAP; SAC Service; Data Model Service; Job Submission/Monitoring and File Service; Grid Services; GridFTP; Data Repository; JDBC; HPC Resources (TeraGrid clusters); Job Database; Cornell Map Server; Data Archives Service; CORBA (IIOP); IRIS DMC.]

Development Issues
• Constraints
  – Interoperability issues due to use of existing tools
    • Use of existing tools developed in Fortran, some machine-dependent algorithms and code, and GRASS-based GIS processing.
    • Incompatible implementations of the same standard (OGC’s WMS)
  – Usability requirements
    • Portlet UIs are designed by the software developers, so they are not very user-friendly
  – Part of the tension in the project
    • While this is an R&D project for the IT folks, the science folks want some of it to look like production software
  – Lack of user input in some cases
    • Some users are still getting up to speed with the IT concepts, so they haven’t really used the system.

Evaluation
• The success of GEON services is usually determined by user satisfaction.
• A usability workshop was held recently with domain scientists, and their feedback was collected.
  – Based on the workshop report, we are implementing the suggested changes
• Another workshop will be held after the suggested changes are implemented.

Lessons Learnt
• The most successful aspects
  – Integrating with other grids, such as TeraGrid
  – Data registration and search capabilities for the geoscience community
  – Community involvement
• The least successful aspects
  – The community is still evaluating the system.

Future Plans
• Will provide secure role-based authorization control (using SAML), fully integrated into the GEON portal.
• Will add a WSRP service.
• Conventions for managing state may be defined through standards such as WSRF, so that applications can discover, bind to, and communicate with stateful resources in standard and interoperable ways.

Mapping Integration Portlet
[Architecture diagram. Labels: Client; Portlet; GridSphere; GEON Search Portlet; GEON Resource Registration Portlet; User Workspace; Geon Dataset Ids; Map Integration Portlet (Mediator); 1. Knowledge Representation; 2. Dataset Ids to Ontology Ids; 3. Ontology Ids to Ontology Names; 4. Ontology Ids to Ontology Concepts; Redefine Query; Execute Query; Ontology Service; Ontology Engine; SRB; Download Datasets; Store Query Results; Query Tracking DB; Generate Map; Query Result Indexing; Mapping Services; Query Service; GET_EXTRACT; GET_MAP; ArcIMS; Webservices; Dataset Ids to Dataset Names Mapping; GEON Metadata Catalogue.]

DATA PROCESSING (LiDAR Portlet)
[Architecture diagram. Labels: Client; WWW; GEON Portal; GEON Search Portlet; LiDAR Process Portlet; Other Portlet; GEON Search Service; GEON Catalog; Spatial Query Service; IBM DB2 (x,y,z and attribute; DB2 Spatial Function); raw data; NFS Mounted Disk; maps/data; LiDAR Processing Service; Data Processing Algorithms; Software Tools (GRASS, ARCINFO, GMT); process output; Compute Cluster; TeraGrid DataStar.]
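
Illustrative sketch: GEON search conditions
The slides name three search conditions for GEONsearch (spatial, temporal, concept) applied to registered dataset metadata. The Python sketch below shows one way such conditions could be combined as filters. It is only an illustration under stated assumptions: the `DatasetRecord` fields, the `search` function, and the sample records (including their bounding boxes and years) are hypothetical and do not reflect GEON's actual schema or API, and a flat concept-tag match stands in for ontology-based discovery.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Hypothetical conventions for this sketch only:
BBox = Tuple[float, float, float, float]  # (west, south, east, north) in degrees
Span = Tuple[int, int]                    # (start_year, end_year), inclusive


@dataclass
class DatasetRecord:
    """Made-up metadata record for a registered dataset."""
    dataset_id: str
    bbox: BBox
    years: Span
    concepts: Set[str] = field(default_factory=set)


def boxes_intersect(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spans_overlap(a: Span, b: Span) -> bool:
    """True if two inclusive year ranges share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


def search(catalog: List[DatasetRecord],
           bbox: Optional[BBox] = None,
           years: Optional[Span] = None,
           concept: Optional[str] = None) -> List[str]:
    """Return ids of records matching every condition that was supplied."""
    hits = []
    for rec in catalog:
        if bbox is not None and not boxes_intersect(rec.bbox, bbox):
            continue
        if years is not None and not spans_overlap(rec.years, years):
            continue
        if concept is not None and concept.lower() not in {c.lower() for c in rec.concepts}:
            continue
        hits.append(rec.dataset_id)
    return hits


# Sample records with invented values, for demonstration only.
CATALOG = [
    DatasetRecord("va-igneous", (-83.7, 36.5, -75.2, 39.5), (1990, 2004),
                  {"plutonic rock", "geochemistry"}),
    DatasetRecord("lidar-demo", (-123.0, 38.0, -122.0, 39.0), (2003, 2005),
                  {"LiDAR", "topography"}),
]

if __name__ == "__main__":
    print(search(CATALOG, bbox=(-80.0, 37.0, -78.0, 38.0)))
    print(search(CATALOG, concept="lidar", years=(2004, 2004)))
```

Omitted conditions are simply skipped, so the same function covers spatial-only, temporal-only, concept-only, and combined queries. A real deployment would push the spatial test into the database rather than scan records in memory.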