NERIES Data Portal for Seismology: Brainstorm Meeting
Edinburgh, UK, November 6-7, 2008

EarthScope Portal and IRIS Web Service Development
Robert Casey
IRIS Data Management Center, Seattle, WA

Objectives
• Provide a central search and data access capability for distributed EarthScope data and data products
  – IRIS (USArray): seismic, strainmeter, MT
  – UNAVCO (PBO): GPS, tiltmeter
  – ICDP (SAFOD): drilling logs, core data

Objectives
• Distributed implementation
  – A central Web service invokes the Web services at each location, which search their local catalogs
  – SOAP-based interaction
  – Common Query Interface schema, covering:
    o Station Location, Identification, and Product Association
    o Product Listing Returns
    o Product Packaging and Delivery

Objectives
• Distributed development
  – Software design and development by a distributed team of developers at SDSC, UNAVCO, IRIS, and ICDP
  – Leverage portal work already done for GEON (GridSphere)
  – Independent code developed against a jointly agreed-upon schema
  – Regular teleconferences, occasional travel for technical discussions
  – Project planning with regular milestones, coordinated at SDSC

Development Timeframe
• Development period – 15 months
• First demo available at AGU – December 2007
• Alpha testing – April 2008
• Beta testing – July 2008
• Release candidate – September 2008
• Final deployment – October 2008

Deployment Sites
• Portal work in progress hosted at the San Diego Supercomputer Center (SDSC)
• Alpha and beta releases to a select group of testers
• Feedback and issue tracking in JIRA
• Final release candidate code ported to its permanent site at UNAVCO in Boulder, CO
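The "Distributed implementation" objective describes a central Web service that sends one query to each site's service and merges what comes back. A minimal Python sketch of that fan-out, assuming plain function calls in place of the SOAP requests; `ProductListing`, `make_site_service`, and `federated_search` are hypothetical names, not part of the portal's code, and the station codes are toy values:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductListing:
    site: str          # data center holding the product
    station: str       # station code
    product_type: str  # e.g. "seismic", "gps"


def make_site_service(site, catalog):
    """Hypothetical stand-in for one site's Web service.

    `catalog` is a list of (station, product_type) pairs. In the portal this
    search would be a SOAP request against the site's local catalog.
    """
    def search(stations, product_types):
        return [ProductListing(site, sta, ptype)
                for sta, ptype in catalog
                if sta in stations and ptype in product_types]
    return search


def federated_search(services, stations, product_types):
    """Send one query to every site in parallel and merge the listings.

    A site whose service raises an exception is reported in `failures`
    instead of aborting the whole search.
    """
    listings, failures = [], {}
    with ThreadPoolExecutor(max_workers=max(1, len(services))) as pool:
        futures = {name: pool.submit(search, stations, product_types)
                   for name, search in services.items()}
        for name, future in futures.items():
            try:
                listings.extend(future.result())
            except Exception as exc:  # e.g. a connection error from one site
                failures[name] = repr(exc)
    listings.sort(key=lambda p: (p.station, p.site, p.product_type))
    return listings, failures


if __name__ == "__main__":
    def unreachable(stations, product_types):
        raise ConnectionError("site unreachable")

    services = {
        "SITE_A": make_site_service("SITE_A", [("STA1", "seismic"), ("STA2", "seismic")]),
        "SITE_B": make_site_service("SITE_B", [("STA1", "gps")]),
        "SITE_C": unreachable,
    }
    listings, failures = federated_search(services, {"STA1"}, {"seismic", "gps"})
    for p in listings:
        print(p.site, p.station, p.product_type)
    print(failures)
```

The design choice the sketch makes explicit is that one unreachable site degrades the result (it shows up in `failures`) rather than failing the whole search.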
[Screenshot slides: The EarthScope Portal – opening page; Architecture (diagram with three backends); Zoom and Bounding Box Station Selection; Cluster Stations; Cluster Selection; Cluster Zoom; Search Toolbar; Station Selection on Map; Selected Stations List; Find Data; Search Results; Select Desired Products; Fill the Data Cart; Package Cart; Data Package is Ready; Package Details; Download Package; Query History]

Current Issues
• Large number of stations: rendering them all when the page opens is slow.
  – Use WMS layers?
  – Another fast-rendering strategy?
  – Level-of-detail variations: show individual, selectable stations only at higher zoom levels?

Current Issues
• Temporal constraints: how do we give the user broad discovery but narrow access to data?
  – The user cannot browse through years of data
  – The user has to guess when stations have data available

Current Issues
• Large return sets: how do we present results that span large geographic, time, sensor-type, and sample-rate dimensions?
  – Continuous sampling of data
  – Multiple channels per site
  – Different measurements at a site

Current Issues
• Query flexibility vs. complexity
  – The current interface is a standard time/location/product query, but…
    o Drilling data has a depth dependency for sensors
    o Some products cover an area, not a point source
    o Each product type can have multiple data types

Current Issues
• Query result browsing and filtering
  – Too many results: requires pagination and cutoff limits
  – Sorting by field is needed, especially when the server truncates results
  – Tree categorization does not allow breadth-wise filters across categories

Portlets
• Examined the use of portlet remoting (WSRP); the technology was considered immature at the time (2007)
• SDSC already had experience with GridSphere, which provided a portlet container API
• Having each component site develop custom portlets was deemed too risky in time and cost

Moving on… IRIS DMC Web Services Strategy

Current Plan
• Preserve current CORBA access technologies (DHI, the Data Handling Interface)
• Make use of existing tools and legacy software
• Create an underlying data layer for locating and fetching data and metadata, common to CORBA and Web services

Current Plan
• Create a Web services access layer in tandem with an improved DHI
• Both Web services and DHI will access the same data layer interfaces

[Diagram slide]

Web Service Layers
• Potentially three layers of service module composition:
  1. A high-level abstract workflow interface
  2. A more detailed programmatic SOAP interface
  3. A detailed protocol buffer interface (does the heavy lifting)

Abstract Workflow Layer
• Most abstract of the service layers
• Will need to interact with lower layers for ‘refinement’ of the workflow
• May go with a commercial vendor

SOAP Interface
• This layer will be suitable for programmatic access by SOAP clients
• Current technologies: Apache Axis2 (SOAP stack) and JiBX (data binding)
• Investigating SOAP header messaging for intermediary processing nodes

Protocol Buffer Layer
• Covers all service modules, many of them not publicly accessible
• Presents fine-grained, decoupled functions for data fetching and step-wise processing
• Protocol Buffers (Google's message serialization format) carry data and messages between functions

Stateful Tracking
• Investigating the use of a database for asynchronous, persistent awareness of workflow state
• Will track intermediate steps and products to allow provenance tracking and efficient repeat processing at the refinement stage

First Steps
• EarthScope Portal
• SPADE product catalog
• Phase Pick Query
• Ground Motion and Decimation services
• Waveform fetching and creation of plot images

[Screenshot slide: Sample work in progress]
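For the "Large number of stations" issue and the cluster slides, one common level-of-detail approach is grid clustering: bin stations into cells whose size shrinks as the user zooms in, and draw one marker per cell. The sketch below is an assumption about how this could work, not the portal's actual implementation; `cluster_stations` and the 360 / 2**zoom cell size are illustrative choices.

```python
import math
from collections import defaultdict


def cluster_stations(stations, zoom):
    """Grid-based clustering of (code, lat, lon) stations for one zoom level.

    Hypothetical level-of-detail scheme: the map is divided into square cells
    of 360 / 2**zoom degrees, and all stations in a cell become one marker at
    their mean position. Zooming in shrinks the cells until individual
    stations become selectable.
    """
    cell = 360.0 / (2 ** zoom)
    cells = defaultdict(list)
    for code, lat, lon in stations:
        cells[(math.floor(lat / cell), math.floor(lon / cell))].append((code, lat, lon))

    clusters = []
    for members in cells.values():
        clusters.append({
            "count": len(members),
            "lat": sum(m[1] for m in members) / len(members),
            "lon": sum(m[2] for m in members) / len(members),
            "codes": sorted(m[0] for m in members),
        })
    # Largest clusters first, then by station codes for a stable order.
    return sorted(clusters, key=lambda c: (-c["count"], c["codes"]))
```

Only the cluster markers need to be drawn when the page opens; the full station list is needed only once a cell is small enough to hold a handful of stations.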
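The "Query result browsing and filtering" issue hinges on ordering: if the server truncates first and the client sorts what is left, the sort covers only an arbitrary subset. A hedged Python sketch that applies a cross-category filter, sorts the full match set, and only then applies the cutoff and pagination; the function and parameter names are hypothetical:

```python
def browse_results(rows, filters=None, sort_field=None, descending=False,
                   page=1, page_size=25, cutoff=1000):
    """Filter, sort, truncate, then paginate a list of result dicts.

    `filters` maps a field name to a set of allowed values and is applied to
    every row at once (a breadth-wise filter), independent of any tree
    categorization. Sorting happens before the cutoff so that a truncated
    result set still holds the top rows of the full ordering.
    """
    filters = filters or {}
    matched = [r for r in rows
               if all(r.get(field) in allowed for field, allowed in filters.items())]
    if sort_field is not None:
        matched.sort(key=lambda r: r[sort_field], reverse=descending)
    total = len(matched)
    kept = matched[:cutoff]                 # server-side cutoff limit
    start = (page - 1) * page_size
    return {
        "rows": kept[start:start + page_size],
        "total_matches": total,
        "truncated": total > cutoff,        # tell the user results were cut
        "pages": -(-len(kept) // page_size),  # ceiling division
    }
```

Reporting `total_matches` alongside `truncated` lets the interface say how many results were dropped; applying the cutoff before the sort would instead return whichever rows happened to arrive first.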
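The Protocol Buffer Layer slide describes fine-grained, decoupled functions that pass a message from step to step. A dependency-free Python sketch of that pattern, assuming a frozen dataclass as a stand-in for a generated Protocol Buffers message and using decimation (one of the listed first-step services) as the example; the message fields and step functions are hypothetical:

```python
from dataclasses import dataclass, replace
from functools import partial


@dataclass(frozen=True)
class WaveformMessage:
    """Stand-in for a Protocol Buffers message passed between steps.

    With the real library this would be a class generated by protoc and
    exchanged via SerializeToString()/ParseFromString(); a frozen dataclass
    keeps the sketch dependency-free. All field names are hypothetical.
    """
    station: str
    channel: str
    sample_rate: float     # samples per second
    samples: tuple = ()
    history: tuple = ()    # names of the steps applied so far


def decimate(msg, factor):
    """Average non-overlapping blocks of `factor` samples, one output per block.

    Block averaging is only a crude low-pass filter; a production decimator
    would apply a proper anti-alias filter first. Trailing samples that do
    not fill a whole block are dropped.
    """
    n = len(msg.samples) // factor * factor
    out = tuple(sum(msg.samples[i:i + factor]) / factor for i in range(0, n, factor))
    return replace(msg, sample_rate=msg.sample_rate / factor, samples=out,
                   history=msg.history + (f"decimate:{factor}",))


def remove_mean(msg):
    """Subtract the mean so the trace is centered on zero."""
    if not msg.samples:
        return msg
    mean = sum(msg.samples) / len(msg.samples)
    return replace(msg, samples=tuple(s - mean for s in msg.samples),
                   history=msg.history + ("remove_mean",))


def run_pipeline(msg, steps):
    """Apply each step in order; every step takes and returns a message."""
    for step in steps:
        msg = step(msg)
    return msg


if __name__ == "__main__":
    raw = WaveformMessage("STA1", "BHZ", 40.0, samples=(1, 3, 5, 7, 9, 11, 13))
    result = run_pipeline(raw, [partial(decimate, factor=2), remove_mean])
    print(result.sample_rate, result.samples, result.history)
```

Because each step only reads and returns a message, steps can be recombined or moved to another process without changing their code, which is the decoupling the slide is after.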
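The Stateful Tracking slide proposes a database that records intermediate steps and products. A minimal sketch using Python's built-in `sqlite3`, assuming each step is keyed by a hash of its name, parameters, and input, so an identical step repeated during refinement is served from the database; the schema and the `WorkflowTracker` class are hypothetical:

```python
import hashlib
import json
import sqlite3


class WorkflowTracker:
    """Record each processing step in a database for provenance and reuse.

    Hypothetical schema: a step is keyed by a hash of its name, parameters,
    and input key, so re-running an identical step returns the stored
    product instead of recomputing it. Outputs must be JSON-serializable.
    """

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS steps ("
            " key TEXT PRIMARY KEY, name TEXT, params TEXT,"
            " input_key TEXT, output TEXT)")
        self.computed = 0  # number of steps actually executed

    def run(self, name, func, params, input_key=None, input_value=None):
        params_json = json.dumps(params, sort_keys=True)
        key = hashlib.sha256(
            f"{name}|{params_json}|{input_key}".encode()).hexdigest()[:16]
        row = self.db.execute(
            "SELECT output FROM steps WHERE key = ?", (key,)).fetchone()
        if row is not None:                 # repeat request: reuse stored product
            return key, json.loads(row[0])
        output = func(input_value, **params)
        self.computed += 1
        self.db.execute("INSERT INTO steps VALUES (?, ?, ?, ?, ?)",
                        (key, name, params_json, input_key, json.dumps(output)))
        self.db.commit()
        return key, output

    def provenance(self, key):
        """Return the chain of step names leading to `key`, oldest first."""
        chain = []
        while key is not None:
            name, key = self.db.execute(
                "SELECT name, input_key FROM steps WHERE key = ?", (key,)).fetchone()
            chain.append(name)
        return chain[::-1]
```

Storing the input key with each step is what makes the provenance walk possible; the same table doubles as the cache for repeat processing.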