Space Physics Interactive Data Resource – SPIDR

Mikhail ZHIZHIN (Geophysical Center, Russian Acad. Sci.)
Eric KIHN (National Geophysical Data Center, NOAA)
Dmitry MEDVEDEV (Geophysical Center, Russian Acad. Sci.)
Rob REDMON (National Geophysical Data Center, NOAA)
Dmitry MISHIN (Institute of Physics of the Earth, Russian Acad. Sci.)

50 years ago – International Geophysical Year – IGY1957
[Diagram: World Data Centers A, B and C, holding Sun and space, Solid Earth and Meteo data, exchanging data by mail.]
Total data volume ~1 Gb; exchange ~1 Mb/year.

Yesterday – databases, Internet, web – Y2K
[Diagram: satellite, Solid Earth and Meteo data spread across many separate data resources.]
Total data volume ~1 Tb; exchange ~1 Gb/year.

Tomorrow – Electronic Geophysical Year – EGY2007
[Diagram: data resources interconnected through the GRID.]
Total data volume ~1 Pb; exchange ~1 Tb/year.

SPIDR mission
SPIDR is a de facto standard data source on solar-terrestrial physics, functioning within the framework of the ICSU World Data Centers. It is a distributed network of database and application servers, built to select, visualize and model historical space weather data distributed across the Internet. SPIDR can work as a fully functional web application (portal) or as a grid of web services through which other applications access its data holdings.

SPIDR databases
Currently SPIDR archives include
• solar activity and solar wind data,
• geomagnetic variations and indices,
• ionospheric, cosmic ray and radio-telescope ground observations,
• telemetry and images from NOAA, NASA and DMSP satellites.
SPIDR database clusters and portals are installed in the USA, Russia, China, Japan, Australia, South Africa and India.
SPIDR components
Web Portal: Workflow, Data Ingest, Mining, Visualization and Delivery
[Diagram: a virtual community of registered users authenticates to the portal, queries the Virtual Observatory metadata to find events, receives results, and gets data from virtual data sources.]
The SPIDR portal combines a central XML metadata repository with a set of distributed data web services and data file collections. A user can search for data in the metadata inventory, save the selection in a persistent data basket for the next session, and plot or download the selected data in parallel in different formats, including XML and NetCDF.
[Screenshots: metadata catalog of data services; selections from different data services plotted in parallel; satellite orbits navigator; FTP data file repository viewer.]

Data service: common data model serialization + URL
[Diagram: on the local user workstation, the SpidrClient sends a data request to the WS DataService on a remote SPIDR server; the server queries its databases via SQL, subsets and formats the result, and returns a datafile URL; the client downloads the datafile and saves a local copy to disk under a local filename.]
All grid data services in SPIDR share the same Common Data Model and compatible metadata schema.

Local and/or remote data service: output data stream
[Diagram: a SPIDR web application whose service container reads Table 1 from a local database via JDBC and, through the SPIDR WS client over SOAP, Table 2 from a remote SPIDR web application backed by its own database; both data services use the Common Data Model.]
SPIDR can use a local data source over JDBC and a remote data service over SOAP at the same time. The protocol for each source is defined in the SPIDR configuration.

Data upload and synchronization: input data stream
[Diagram: on the local user workstation, the FileClient uploads a datafile with loader options to the WS FileService on a remote SPIDR server; the Loader and Parser load it into the databases and write a loading log; the datafile is then synced to a mirror SPIDR server via web service.]
A database administrator can upload new files into the SPIDR databases either directly through the web services or through the web portal.
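The upload path ends with the remote SPIDR server syncing new datafiles to a mirror server. The slides do not say how a mirror decides which files to fetch; one common approach is to compare per-file checksums. The sketch below assumes that approach. `Datafile`, `files_to_sync` and `sync` are hypothetical names, and the dictionary assignment stands in for the "Sync Datafile" web-service call.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Datafile:
    """Hypothetical datafile record: a file name and its raw bytes."""
    name: str
    content: bytes

    @property
    def checksum(self) -> str:
        # SHA-256 of the content identifies one specific version of the file.
        return hashlib.sha256(self.content).hexdigest()


def files_to_sync(primary: Dict[str, Datafile], mirror_sums: Dict[str, str]) -> List[str]:
    """Names of files the mirror is missing or holds in a different version."""
    return sorted(
        name for name, f in primary.items() if mirror_sums.get(name) != f.checksum
    )


def sync(primary: Dict[str, Datafile], mirror: Dict[str, Datafile]) -> List[str]:
    """Copy missing or changed files from primary to mirror; return the names pushed."""
    mirror_sums = {name: f.checksum for name, f in mirror.items()}
    pushed = files_to_sync(primary, mirror_sums)
    for name in pushed:
        # Assumption: in SPIDR this step would be a web-service call to the mirror.
        mirror[name] = primary[name]
    return pushed
```

Running `sync` a second time pushes nothing, because the checksums then agree. A real mirror would also have to handle deleted files, which this sketch ignores.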
SPIDR databases are self-synchronizing via the web services.

SPIDR metadata: a "compromise"
• XML database (high-level, low-granularity metadata) = Virtual Observatory (VxO)
– hierarchy of data categories, keywords, textual descriptions
– methods and credentials to access the data (web service, FTP directory)
– user forum for data quality and usability support
• SQL database (low-level, high-granularity metadata) = Data Inventory
– parameters (name, physical meaning, units of measurement, virtual formula) or database schema
– availability and accreditation of the data (inventory)
– visualization details (plot type and coordinate system, scales, labels)
– input/output formats
[Screenshots: high-level metadata search; low-level database inventory.]

Different workflows and interfaces for different user groups
Simplified workflow for novice users, driven by Guru.
[Screenshots: SPIDR usage tutorial; SPIDR homepage, http://spidr.ngdc.noaa.gov; system administrator interface; advanced user interface; data description and help; real-time usage statistics for a given time interval.]
Usage charts: user sessions per day; per-database requests for plots (red) and exports (blue). In total, there are about 20 000 registered users.

Numerical modeling on the Grid: Space Weather Reanalysis (SWR)
Input: ground and satellite data from SPIDR data services
Output: high-resolution rendering of the near-Earth space

Space weather numerical models
[Diagram: data flow between the TIEGCM, AMIE and MSM models and the SWR data. Quantities shown: initial conditions, IMF, Kp, Dst, 10.7 cm flux, HPI, magnetometer and GOES data, magnetic and electric potentials, high-latitude electric fields, geostationary magnetic field, TEC, foF2, neutral winds, particle data.]

SWR computer resources
JET supercomputer, FSL/NOAA, Boulder
• 768 Intel Pentium 4 Xeon nodes (dual 2.2 GHz processors)
• Myricom Myrinet CLOS64 interconnect (2.4 Gb/s)
• ADIC Fileserve MSS (100 Tbytes)
• NGDC was the #2 JET user in 2004-2005.
• The SWR consumed over 400,000 CPU hours.
• The SWR has produced over 2.5 Tb of data, more than all of NGDC's non-satellite holdings!
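The two-level metadata design described earlier (XML for the high-level VxO catalog, SQL for the fine-grained data inventory) implies a two-step lookup: find candidate datasets by keyword, then ask the inventory what is actually available for a time interval. The sketch below illustrates that split with Python's standard `xml.etree.ElementTree` and an in-memory `sqlite3` database. The XML element names, the `inventory` table schema, the dataset ids and the coverage dates are all hypothetical, illustrative values, not SPIDR's actual schema or holdings.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical high-level catalog (VxO layer): one entry per dataset with
# category, access method and keywords. Element names are illustrative only.
CATALOG_XML = """
<catalog>
  <dataset id="geom.kp" category="Geomagnetic indices" access="webservice">
    <keywords>Kp geomagnetic index</keywords>
  </dataset>
  <dataset id="iono.fof2" category="Ionosphere" access="ftp">
    <keywords>foF2 ionosonde</keywords>
  </dataset>
</catalog>
"""


def search_catalog(xml_text: str, keyword: str) -> List[str]:
    """High-level search: ids of datasets whose keyword list contains `keyword`."""
    root = ET.fromstring(xml_text.strip())
    kw = keyword.lower()
    return [
        ds.get("id")
        for ds in root.iter("dataset")
        if kw in (ds.findtext("keywords") or "").lower().split()
    ]


def build_inventory() -> sqlite3.Connection:
    """Low-level inventory (hypothetical schema, illustrative coverage dates)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE inventory (dataset TEXT, param TEXT, units TEXT,"
        " start_day TEXT, end_day TEXT)"
    )
    con.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?, ?, ?)",
        [
            ("geom.kp", "Kp", "dimensionless", "1932-01-01", "2007-06-30"),
            ("iono.fof2", "foF2", "MHz", "1957-07-01", "2007-06-30"),
        ],
    )
    return con


def available(con, dataset_ids, start, end) -> List[Tuple[str, str, str]]:
    """Inventory rows whose coverage overlaps [start, end]; ISO dates compare as text."""
    if not dataset_ids:
        return []
    marks = ",".join("?" * len(dataset_ids))
    query = (
        f"SELECT dataset, param, units FROM inventory WHERE dataset IN ({marks})"
        " AND start_day <= ? AND end_day >= ? ORDER BY dataset"
    )
    return con.execute(query, (*dataset_ids, end, start)).fetchall()
```

Keeping the coarse catalog in XML makes it easy to exchange with external Virtual Observatory services, while the interval-overlap query stays in SQL, where it can use indexes on the high-granularity inventory.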
The SWR requires substantial computing support to meet its goals. Challenges include sufficient CPU power, integration of distributed model runs, and storage for the input and output data sets. The SWR project uses shared time on FSL's JET supercomputer as well as RAID- and Tivoli-based storage systems at NGDC/NOAA.

SPIDR integration with VxO and Grid infrastructure
[Diagram: layered architecture. Web middleware: Tomcat, hosting the VxO application layer. Grid middleware: OGSA-DAI, providing Metadata Services, DataSource Services and ModelAnalysis Services through an XML DB ConnectionManager (native XML DB: eXist), a SPIDR ConnectionManager (SQL DB cluster: MySQL) and an AMIE Model ConnectionManager (Parallel-AMIE on a computer cluster).]
Two reasons to move to Grid middleware:
1. Digital certificates for security and authentication simplify inter-site communication.
2. Processing large environmental archives requires an asynchronous web-service call mechanism.

Some conclusions
• Grid (web) data services are accessible from the SPIDR portal and from a number of clients in Java, C#, Matlab and MS Excel
• Near-real-time IMF, ionosphere and geomagnetic input data streams
• Data accreditation; FTP file repository synchronized with the database
• Metadata service with high-level data description and low-level data inventory
• Virtual Observatory and user community functionality: forum, bookmarks, i-mail, external metadata services
• Integration with Web Map Services
• A "fork" of the SPIDR-based data resource for solid Earth data
• The "proprietary" SPIDR common data model is becoming limiting; a generic model such as NetCDF is needed
• SPIDR as a resource on the Space Physics Grid
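The second reason for moving to Grid middleware, an asynchronous web-service call mechanism for processing large archives, usually means that the client submits a request, receives an identifier immediately, and polls until the result is ready, instead of holding one long synchronous call open. The sketch below shows that submit-and-poll pattern in plain Python, with a thread pool standing in for the server side. It is not the OGSA-DAI API; `AsyncRequestService`, `poll_until_done` and the `daily_means` job are hypothetical names chosen for illustration.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class AsyncRequestService:
    """Hypothetical asynchronous service: submit() returns a request id at once,
    the work runs in a background thread, and the client polls for the result."""

    def __init__(self, workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, func: Callable, *args) -> str:
        """Queue a long-running request and return its id without waiting."""
        request_id = str(uuid.uuid4())
        self._jobs[request_id] = self._pool.submit(func, *args)
        return request_id

    def status(self, request_id: str) -> str:
        # Queued and executing jobs are both reported as "RUNNING" here.
        future = self._jobs[request_id]
        if not future.done():
            return "RUNNING"
        return "FAILED" if future.exception() is not None else "COMPLETED"

    def result(self, request_id: str):
        """Hand over a finished result (re-raises the job's exception if it failed)."""
        return self._jobs.pop(request_id).result()

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def poll_until_done(service, request_id, interval=0.01, timeout=5.0):
    """Client side: check the status at a fixed interval, then fetch the result."""
    deadline = time.monotonic() + timeout
    while service.status(request_id) == "RUNNING":
        if time.monotonic() > deadline:
            raise TimeoutError(request_id)
        time.sleep(interval)
    return service.result(request_id)


def daily_means(samples: List[float], per_day: int) -> List[float]:
    """Stand-in for an archive-processing job: mean of each block of per_day samples."""
    return [
        sum(samples[i:i + per_day]) / len(samples[i:i + per_day])
        for i in range(0, len(samples), per_day)
    ]
```

For example, submitting `daily_means` over 48 hourly samples with `per_day=24` and polling with `poll_until_done` returns two daily means. Each individual call stays short, which matters when one request spans decades of data; a production service would also need to expire results that are never collected.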