Space Physics Interactive Data Resource – SPIDR 4

Mikhail ZHIZHIN (Geophysical Center Russian Acad. Sci.)
Eric KIHN (National Geophysical Data Center NOAA)
Dmitry MEDVEDEV (Geophysical Center Russian Acad. Sci.)
Rob REDMON (National Geophysical Data Center NOAA)
Dmitry MISHIN (Institute of Physics of the Earth Russian Acad. Sci.)
50 years ago – International Geophysical Year – IGY1957
[Diagram: World Data Centers A, B, and C, with discipline labels Sun and space, Solid Earth, Meteo, and Satellites; the exchange between centers is labelled Mail.]
Total data volume ~ 1 Gb
Exchange ~ 1 Mb/year
Yesterday – databases, Internet, web – Y2K
[Diagram: multiple Data Resource nodes.]
Total data volume ~ 1 Tb
Exchange ~ 1 Gb/year
Tomorrow – Electronic Geophysical Year – EGY2007
[Diagram: multiple Data Resource nodes around a central GRID label.]
Total data volume ~ 1 Pb
Exchange ~ 1 Tb/year
SPIDR mission
SPIDR is a de facto standard data source on solar-terrestrial physics, functioning within the framework of the ICSU World Data Centers.
It is a distributed database and application server network, built to select, visualize and model historical space weather data distributed across the Internet.
SPIDR can work as a fully functional web application (portal) or as a grid of web services, providing functions for other applications to access its data holdings.
SPIDR databases
Currently SPIDR archives include
• solar activity and solar wind data,
• geomagnetic variations and indices,
• ionospheric, cosmic rays, radio-telescope
ground observations,
• telemetry and images from NOAA, NASA, and
DMSP satellites.
SPIDR database clusters and portals are installed
in the USA, Russia, China, Japan, Australia,
South Africa, and India.
SPIDR components
[Diagram: the User sends queries to, and receives results from, the Web Portal (Workflow, Data Ingest, Mining, Visualization and Delivery). The portal links to the Virtual Community of Registered Users (Authenticate), the Virtual Observatory Metadata (Find event), and the Virtual Data Sources (Get data).]
The SPIDR portal combines the central XML metadata repository with a set of distributed data web services and data file collections. A user can search for data using the metadata inventory, use a persistent data basket to save the selection for the next session, and plot or download the selected data in parallel in different formats, including XML and NetCDF.
Metadata catalog of data services
Selections from different data services
plotted in parallel
Satellite orbits navigator
FTP data file repository viewer
Data service: common data model serialization + URL
[Diagram: on the local user workstation, SpidrClient sends a data request to the WS DataService on the remote SPIDR server. The server queries the databases (SQL), performs subsetting and formatting, and returns a datafile URL. The client downloads the datafile and saves a local copy to disk under a local filename.]
All grid data services in SPIDR share the same Common Data Model and
compatible metadata schema.
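The request path on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions, not SPIDR's actual API: the in-memory `DATABASE` and `FILE_STORE`, the `memory://` URL scheme, the CSV serialization, and the function names `data_service` and `spidr_client` are all hypothetical stand-ins for the remote service, its file store, and the client.

```python
import csv
import io
import tempfile
from pathlib import Path

# Hypothetical in-memory stand-in for one table of a remote SPIDR database.
# The values are placeholders, not real observations.
DATABASE = {
    "geomag_kp": [
        {"time": "2005-01-01T00:00", "value": 2.0},
        {"time": "2005-01-01T03:00", "value": 3.3},
        {"time": "2005-01-02T00:00", "value": 1.7},
    ]
}

# Hypothetical file store: maps a datafile URL to its serialized content.
FILE_STORE = {}


def data_service(table, start, end):
    """Server side: subset the table, format it as CSV, return a datafile URL."""
    # Subsetting: ISO 8601 timestamps compare correctly as plain strings.
    rows = [r for r in DATABASE[table] if start <= r["time"] < end]
    # Formatting: serialize the subset into one agreed layout.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["time", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    url = f"memory://datafiles/{table}_{start}_{end}.csv"
    FILE_STORE[url] = buffer.getvalue()
    return url


def spidr_client(table, start, end, local_filename):
    """Client side: request data, download the datafile, save it to disk."""
    url = data_service(table, start, end)   # data request, answered with a URL
    content = FILE_STORE[url]               # download the datafile
    Path(local_filename).write_text(content, encoding="utf-8")  # save to disk
    return url


def demo():
    """Fetch one day of data into a temporary file and return its lines."""
    with tempfile.TemporaryDirectory() as tmp:
        local_filename = Path(tmp) / "kp.csv"
        url = spidr_client("geomag_kp", "2005-01-01", "2005-01-02", local_filename)
        return url, local_filename.read_text(encoding="utf-8").splitlines()
```

Returning a URL rather than the data itself keeps the service response small. The client can then fetch the serialized file with any ordinary download mechanism.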
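Local and/or remote data service: output data stream
[Diagram, local database via JDBC: inside the SPIDR web application, the service container hosts a data service that reads Table 1 over JDBC and exposes it through the Common Data Model.]
[Diagram, remote database via Web Service: a SPIDR WS client communicates over SOAP with the data service in the service container of another SPIDR web application, which reads Table 2 over JDBC and exposes it through the same Common Data Model.]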
It is possible at the same time to use a local data source with JDBC
protocol and a remote data service with SOAP protocol. The type of
protocol is defined by the SPIDR configuration.
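A minimal Python sketch of configuration-driven source selection is shown here. It rests on several assumptions: `sqlite3` stands in for a JDBC connection, a plain callable stands in for a SOAP client, and the `Record` type, the class names, and the `protocol` configuration key are hypothetical. Only the idea of one common data model behind two transports comes from the slide.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass(frozen=True)
class Record:
    """Hypothetical common data model: one timestamped value of one parameter."""
    parameter: str
    time: str
    value: float


class DataSource(Protocol):
    def fetch(self, parameter: str) -> List[Record]: ...


class LocalDatabaseSource:
    """Reads a local SQL database directly (sqlite3 stands in for JDBC)."""

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection

    def fetch(self, parameter: str) -> List[Record]:
        rows = self.connection.execute(
            "SELECT parameter, time, value FROM observations "
            "WHERE parameter = ? ORDER BY time",
            (parameter,),
        )
        return [Record(*row) for row in rows]


class RemoteServiceSource:
    """Delegates to a remote service (a plain callable stands in for SOAP)."""

    def __init__(self, call_service: Callable[[str], List[dict]]):
        self.call_service = call_service

    def fetch(self, parameter: str) -> List[Record]:
        return [Record(**item) for item in self.call_service(parameter)]


def make_source(config: dict) -> DataSource:
    """Pick the transport from configuration; callers only ever see Record."""
    if config["protocol"] == "jdbc":
        return LocalDatabaseSource(config["connection"])
    if config["protocol"] == "soap":
        return RemoteServiceSource(config["call_service"])
    raise ValueError(f"unknown protocol: {config['protocol']}")


def demo():
    """Serve the same placeholder record through both transports."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE observations (parameter TEXT, time TEXT, value REAL)"
    )
    connection.execute(
        "INSERT INTO observations VALUES ('Kp', '2005-01-01T00:00', 2.0)"
    )

    def fake_remote(parameter):
        # Placeholder response; a real client would make a network call here.
        return [{"parameter": parameter, "time": "2005-01-01T00:00", "value": 2.0}]

    local = make_source({"protocol": "jdbc", "connection": connection})
    remote = make_source({"protocol": "soap", "call_service": fake_remote})
    return local.fetch("Kp"), remote.fetch("Kp")
```

Because both sources return the same record type, plotting or export code does not need to know which transport served the data.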
Data upload and synchronization: input data stream
[Diagram: on the local user workstation, FileClient uploads a local datafile, together with loader options, to the WS FileService on the remote SPIDR server. The Loader and Parser ingest the server's copy of the datafile into the databases and return a loading log. The datafile is then synchronized to a mirror SPIDR server through the web service.]
A database administrator can upload new files into the SPIDR databases
using the web services directly or through the web portal. SPIDR databases
are self-synchronizing via the web services.
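The upload and mirror flow can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `SpidrServer` class, the two-column `time,value` file layout, and the log messages are hypothetical, and a direct method call replaces the web service.

```python
class SpidrServer:
    """Hypothetical server: stores uploaded files, parses them, mirrors them."""

    def __init__(self, name):
        self.name = name
        self.files = {}      # filename -> raw text of the uploaded datafile
        self.database = []   # parsed (time, value) rows
        self.mirrors = []    # other SpidrServer instances to keep in sync

    def upload(self, filename, text, sync=True):
        """Store and parse a datafile, then forward it to mirrors; return a log."""
        log = []
        loaded = 0
        self.files[filename] = text
        for line_number, line in enumerate(text.splitlines(), start=1):
            try:
                time, value = line.split(",")
                self.database.append((time.strip(), float(value)))
                loaded += 1
            except ValueError:
                # Wrong number of fields or a non-numeric value.
                log.append(f"{filename}:{line_number}: skipped malformed line")
        log.append(f"{self.name}: loaded {loaded} rows from {filename}")
        if sync:
            # Mirrors receive the same file but do not forward it again,
            # which avoids loops between mutually mirrored servers.
            for mirror in self.mirrors:
                mirror.upload(filename, text, sync=False)
        return log


def demo():
    """Upload one file with a bad line and return both databases and the log."""
    primary = SpidrServer("primary")
    mirror = SpidrServer("mirror")
    primary.mirrors.append(mirror)
    log = primary.upload("kp.csv", "2005-01-01T00:00, 2.0\nbad line\n")
    return primary.database, mirror.database, log
```

Forwarding the original file, rather than the parsed rows, means each mirror runs its own parser and produces its own loading log.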
SPIDR metadata “compromise”
XML database (high-level, low-granularity metadata) = Virtual Observatory (VxO)
– Hierarchy of the data categories, keywords, textual descriptions
– Methods and credentials to access the data (web service, FTP directory)
– User Forum for data quality and usability support
SQL database (low-level, high-granularity metadata) = Data Inventory
– Parameters (name, physical meaning, units of measurement, virtual formula) or database schema
– Availability and accreditation of the data (inventory)
– Visualization details (type of the plot and coordinate system, scales, labels)
– Input-output formats
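The two metadata levels can be sketched in Python with the standard library: an XML catalog for the high-level search and a SQL table for the low-level inventory. The element names, table columns, dataset identifiers, and date ranges are hypothetical placeholders, not the actual SPIDR schema or holdings.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical high-level catalog: categories, keywords, access methods.
CATALOG_XML = """
<catalog>
  <dataset id="geomag_indices">
    <category>Geomagnetic variations and indices</category>
    <keyword>Kp</keyword>
    <keyword>Dst</keyword>
    <access method="web-service"/>
  </dataset>
  <dataset id="ionosphere">
    <category>Ionospheric observations</category>
    <keyword>foF2</keyword>
    <access method="ftp-directory"/>
  </dataset>
</catalog>
"""


def find_datasets(keyword):
    """High-level search: return ids of datasets tagged with the keyword."""
    root = ET.fromstring(CATALOG_XML)
    return [
        dataset.get("id")
        for dataset in root.findall("dataset")
        if any(k.text == keyword for k in dataset.findall("keyword"))
    ]


def build_inventory():
    """Low-level inventory: per-parameter units and availability."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE inventory "
        "(dataset TEXT, parameter TEXT, units TEXT, first_day TEXT, last_day TEXT)"
    )
    connection.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?, ?, ?)",
        [
            # Date ranges below are placeholders, not real coverage.
            ("geomag_indices", "Kp", "index", "2000-01-01", "2005-12-31"),
            ("geomag_indices", "Dst", "nT", "2000-01-01", "2005-12-31"),
            ("ionosphere", "foF2", "MHz", "2001-01-01", "2004-12-31"),
        ],
    )
    return connection


def parameter_inventory(connection, dataset):
    """Return (parameter, units, first_day, last_day) rows for one dataset."""
    return connection.execute(
        "SELECT parameter, units, first_day, last_day FROM inventory "
        "WHERE dataset = ? ORDER BY parameter",
        (dataset,),
    ).fetchall()
```

A typical lookup would first call `find_datasets("Kp")` to locate the dataset, then `parameter_inventory(build_inventory(), "geomag_indices")` to see which parameters it holds and for which dates.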
High-level metadata search
Low-level database inventory
Different workflows and interfaces for different user groups
Simple for novice users, more advanced when driven by a guru
SPIDR usage tutorial
SPIDR homepage
http://spidr.ngdc.noaa.gov
System administrator
interface
Advanced user interface
Data description
and help
Real-time usage statistics for a given time interval
User sessions per day
Total ~20 000 registered users
Per-database requests for plot (red) and export (blue)
Numerical modeling on the Grid: Space Weather Reanalysis - SWR
Input: ground and satellite data from SPIDR data services
Output: high-resolution rendering of the near-Earth space
[Diagram: space weather numerical models TIEGCM, AMIE, and MSM around SWR DATA. Labels: Init Conditions; IMF; Kp; Dst; 10.7 cm Flux; HPI; Magnetometer; GOES; Magnetic, Electric Potential, etc.; High Lat Elec; Geostationary Magnetic Field, Kp; TEC, FoF2, Neutral Winds; Particle Data.]
SWR Computer Resources
JET Supercomputer, FSL/NOAA, Boulder
• 768 Intel Pentium 4 Xeon Nodes (Dual 2.2 GHz Processors)
• Myricom Myrinet CLOS64 (2.4 Gbs)
• ADIC Fileserve MSS (100 Tbytes)
• NGDC was the #2 JET user for 2004-2005
• The SWR consumed 400,000+ CPU Hours
• The SWR has produced over 2.5 Tb of data, which exceeds all of NGDC’s non-satellite holdings!
The SWR requires a tremendous array of computer support in order to meet its goals. Challenges include sufficient CPU power, integrating distributed model runs, and storage space for input and output data sets. The SWR project makes use of shared time on FSL’s JET supercomputer as well as RAID and Tivoli-based storage systems at NGDC NOAA.
SPIDR integration with VxO and Grid infrastructure
[Diagram: layered architecture. Web Middleware: Tomcat hosts the VxO Application Layer. Grid Middleware: OGSA-DAI provides Metadata Services, DataSource Services, and ModelAnalysis Services. These sit on an XML DB ConnectionManager (native XML DB: eXist), a SPIDR ConnectionManager (SQL DB cluster: MySQL), and an AMIE Model ConnectionManager (Parallel-AMIE on computer cluster).]
Two reasons to move to the Grid middleware:
1. The digital certificates for security and authentication simplify inter-site communication
2. Processing large environmental archives requires an asynchronous web-services call mechanism
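The second reason, asynchronous calls for long-running archive processing, can be illustrated with a submit-and-poll pattern. This Python sketch uses a thread pool in place of grid middleware; the `AsyncJobService` class, its status strings, and the `slow_archive_mean` task are hypothetical and do not reflect the OGSA-DAI interface.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncJobService:
    """Hypothetical service: submit returns a job id at once, results come later."""

    def __init__(self, max_workers=2):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def submit(self, func, *args):
        """Start a job in the background and return its identifier immediately."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._executor.submit(func, *args)
        return job_id

    def status(self, job_id):
        """Report 'running' (queued or executing), 'done', or 'failed'."""
        future = self._jobs[job_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() else "done"

    def result(self, job_id):
        """Return the job result, blocking until the job has finished."""
        return self._jobs[job_id].result()

    def shutdown(self):
        self._executor.shutdown(wait=True)


def slow_archive_mean(values, delay=0.05):
    """Stand-in for a long-running archive computation."""
    time.sleep(delay)
    return sum(values) / len(values)
```

A client would call `submit`, keep the returned job id, and poll `status` at its own pace. No connection has to stay open while the archive is being processed.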
Some conclusions
• Grid (web) data services accessible from SPIDR portal
and a number of clients in Java, C#, Matlab, MS Excel
• Near-real time IMF, ionosphere and geomagnetic data
input streams
• Data accreditation, FTP file repository synchronized with the database
• Metadata service with high-level data description and
low-level data inventory
• Virtual Observatory and User Community functionality:
forum, bookmarks, i-mail, external metadata services
• Integration with Web Map Services
• “Fork” of the SPIDR-based data resource on solid Earth
• “Proprietary” SPIDR common data model becomes limiting; a generic model such as NetCDF is needed
• SPIDR as a resource on the Space Physics Grid