Common Data Format (CDF) and Coordinated Data Analysis Web

Finding, browsing, and getting
data easily using SPDF web
services
Space Physics Data Facility
<http://spdf.gsfc.nasa.gov>
NASA Goddard Space Flight Center
Greenbelt MD 20771
CDAWeb data browser
• Handles vast heterogeneity of instruments and
parameters (>1000 datasets, 2M files, 10 TB, 8500
parameters)
• Yet simple browsing interface provides 80% of
researchers’ needs without many bells & whistles
• Based on
– standard file format (CDF)
– standardized metadata
– metadata-driven IDL software
– provider metadata over-ride by Master CDFs
CDAWeb usage statistics
CDAWeb process
• Top static web page calls Perl routines to generate selection
forms, with instrument-types, spacecraft, instruments,
datasets and available time ranges filled from text lists
(compiled each night from data CDFs, no formal database)
• Form POST calls a Perl routine to generate IDL calls to
determine data files to fill user’s request, followed by a call to
read_myCDF and then either plotmaster, list_mystruct or
write_mycdf, depending on the user’s request.
• CDAWlib IDL routines read data CDFs into IDL structures,
plot or list data, and return URLs to temporary output files
(ASCII listings, movie or static GIF files, or created sub- or
super-setted CDF files)
• CDAS SOAP and REST API calls can also query the text lists
and generate the IDL calls
Metadata (ISTP and Cluster)
• Global attributes provide mission, instrument type,
dataset info (project, source_name, data_type,
descriptor, etc.)
• Variable attributes (Catdesc, Fieldnam, var_notes)
• Depend_0 _1 _2, Delta_plus/minus_var point to
related variables
• Display_type, Var_type (data, support_data)
• Fillval, Validmin/max for ranges
• LablAxis/Labl_ptr_1, Units/Unit_ptr for plots
• Format for listing
• Virtual variables to create variables on-the-fly
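A minimal sketch of how a reader might apply the fill-value and valid-range attributes listed above before plotting or listing. It is written in Python rather than IDL; the function name and the sample values are illustrative assumptions, not CDAWlib code.

```python
import math

def mask_invalid(values, fill_val, valid_min, valid_max):
    """Replace fill values and out-of-range samples with NaN.

    Hypothetical helper: it mirrors the intent of the fill-value and
    valid-min/max variable attributes, not any actual CDAWlib routine.
    """
    cleaned = []
    for v in values:
        if v == fill_val or v < valid_min or v > valid_max:
            cleaned.append(math.nan)
        else:
            cleaned.append(float(v))
    return cleaned

# Illustrative samples: one fill value and one out-of-range outlier.
samples = [4.2, -1.0e31, 5.1, 70000.0]
print(mask_invalid(samples, fill_val=-1.0e31, valid_min=0.0, valid_max=65000.0))
```

Because the limits come from the file's own metadata, the same routine works for any variable without per-dataset special cases.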
CDAWlib IDL library
<http://spdf.gsfc.nasa.gov/CDAWlib.html>
• Read_MyCDF reads CDF variables into an IDL
structure, along with support variables and
metadata
• PlotMaster determines best display for the
variables and creates plots in GIF and PS/PDF
• Assorted plot routines: time_series, spectrogram,
ionogram, radar_vector, image, orbit, stack_plot,
map, plasmagram, movie, map_movie, time_text,
etc.
Other useful CDAWlib routines
• List_MyStruct creates text listings using all available/defined
global and variable level metadata, e.g. LablAxis, Labl_ptr_1,
Format, Depend_n, Units, etc.
• CDFx <http://cdaweb.gsfc.nasa.gov/cdaweb/cdfx/> IDL GUI
to read, list and plot data from CDFs located on the user’s
machine; allows some customization of plotting options, e.g.
color palettes, scale ranges, etc.
• Auroral_Image maps 2-D data using map projections and
various fill and expansion techniques (generally auroral
images onto the Earth)
• Spectrogram plots a color spectrogram of a 2-dim variable in
contiguous or non-contiguous blocks
Making CDFs
• Create skeleton text file (hopefully with standard
metadata) by hand or easier with SKTeditor
<http://SSCweb.gsfc.nasa.gov/skteditor/>
• Convert skeleton file to empty CDF with SkeletonCDF (or
automatically in SKTeditor)
• Add data with IDL CDAWlib routines or use the MakeCDF C-language tool to put ASCII and binary data into a CDF file
<http://spdf.gsfc.nasa.gov/makecdf.html>
• ISTP/IACG Guidelines for recommended naming of
datasets and filenames, global and variable attributes
<http://spdf.gsfc.nasa.gov/sp_use_of_cdf.html>
Making CDFs with CDAWlib
• Create empty CDF with SKTeditor or manually
• Read CDF structure into an IDL structure with
read_master_cdf.pro
• Fill IDL structure data fields with data
• Create new CDF file with the contents of the filled
structure with write_data_to_cdf.pro (both in
IDLmakecdf.pro)
CDAWeb-resident data direct
to IDL (Beta)
• <http://cdaweb.gsfc.nasa.gov/WebServices/REST/
CdasIdlLibrary.html>
• d = spdfgetdata('AC_K2_MFI', ['Magnitude', 'BGSEc'],
['2009-06-01T00:00:00.000Z', '2009-06-03T00:00:00.000Z'])
• IDL GUI interface to CDAWeb-held data:
spdfcdawebchooser
CDAWeb Lessons
• Standard data file format and metadata enable
powerful and extendible services
• Code is metadata-driven (from the data files
themselves), rather than coding lots of special
cases (with attendant high maintenance)
• Variable type and dimensions are often enough to
determine the desirable plot format, but can be overridden by metadata
• Metadata from Master (no data) CDFs over-rides
metadata (or lack of) in data CDFs
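The lessons above can be sketched in a few lines of Python. This is a hedged illustration, not CDAWlib logic: the function names, the shape-based default rules, and the sample attribute values are all assumptions. It shows Master CDF attributes taking precedence over data CDF attributes, and a plot type chosen from type and dimensions unless a Display_type attribute overrides it.

```python
def merge_attributes(data_attrs, master_attrs):
    """Return variable attributes with Master CDF values taking precedence."""
    merged = dict(data_attrs)
    for name, value in master_attrs.items():
        if value is not None:
            merged[name] = value
    return merged

def choose_display(var_type, record_shape, attrs):
    """Pick a plot type from metadata first, then from type and dimensions.

    record_shape is the shape of one record, e.g. () for a scalar,
    (3,) for a vector, (32,) for 32 energy channels. The default rules
    below are assumptions made for this sketch.
    """
    if var_type != "data":
        return None  # support variables are not plotted on their own
    override = attrs.get("Display_type")
    if override:
        return override
    if len(record_shape) == 0 or (len(record_shape) == 1 and record_shape[0] <= 3):
        return "time_series"
    if len(record_shape) == 1:
        return "spectrogram"
    return "image"

data_attrs = {"Units": "nT"}                    # from the data CDF
master_attrs = {"Display_type": "stack_plot"}   # from the Master CDF
print(choose_display("data", (3,), data_attrs))
print(choose_display("data", (3,), merge_attributes(data_attrs, master_attrs)))
```

The first call falls back on the record shape; the second picks up the override supplied by the Master CDF, so no dataset-specific branch is needed in the code.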
SPDF Web Services
• Satellite Situation Center (SSC)
– SOAP RPC-encoded since 2002
– SOAP document-literal since 2007
• Coordinated Data Analysis System (CDAS)
– SOAP RPC-encoded since 2003
– REST since 2010
Web Service Styles
• Remote Procedure Call (RPC)
– Tightly coupled
– Early SOAP, CORBA, DCOM, RMI
• Service-Oriented Architecture (SOA)
– Loosely coupled
– Later (message/document) SOAP
• Resource-Oriented Architecture (ROA)
– Loosely coupled
– Representational State Transfer (REST)
SOAP Web Services
• Popular before REST
• Supported by most major software vendors
• Criticized for being complex (but simple to
use with advanced tools/frameworks)
• Focuses on “message-oriented” services
• Supported in most popular programming
environments (Java, .NET, PHP, Python,
Perl, etc.)
REST Web Services
• Focus on interacting with stateless resources using well-known HTTP standard operations (GET, POST, PUT,
DELETE, etc.)
• Simpler protocol requires simpler libraries/frameworks
– Available in more programming environments
• Details less defined than SOAP (e.g., messages can be
plain text instead of complex XML conforming to SOAP
specifications)
– Different services can be very different
– “RESTful Web Services”, Leonard Richardson & Sam
Ruby, 2007 O'Reilly Media, Inc.
REST vs SOAP
• CDAWeb REST sits on top of CDAS SOAP
• Trade-offs: REST URLs are ugly but easy to understand
and to use in a browser
• SOAP easy to call from Java with SOAP
libraries
• OPeNDAP, FTP, HTTP, TSDS
More SOAP vs REST
• An IDL interface is much easier to provide with REST than by
convincing scientists to install the Java bridge and our
library along with the IDL code
• Finally and most important, we can create a system where
spacecraft, instruments, datasets, and time periods are
treated as objects and can be referenced in the event list
server and in papers/reports
• SOAP is opaque and more difficult to explain and to use
• Can describe a REST service in WSDL, particularly
version 2 and also in WADL
<http://research.sun.com/spotlight/2006/2006-04-24-TR153.html>,
although WSDL alone doesn’t always work for SOAP
either.
CDAWeb REST Complications
• Metadata is complicated, requiring REST
interface to return XML/JSON
• Results are complicated, requiring REST
interface to return XML/JSON
• Resources don’t conform to a simple
hierarchical structure, requiring REST to
support a POST (with XML request) for
some resources in addition to the simpler
GET method
• Calling CDAWeb SOAP from IDL
– Use Java Bridge (Java's SOAP) and CDAWeb-specific JAR
(automatically produced from WSDL)
• Newer versions of IDL include/configure Java during installation
– 1 JAR to add to classpath
– IDL call to CDAWeb:
• dataviews = cdas->getAllViewDescriptions()
• Calling CDAWeb REST from IDL
– Use IDLnetURL and IDLffXMLDOM
– Requires 1000s of lines* of hand-written, CDAWeb-specific
data-binding and serialization/deserialization code (that uses
IDLffXMLDOM for marshalling/unmarshalling)
– IDL call to CDAWeb:
• dataviews = cdas->getDataviews()
CDAS REST IDL Library: > 4518 LOC
(including comment and blank lines)

Component                          LOC
Total                              6502
spdfAuthenticator                  -199
spdfCdawebChooser                  -980
spdfCdawebChooserAuthenticator     -116
spdfGetData                        -155
spdfHttpErrorDialog                -100
spdfHttpErrorReporter               -97
spdfWsExample                      -337
Current Library                    4518
Missing pieces                       +?
Complete Library                  >4518
• Calling CDAWeb REST from Java
– JAXB (Java Architecture for XML Binding) can
generate, from the CDAS.xsd schema, the thousands of
lines of binding code that must be hand-written in IDL
– Only requires a little more code (to
send/receive HTTP) than SOAP.
• Conclusion:
– Calling REST is only a little more work than
SOAP if the client environment has something
like JAXB (which IDL doesn’t) or if someone
else writes the extra code.
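To make "data-binding" concrete, here is a small Python sketch of the kind of hand-written unmarshalling code that a tool like JAXB would otherwise generate from a schema. The element names, the class, and the sample document are hypothetical; they are not taken from CDAS.xsd.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class DatasetDescription:
    dataset_id: str
    label: str
    start: str
    stop: str

# Hypothetical response document; element names and values are made up.
SAMPLE_XML = """
<Datasets>
  <Dataset>
    <Id>AC_K2_MFI</Id>
    <Label>Example label</Label>
    <Start>2009-06-01T00:00:00.000Z</Start>
    <Stop>2009-06-03T00:00:00.000Z</Stop>
  </Dataset>
</Datasets>
"""

def unmarshal_datasets(xml_text):
    """Turn the XML text into typed objects, one field at a time."""
    root = ET.fromstring(xml_text.strip())
    return [
        DatasetDescription(
            dataset_id=el.findtext("Id", default=""),
            label=el.findtext("Label", default=""),
            start=el.findtext("Start", default=""),
            stop=el.findtext("Stop", default=""),
        )
        for el in root.findall("Dataset")
    ]

for ds in unmarshal_datasets(SAMPLE_XML):
    print(ds.dataset_id, ds.start, ds.stop)
```

Every element in the real schema needs a mapping like this, in both directions, which is where the line count comes from when no generator is available.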
SOAP Lesson 1 Applied To REST
• Cannot reduce metadata to simple array of
string values
– That is, metadata is more complicated
(structured) and simplification makes it less
useful
– This resulted in the REST interface returning
XML/JSON instead of something simpler
SOAP Lesson 2 Applied To REST
• CDAWeb results (plot, listing, CDF) are too
complicated to be returned in an HTTP entity body
– A single request can produce multiple image files
– A result may have many message, status, warning, and
error entities associated with it
– Results may be thumbnail images that require special
processing to obtain expanded images
• REST interface returns XML/JSON that contains
URL of actual result (plot, listing, CDF)
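A hedged sketch of what a client does with such a response: read the status and error entries, then collect the URLs of the actual result files. The JSON layout, key names, and example.invalid URLs are assumptions made for illustration, not the actual CDAS response format.

```python
import json

# Hypothetical response body; structure and key names are assumptions.
SAMPLE_RESPONSE = json.dumps({
    "status": ["request completed"],
    "warnings": [],
    "errors": [],
    "files": [
        {"kind": "plot", "url": "https://example.invalid/tmp/plot_0.gif"},
        {"kind": "plot", "url": "https://example.invalid/tmp/plot_1.gif"},
    ],
})

def result_urls(response_text):
    """Return the result-file URLs, raising if the service reported errors."""
    doc = json.loads(response_text)
    if doc.get("errors"):
        raise RuntimeError("; ".join(doc["errors"]))
    return [entry["url"] for entry in doc.get("files", [])]

for url in result_urls(SAMPLE_RESPONSE):
    print(url)  # a real client would now fetch each file with a GET
```

Returning links rather than the files themselves lets one response carry several images together with their messages and warnings.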
CDAWeb Resource URI Design
• Attempted to design URI that was meaningful,
well structured, and used path variables when
possible
• /dataviews/{dataview}/datasets/{dataset}/data/{start-time},{stop-time}/{var1},{varn}?format=...
• Cannot represent multi-dataset (with dataset
associated variables) resources with a single
path
• Multi-dataset resources must be requested with a
POST containing an XML description of request
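The two request styles above can be sketched in Python as follows. The base URL, the dataview name, the format value, the timestamp encoding, and all XML element names are assumptions for illustration; only the path template comes from the slide.

```python
from urllib.parse import quote, urlencode
import xml.etree.ElementTree as ET

BASE = "https://example.invalid/cdas"  # hypothetical service root

def data_uri(dataview, dataset, start, stop, variables, fmt):
    """Fill in the single-dataset path template for a GET request."""
    path = "/dataviews/{}/datasets/{}/data/{},{}/{}".format(
        quote(dataview, safe=""),
        quote(dataset, safe=""),
        quote(start, safe=""),
        quote(stop, safe=""),
        ",".join(quote(v, safe="") for v in variables),
    )
    return BASE + path + "?" + urlencode({"format": fmt})

def multi_dataset_request(start, stop, datasets):
    """Build an XML body for a POST covering several datasets.

    datasets maps a dataset id to its list of variable names.
    """
    root = ET.Element("DataRequest")
    interval = ET.SubElement(root, "TimeInterval")
    ET.SubElement(interval, "Start").text = start
    ET.SubElement(interval, "Stop").text = stop
    for dataset_id, variables in datasets.items():
        ds = ET.SubElement(root, "Dataset")
        ET.SubElement(ds, "Id").text = dataset_id
        for name in variables:
            ET.SubElement(ds, "Variable").text = name
    return ET.tostring(root, encoding="unicode")

start, stop = "2009-06-01T00:00:00.000Z", "2009-06-03T00:00:00.000Z"
print(data_uri("sp_phys", "AC_K2_MFI", start, stop, ["Magnitude", "BGSEc"], "text"))
print(multi_dataset_request(start, stop,
                            {"AC_K2_MFI": ["Magnitude"], "AC_H0_SWE": ["Np"]}))
```

The GET form keeps each variable list tied to exactly one dataset in the path, which is why a request pairing different variables with different datasets falls back to the XML body.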
Backups
Abstract
The NASA GSFC Space Physics Data Facility (SPDF) provides
heliophysics science-enabling information services for enhancing
scientific research and enabling integration of these services into the
Heliophysics Data Environment paradigm, via SOAP and REST web
services in addition to web browser, FTP, and OPeNDAP interfaces.
We describe these interfaces and the philosophies behind these web
services, and show how to call them from various languages, such as
IDL and Perl. We are working towards a "one simple line to call"
philosophy extolled in the recent VxO discussions. Combining data
from many instruments and missions enables broad research analysis
and correlation and coordination with other experiments and missions.
Coordinated Data Analysis Web (CDAWeb)
<http://cdaweb.gsfc.nasa.gov>
• Data browsing system provides plotting, listing and
open access via FTP, HTTP, and web services (REST,
SOAP, OPeNDAP) for data from most NASA
Heliophysics missions
• Combining data from many instruments and missions
enables broad research analysis and correlation and
coordination with other experiments and missions
• Collecting data and making it usable is the biggest
effort, but also the most important
• Space weather, planetary studies, in situ/remote data
Common Data Format (CDF)
<http://cdf.gsfc.nasa.gov>
• Standard self-describing multidimensional data format
• Platform- and discipline-independent
• Associated scientific data management package (“CDF
Library”) makes actual data format completely
transparent to the user and accessible through a
consistent set of interface routines
• IDL and Matlab routines
• Also callable by Fortran, C, C#, Perl, Java
• Open source
• Internal compression, checksums
Format Translator
• CDF project also maintains software and
services for translating between many
standard formats (CDF, netCDF, HDF, FITS,
XML)
<http://cdf.gsfc.nasa.gov/html/dtws.html>
IDL CDF Interface
Creating CDFs: CDF_CREATE, CDF_VARCREATE,
CDF_ATTPUT, CDF_VARPUT, CDF_CLOSE
Reading CDFs: CDF_OPEN, CDF_INQUIRE,
CDF_CONTROL, CDF_VARINQ, CDF_VARGET,
CDF_ATTINQ, CDF_ATTGET, CDF_CLOSE
Info: CDF_CONTROL, CDF_COMPRESSION,
CDF_DOC, CDF_INQUIRE, CDF_LIB_INFO
Time: CDF_ENCODE_EPOCH, CDF_EPOCH,
CDF_PARSE_EPOCH, CDF_EPOCH_DIFF,
CDF_EPOCH_COMPARE
Other: CDF_SET_MD5CHECKSUM,
CDF_SET_CDF27_BACKWARD_COMPATIBLE
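For readers outside IDL, the time routines above work on CDF_EPOCH values. The Python sketch below assumes the classic CDF_EPOCH encoding (a double holding milliseconds since 0000-01-01T00:00:00) and uses hypothetical function names; it is not a port of the IDL routines and ignores the other CDF time types.

```python
from datetime import datetime, timezone

# Milliseconds between the CDF_EPOCH origin (0000-01-01) and 1970-01-01.
UNIX_ORIGIN_MS = 62167219200000.0

def to_cdf_epoch(dt):
    """Convert a timezone-aware datetime to a CDF_EPOCH-style value."""
    return UNIX_ORIGIN_MS + dt.timestamp() * 1000.0

def epoch_diff_ms(start_epoch, stop_epoch):
    """Elapsed milliseconds between two epoch values."""
    return stop_epoch - start_epoch

t0 = to_cdf_epoch(datetime(2009, 6, 1, tzinfo=timezone.utc))
t1 = to_cdf_epoch(datetime(2009, 6, 3, tzinfo=timezone.utc))
print(epoch_diff_ms(t0, t1))  # two days expressed in milliseconds
```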