Finding, browsing, and getting data easily using SPDF web services

Space Physics Data Facility <http://spdf.gsfc.nasa.gov>
NASA Goddard Space Flight Center
Greenbelt MD 20771

CDAWeb data browser
• Handles vast heterogeneity of instruments and parameters (>1000 datasets, 2M files, 10 TB, 8500 parameters)
• Yet simple browsing interface provides 80% of researchers’ needs without many bells & whistles
• Based on
  – standard file format (CDF)
  – standardized metadata
  – metadata-driven IDL software
  – provider metadata over-ride by Master CDFs

CDAWeb usage statistics

CDAWeb process
• Top static web page calls Perl routines to generate selection forms, with instrument-types, spacecraft, instruments, datasets and available time ranges filled from text lists (compiled each night from data CDFs, no formal database)
• Form POST calls a Perl routine to generate IDL calls to determine data files to fill user’s request, followed by a call to read_myCDF and then either plotmaster, list_mystruct or write_mycdf, depending on the user’s request.
• CDAWlib IDL routines read data CDFs into IDL structures, plot or list data, and return URLs to temporary output files (ASCII listings, movie or static GIF files, or created sub- or super-setted CDF files)
• CDAS SOAP and REST API calls can also query the text lists and generate the IDL calls

Metadata (ISTP and Cluster)
• Global attributes provide mission, instrument type, dataset info (project, source_name, data_type, descriptor, etc.)
• Variable attributes (Catdesc, Fieldnam, var_notes)
• Depend_0, Depend_1, Depend_2 and Delta_plus_var/Delta_minus_var point to related variables
• Display_type, Var_type (data, support data)
• Fill_val, Validmin/max for ranges
• LablAxis/Label_ptr, Units/Unit_ptr for plots
• Format for listing
• Virtual variables to create variables on-the-fly

CDAWlib IDL library <http://spdf.gsfc.nasa.gov/CDAWlib.html>
• Read_MyCDF reads CDF variables into an IDL structure, along with support variables and metadata
• PlotMaster determines the best display for the variables and creates plots in GIF and PS/PDF
• Assorted plot routines: time_series, spectrogram, ionogram, radar_vector, image, orbit, stack_plot, map, plasmagram, movie, map_movie, time_text, etc.

Other useful CDAWlib routines
• List_MyStruct creates text listings using all available/defined global and variable level metadata, e.g. label_axis, label_ptr1, format, depends, units, etc.
• CDFx <http://cdaweb.gsfc.nasa.gov/cdaweb/cdfx/> is an IDL GUI to read, list and plot data from CDFs located on the user’s machine; it allows some customization of plotting options, e.g. color palettes, scale ranges, etc.
• Auroral_Image maps 2-D data using map projections and various fill and expansion techniques (generally auroral images onto the Earth)
• Spectrogram plots a color spectrogram of a 2-dim variable in contiguous or non-contiguous blocks

Making CDFs
• Create a skeleton text file (hopefully with standard metadata) by hand, or more easily with SKTeditor <http://SSCweb.gsfc.nasa.gov/skteditor/>
• Convert the skeleton file to an empty CDF with SkeletonCDF (or automatically in SKTeditor)
• Add data with IDL CDAWlib routines, or use the MakeCDF C-language tool to put ASCII and binary data into a CDF file <http://spdf.gsfc.nasa.gov/makecdf.html>
• ISTP/IACG Guidelines for recommended naming of datasets and filenames, global and variable attributes <http://spdf.gsfc.nasa.gov/sp_use_of_cdf.html>

Making CDFs with CDAWlib
• Create an empty CDF with SKTeditor or manually
• Read the CDF structure into an IDL structure with read_master_cdf.pro
• Fill the IDL structure data fields with data
• Create a new CDF file with the contents of the filled structure with write_data_to_cdf.pro (both in IDLmakecdf.pro)

CDAWeb-resident data direct to IDL (Beta)
• <http://cdaweb.gsfc.nasa.gov/WebServices/REST/CdasIdlLibrary.html>
• d = spdfgetdata('AC_K2_MFI', ['Magnitude', 'BGSEc'], ['2009-06-01T00:00:00.000Z', '2009-06-03T00:00:00.000Z'])
• IDL GUI interface to CDAWeb-held data: spdfcdawebchooser

CDAWeb Lessons
• Standard data file format and metadata enable powerful and extendible services
• Code is metadata-driven (from the data files themselves), rather than coding lots of special cases (with attendant high maintenance)
• Variable type and dimensions are often enough to determine a desirable plot format, but can be overridden by metadata
• Metadata from Master (no data) CDFs over-rides metadata (or lack of) in data CDFs

SPDF Web Services
• Satellite Situation Center (SSC)
  – SOAP RPC-encoded since 2002
  – SOAP document-literal since 2007
• Coordinated Data Analysis System (CDAS)
  – SOAP RPC-encoded since 2003
  – REST since 2010

Web Service Styles
• Remote Procedure Call (RPC)
  – Tightly coupled
  – Early SOAP, CORBA, DCOM, RMI
• Service-Oriented Architecture (SOA)
  – Loosely coupled
  – Later (message/document) SOAP
• Resource-Oriented Architecture (ROA)
  – Loosely coupled
  – Representational State Transfer (REST)

SOAP Web Services
• Popular before REST
• Supported by most major software vendors
• Criticized for being complex (but simple to use with advanced tools/frameworks)
• Focuses on “message-oriented” services
• Supported in most popular programming environments (Java, .NET, PHP, Python, Perl, etc.)

REST Web Services
• Focus on interacting with stateless resources using well-known HTTP standard operations (GET, POST, PUT, DELETE, etc.)
• Simpler protocol requires simpler libraries/frameworks
  – Available in more programming environments
• Details less defined than SOAP (e.g., messages can be plain text instead of complex XML conforming to SOAP specifications)
  – Different services can be very different
  – “RESTful Web Services”, Leonard Richardson & Sam Ruby, 2007 O'Reilly Media, Inc.

REST vs SOAP
• CDAWeb REST sits on top of CDAS SOAP
• Trade-offs: REST URLs are ugly, but easy to understand and to use in a browser
• SOAP is easy to call from Java with SOAP libraries
• OPeNDAP, FTP, HTTP, TSDS

More SOAP vs REST
• An IDL interface is much easier with REST than convincing scientists to install the Java bridge and our library along with the IDL code
• Finally, and most important, we can create a system where spacecraft, instruments, datasets, and time periods are treated as objects and can be referenced in the event list server and in papers/reports
• SOAP is opaque and more difficult to explain and to use
• A REST service can be described in WSDL (particularly version 2) and also in WADL <http://research.sun.com/spotlight/2006/2006-04-24-TR153.html>, although WSDL alone doesn’t always work for SOAP either.
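
To make the contrast concrete, the following Python sketch builds the two request shapes without sending anything over the network. The getAllViewDescriptions operation name and the /dataviews path come from these slides; the REST base URL is an assumption pieced together from the documentation link shown earlier, and the service namespace urn:example:cdas is a made-up placeholder, not the real CDAS namespace.

```python
from xml.etree import ElementTree as ET

# Assumed base URL: only the host and the /WebServices/REST/ prefix appear in the slides.
REST_BASE = "http://cdaweb.gsfc.nasa.gov/WebServices/REST"

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
CDAS_NS = "urn:example:cdas"  # hypothetical placeholder, not the real service namespace


def rest_dataviews_url(base=REST_BASE):
    """REST: the resource is named by the URL itself; a plain HTTP GET retrieves it."""
    return f"{base}/dataviews"


def soap_get_all_view_descriptions():
    """SOAP: the operation is named inside an XML envelope that is POSTed to one endpoint."""
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    ET.SubElement(body, f"{{{CDAS_NS}}}getAllViewDescriptions")
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(rest_dataviews_url())
    print(soap_get_all_view_descriptions())
```

The REST form can be pasted into a browser or cited in a paper, which is the point made above about treating datasets and time periods as referenceable objects; the SOAP form needs a client that can build and parse the envelope.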
CDAWeb REST Complications
• Metadata is complicated, requiring the REST interface to return XML/JSON
• Results are complicated, requiring the REST interface to return XML/JSON
• Resources don’t conform to a simple hierarchical structure, requiring REST to support a POST (with XML request) for some resources in addition to the simpler GET method
• Calling CDAWeb SOAP from IDL
  – Use Java Bridge (Java's SOAP) and CDAWeb-specific JAR (automatically produced from WSDL)
    • Newer versions of IDL include/configure Java during installation
  – 1 JAR to add to classpath
  – IDL call to CDAWeb:
    • dataviews = cdas->getAllViewDescriptions()
• Calling CDAWeb REST from IDL
  – Use IDLnetURL and IDLffXMLDOM
  – Requires 1000s of lines* of hand-written, CDAWeb-specific data-binding and serialization/deserialization code (that uses IDLffXMLDOM for marshalling/unmarshalling)
  – IDL call to CDAWeb:
    • dataviews = cdas->getDataviews()

* CDAS REST IDL Library: > 4518 LOC (including comment and blank lines)

Component                          LOC
Total                             6502
spdfAuthenticator                 -199
spdfCdawebChooser                 -980
spdfCdawebChooserAuthenticator    -116
spdfGetData                       -155
spdfHttpErrorDialog               -100
spdfHttpErrorReporter              -97
spdfWsExample                     -337
Current Library                   4518
Missing pieces                      +?
Complete Library                 >4518

• Calling CDAWeb REST from Java
  – JAXB (Java Architecture for XML Binding) can generate, from the CDAS.xsd schema, the thousands of lines of binding code that must be hand-written in IDL
  – Only requires a little more code (to send/receive HTTP) than SOAP.
• Conclusion:
  – Calling REST is only a little more work than SOAP if the client environment has something like JAXB (which IDL doesn’t) or if someone else writes the extra code.
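
The "data-binding" work mentioned above is the code that turns an XML response into native objects. Below is a minimal Python sketch of what such hand-written binding looks like for one response type. The element names (DataviewDescriptions, DataviewDescription, Id, Title) and the sample values are invented for illustration and are not taken from the CDAS.xsd schema; a real client needs one such class and parser per schema type, which is where the line count comes from.

```python
from dataclasses import dataclass
from typing import List
from xml.etree import ElementTree as ET

# Hypothetical response document; element names and values are illustrative only.
SAMPLE_RESPONSE = """<DataviewDescriptions>
  <DataviewDescription>
    <Id>example_view_1</Id>
    <Title>First example view</Title>
  </DataviewDescription>
  <DataviewDescription>
    <Id>example_view_2</Id>
    <Title>Second example view</Title>
  </DataviewDescription>
</DataviewDescriptions>"""


@dataclass
class Dataview:
    """Native object bound to one (hypothetical) DataviewDescription element."""
    id: str
    title: str


def parse_dataviews(xml_text: str) -> List[Dataview]:
    """Unmarshal the XML response into a list of Dataview objects."""
    root = ET.fromstring(xml_text)
    views = []
    for node in root.findall("DataviewDescription"):
        views.append(
            Dataview(
                id=node.findtext("Id", default=""),
                title=node.findtext("Title", default=""),
            )
        )
    return views


if __name__ == "__main__":
    for view in parse_dataviews(SAMPLE_RESPONSE):
        print(view.id, "-", view.title)
```

A generator such as JAXB produces the equivalent of the Dataview class and parse_dataviews function from the schema; without one, every type is written and maintained by hand.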
SOAP Lesson 1 Applied To REST
• Cannot reduce metadata to a simple array of string values
  – That is, metadata is more complicated (structured) and simplification makes it less useful
  – This resulted in the REST interface returning XML/JSON instead of something simpler

SOAP Lesson 2 Applied To REST
• CDAWeb results (plot, listing, CDF) are too complicated to be returned in an HTTP entity body
  – A single request can produce multiple image files
  – A result may have many message, status, warning, and error entities associated with it
  – Results may be thumbnail images that require special processing to obtain expanded images
• REST interface returns XML/JSON that contains the URL of the actual result (plot, listing, CDF)

CDAWeb Resource URI Design
• Attempted to design a URI that was meaningful, well structured, and used path variables when possible
• /dataviews/{dataview}/datasets/{dataset}/data/{start-time},{stop-time}/{var1},{varn}?format=...
• Cannot represent multi-dataset (with dataset associated variables) resources with a single path
• Multi-dataset resources must be requested with a POST containing an XML description of the request

Backups

Abstract
The NASA GSFC Space Physics Data Facility (SPDF) provides heliophysics science-enabling information services for enhancing scientific research and enabling integration of these services into the Heliophysics Data Environment paradigm, via SOAP and REST web services in addition to web browser, FTP, and OPeNDAP interfaces. We describe these interfaces and the philosophies behind these web services, and show how to call them from various languages, such as IDL and Perl. We are working towards a "one simple line to call" philosophy extolled in the recent VxO discussions. Combining data from many instruments and missions enables broad research analysis and correlation and coordination with other experiments and missions.
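
As a rough illustration of the "one simple line to call" goal, the path template from the CDAWeb Resource URI Design slide can be filled in by a small helper. This Python sketch only builds the path string. The dataset, variable names and times reuse the spdfgetdata example from these slides; the dataview name and the format value are placeholders, since the slides leave the format value elided, and the timestamp syntax the service actually accepts is not stated here.

```python
from urllib.parse import quote, urlencode


def data_resource_path(dataview, dataset, start_time, stop_time, variables, fmt):
    """Fill the slide's template:
    /dataviews/{dataview}/datasets/{dataset}/data/{start-time},{stop-time}/{var1},{varn}?format=...
    Each path component is percent-encoded; the commas are template separators.
    """
    def enc(value):
        return quote(value, safe="")

    time_range = f"{enc(start_time)},{enc(stop_time)}"
    var_list = ",".join(enc(v) for v in variables)
    query = urlencode({"format": fmt})
    return (
        f"/dataviews/{enc(dataview)}/datasets/{enc(dataset)}"
        f"/data/{time_range}/{var_list}?{query}"
    )


if __name__ == "__main__":
    # "example_view" and "FORMAT_PLACEHOLDER" are hypothetical values.
    print(
        data_resource_path(
            "example_view",
            "AC_K2_MFI",
            "2009-06-01T00:00:00.000Z",
            "2009-06-03T00:00:00.000Z",
            ["Magnitude", "BGSEc"],
            "FORMAT_PLACEHOLDER",
        )
    )
```

A single-dataset request fits this one path. A multi-dataset request does not, which is why the slide falls back to a POST with an XML description for that case.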

Coordinated Data Analysis Web (CDAWeb) <http://cdaweb.gsfc.nasa.gov>
• Data browsing system provides plotting, listing and open access via FTP, HTTP, and web services (REST, SOAP, OPeNDAP) for data from most NASA Heliophysics missions
• Combining data from many instruments and missions enables broad research analysis and correlation and coordination with other experiments and missions
• Collecting data and making it usable is the biggest effort, but the most important
• Space weather, planetary studies, in situ/remote data

Common Data Format (CDF) <http://cdf.gsfc.nasa.gov>
• Standard self-describing multidimensional data format
• Platform- and discipline-independent
• Associated scientific data management package (“CDF Library”) makes the actual data format completely transparent to the user and accessible through a consistent set of interface routines
• IDL and Matlab routines
• Also callable by Fortran, C, C#, Perl, Java
• Open source
• Internal compression, checksums

Format Translator
• CDF project also maintains software and services for translating between many standard formats (CDF, netCDF, HDF, FITS, XML) <http://cdf.gsfc.nasa.gov/html/dtws.html>

IDL CDF Interface
• Creating CDFs: CDF_CREATE, CDF_VARCREATE, CDF_ATTPUT, CDF_VARPUT, CDF_CLOSE
• Reading CDFs: CDF_OPEN, CDF_INQUIRE, CDF_CONTROL, CDF_VARINQ, CDF_VARGET, CDF_ATTINQ, CDF_ATTGET, CDF_CLOSE
• Info: CDF_CONTROL, CDF_COMPRESSION, CDF_DOC, CDF_INQUIRE, CDF_LIB_INFO
• Time: CDF_ENCODE_EPOCH, CDF_EPOCH, CDF_PARSE_EPOCH, CDF_EPOCH_DIFF, CDF_EPOCH_COMPARE
• Other: CDF_SET_MD5CHECKSUM, CDF_SET_CDF27_BACKWARD_COMPATIBLE
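
The time routines listed above work with CDF_EPOCH values. As a sketch of what those numbers represent, the Python below converts between UTC datetimes and CDF_EPOCH, assuming the usual CDF definition of CDF_EPOCH as milliseconds since 0000-01-01T00:00:00. It shows the arithmetic only and is not the IDL routines' implementation; the function names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# CDF_EPOCH value of 1970-01-01T00:00:00Z: 719528 days * 86,400,000 ms/day,
# counting from 0000-01-01 in the proleptic Gregorian calendar (year 0 is a leap year).
CDF_EPOCH_AT_UNIX_ZERO_MS = 62167219200000.0

UNIX_ZERO = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_cdf_epoch(dt):
    """Convert a datetime (naive values are treated as UTC) to CDF_EPOCH milliseconds."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return CDF_EPOCH_AT_UNIX_ZERO_MS + (dt - UNIX_ZERO) / timedelta(milliseconds=1)


def from_cdf_epoch(epoch_ms):
    """Convert CDF_EPOCH milliseconds back to a UTC datetime."""
    return UNIX_ZERO + timedelta(milliseconds=epoch_ms - CDF_EPOCH_AT_UNIX_ZERO_MS)


if __name__ == "__main__":
    start = datetime(2009, 6, 1, tzinfo=timezone.utc)
    stop = datetime(2009, 6, 3, tzinfo=timezone.utc)
    # Difference in milliseconds, analogous in spirit to CDF_EPOCH_DIFF.
    print(to_cdf_epoch(stop) - to_cdf_epoch(start))
```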