Giovanni for Air Quality: Recent Development
G. Leptoukh, NASA Goddard Space Flight Center
EPA @ UMBC, January 2012

Goddard Interactive Online Visualization ANd aNalysis Infrastructure (Giovanni)
• With a few mouse clicks, easily obtain information on the atmosphere, ocean and land around the world.
• No need to learn data formats to retrieve and process data.
• Try various parameter combinations measured by different instruments.
• All statistical analysis is done through a regular web browser.
http://giovanni.gsfc.nasa.gov/
Caution: Giovanni is a rapidly evolving data exploration tool!

Giovanni Allows Scientists to Concentrate on the Science
The old way: pre-science tasks (find data; retrieve high-volume data; learn formats and develop readers; extract parameters; perform spatial and other subsetting; identify quality and other flags and constraints; perform filtering/masking; develop analysis and visualization; accept, discard or get more satellite, model or ground-based data) take months, roughly January through October in the original timeline, before the science itself (exploration, initial analysis, use of the best data for the final analysis, deriving conclusions, writing and submitting the paper) can begin.
The Giovanni way: minutes to days for exploration, leaving most of the time to do science.
Web-based services: GES DISC tools compress the time needed for the pre-science preliminary tasks (data discovery, access, manipulation, visualization, and basic statistical analysis), so scientists have more time to do science.

Example: Comprehensive Multi-Sensor Data Environment for Aerosol Studies
• Missions: Terra, Aqua, Aura, Parasol, CALIPSO, …
• Instruments: MODIS, MISR, OMI, POLDER, CALIOP, …
• Models: GOCART
• Ground-based: AERONET; US EPA PM2.5 (via DataFed)

Giovanni now
• More than 40 customized Giovanni portals serving various missions and projects
• ~1,500 geophysical parameters/variables
• Data (local and remote via FTP, OPeNDAP, WCS) from:
  o ~20 space-based instruments
  o ~50 models
  o EPA and AERONET stations
• Multiple visualization and statistical analysis functions, including data intercomparison
• Data lineage
• Subsetted data downloads in multiple formats
• Maps and plots served via the WMS protocol
• Output data served via WCS and KML

Generation 3 Giovanni
Old but improved, with more parameters (mostly using Level 3 data): the Giovanni 3 (G3) portals and the Giovanni inventory.

Air Quality Giovanni portal
[Screenshot of the Air Quality Giovanni portal]

Air quality: multi-sensor, model, and ground-based data support via Giovanni
Panels: PM2.5 (EPA, via DataFed); Deep Blue MODIS Aerosol Optical Depth; the standard MODIS AOT; GOCART AOT.

Wildfire visualization
Visualizing California's wildfires from space using Giovanni, 23-27 October 2007, with data from Aura OMI (tropospheric NO2 and UV Aerosol Index), Aqua AIRS (total column CO) and Terra MODIS (aerosol small-mode fraction, cloud optical thickness, and aerosol mass concentration over land).
Panels: Tropospheric NO2 (OMI); Aerosol Small Mode Fraction (MODIS); UV Aerosol Index (OMI); Cloud Optical Thickness (MODIS); Total Column CO (AIRS); Aerosol Mass over Land (MODIS). (From G. Leptoukh, 2007 Fall AGU Meeting, San Francisco, CA, December 2007.)

Model intercomparison
Giovanni supports the Hemispheric Transport of Air Pollution (HTAP) model intercomparison. There is potential to expand it for comparison with additional remote sensing data sets; a minimal sketch of such a model-vs-satellite comparison follows.
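To make the comparison concrete, here is a minimal sketch of the kind of operation involved: block-averaging a finer-resolution satellite Level 3 AOD grid onto a coarser model grid and differencing the two fields. It is written in Python for illustration only; the file names, variable names and grid sizes are hypothetical assumptions, and this is not Giovanni's actual implementation (Giovanni performs regridding, subsetting and plotting on the server side).

```python
# Minimal sketch only: block-average a 0.5-degree satellite AOD grid onto a
# 1-degree model grid and difference the two fields. All file and variable
# names below are hypothetical; this is not Giovanni's actual code.
import numpy as np
from netCDF4 import Dataset


def block_average(field, factor):
    """NaN-aware average of a 2-D field over factor x factor blocks."""
    ny, nx = field.shape
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))


with Dataset("gocart_aod550_200710.nc") as f:          # hypothetical model file, 180 x 360
    model_aod = np.ma.filled(f.variables["aod550"][:].astype(float), np.nan)

with Dataset("satellite_l3_aod550_200710.nc") as f:    # hypothetical satellite file, 360 x 720
    raw = np.ma.filled(f.variables["AOD_550"][:].astype(float), np.nan)
    sat_aod = np.where(raw < 0, np.nan, raw)           # treat negative fill values as missing

sat_on_model_grid = block_average(sat_aod, factor=2)   # 360 x 720 -> 180 x 360
diff = sat_on_model_grid - model_aod

print("mean bias (satellite - model):", np.nanmean(diff))
print("RMS difference:", np.sqrt(np.nanmean(diff ** 2)))
```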
Giovanni Applications and Projects
[Examples of Giovanni applications and project portals]

Different levels of multi-sensor activities
• Archiving data from multiple sensors. Done.
• Harmonizing metadata. Done… more or less.
• Accessing data from remote locations. Done.
• Harmonizing data formats for joint processing (Giovanni). Done.
• Serving multi-sensor data via common protocols. Done.
• Scale harmonization (Giovanni), i.e. regridding. Done (horizontal only).
• Harmonizing visualization (Giovanni, ACP). Done.
• Joint analysis (Giovanni). Done and ongoing.
• Merging similar parameters (Giovanni). Prototype done for Level 3.
• Harmonizing quality. Working on it.
• Harmonizing provenance (MEaSUREs, Giovanni, MDSA). Started.
• Adjusting bias using a neural network approach. Done.
• Merging Level 2 data. Done.
• Fusing complementary geophysical variables. Future.

Giovanni data sources and their access protocols
Data source                   Protocol      Data
NASA GES DISC                 local access  AIRS, TRMM, OMI, MLS, HIRDLS
NASA MODIS DAAC               FTP           MODIS
NASA Ocean Color DAAC         FTP           SeaWiFS, MODIS
NASA Langley DAAC             OPeNDAP       CALIPSO, MISR, TES, CERES
NSIDC                         FTP           AMSR-E
NOAA                          FTP           snow, ice, NCEP
Univ. of Maryland             FTP           MODIS fire, NDVI
Colorado State Univ.          FTP           CloudSat
CIESIN, Columbia University   FTP           population data
JPL                           FTP           QuikSCAT
EPA (via DataFed)             WCS           PM2.5
Lille, France                 FTP           PARASOL
ESA                           FTP           MERIS
Juelich, Germany              FTP, WCS      HTAP
DLR, Germany                  WCS           GOME-2
Paris, France                 OPeNDAP       AEROCOM
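As a concrete illustration of the remote-access protocols listed in the table above, the sketch below reads a spatial subset of a variable over OPeNDAP without downloading whole files. It is illustrative only: the URL, variable names and index ranges are placeholders rather than real Giovanni or DAAC endpoints, and it assumes a Python netCDF4 library built with DAP support.

```python
# Illustrative only: subset a remote variable over OPeNDAP. The URL and
# variable names are placeholders, not real endpoints.
from netCDF4 import Dataset  # netCDF4 can open OPeNDAP URLs when built with DAP support

url = "https://example-daac.nasa.gov/opendap/aerosol/monthly_aod550.nc"  # hypothetical
with Dataset(url) as ds:
    lat = ds.variables["lat"][:]
    lon = ds.variables["lon"][:]
    # Only the requested slab is transferred; subsetting happens on the server
    aod_subset = ds.variables["aod550"][0, 60:120, 200:280]

print(aod_subset.shape, float(aod_subset.mean()))
```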
Peer-reviewed publications using and acknowledging Giovanni (as of 1 November 2011)
Year          2004  2005  2006  2007  2008  2009  2010  2011  Total
Publications     3     7     6    27    50    86   115   115    409

Evolving Giovanni infrastructure
• G1 & G2 (since 1998): independent instances
• Giovanni-3 (since 2005): harmonized data inventory; separate instances; configurator
• Agile Giovanni (G4) (since 2009): flexible, modular, fully interoperable infrastructure; URL-based; new data types (Level 2 swaths/profiles, point data)

What is AeroStat?
• AeroStat is a new NASA Giovanni (generation 4) online visualization and statistics portal.
• It is an online environment for direct statistical intercomparison of global aerosol parameters, in which provenance and data quality can be easily accessed by scientists.
• AeroStat also provides a collaborative research environment where scientists can seamlessly share AeroStat workflow executions, algorithms, best practices, known errors and other pertinent information with the science community.

Motivation
• Different papers have suggested different views on the quality of MODIS and MISR aerosol products.
• Peer-reviewed papers usually lag behind the latest version of the data.
• It is difficult to verify or reproduce results from various published papers.
• It is difficult to combine consistently adjusted measurements.
• An online shareable environment is needed in which data processing and analysis can be done transparently by any user and shared with the whole aerosol community.

What can AeroStat do?
• Provide an effective tool for comparing satellite and ground-based aerosol Level 2 data
• Provide an environment for colocation and comparison methods with detailed documentation
• Provide aerosol bias adjustment of satellite data based on ground-based measurements
• Explore aerosol phenomena by merging multi-sensor data
• Enable easy sharing of results

Goals
• Provide an easy-to-use collaborative environment for exploring aerosol phenomena using multi-sensor data
• Provide consistent services for multi-sensor aerosol data
• Provide a transparent environment for colocation and comparison methods with detailed documentation
• Provide easy sharing of results

AeroStat Giovanni architectural diagram
[Architecture diagram, shown repeatedly in the talk with the component under discussion highlighted]

AeroStat Giovanni data support: Level 2 measurements
• Original Level 2 products: AERONET Level 2, MODIS Terra Level 2, MODIS Aqua Level 2, MISR Level 2
• Derived products:
  o Satellite data colocated with AERONET stations: MAPSS database
  o Cross-satellite colocations: near-neighbor search algorithm (being integrated)

Data quality and number-of-pixels filters
Defaults are the Science-Team-recommended filters.

Bias adjustment using a neural network (a toy sketch of the idea appears at the end of this section)

AeroStat Giovanni services
• Data colocated with AERONET:
  o Time series with QA filtering and bias-adjustment options
  o Scatter plots with QA filtering and bias-adjustment options
• Colocated satellite data only (cross-satellite):
  o Gridded lat-lon maps of individual parameters with QA filtering and bias-adjustment options
  o Merged multi-sensor gridded lat-lon maps with QA filtering and bias-adjustment options

Example outputs: time series (with various filters); scatter plots (no filters, default Science-Team-recommended filters, and with bias adjustment); gridded maps; merged Level 2 data.

Gsocial (Giovanni Social Network)
Gsocial participants can save results, annotate plots, share them with others, reproduce their own and others' results, and continue sharing. A research notebook is included.

Science application example
[Example screenshots]

Final touch-ups before the AeroStat public release
• Add bias adjustment for MISR colocated with AERONET data (ran into some version mixture at Langley)
• Add bias adjustment (to AERONET) for the "satellite only" service
• Routinely process and ingest colocated satellite and AERONET data
• Fold AeroStat Giovanni into mainstream Giovanni
• Add features based on user feedback (e.g., log-log scatter plots)
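Before turning to data quality, here is a toy illustration of the neural-network bias-adjustment idea referred to above. It trains on synthetic "colocated" samples using scikit-learn's MLPRegressor; it sketches the general approach only and is not AeroStat's actual model, predictors or training data.

```python
# Toy sketch: learn a correction from satellite AOD plus an ancillary predictor
# to AERONET AOD using colocated samples. Data are synthetic; this is not the
# actual AeroStat neural-network model.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 2000
aeronet_aod = rng.gamma(shape=1.5, scale=0.15, size=n)        # ground-truth AOD
scattering_angle = rng.uniform(100.0, 170.0, size=n)          # ancillary predictor
# Synthetic satellite retrieval with an angle-dependent multiplicative bias,
# a small offset, and random noise
satellite_aod = (aeronet_aod * (1.0 + 0.002 * (scattering_angle - 135.0))
                 + 0.02 + rng.normal(0.0, 0.03, size=n))

X = np.column_stack([satellite_aod, scattering_angle])
nn = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=3000, random_state=0)
nn.fit(X[:1500], aeronet_aod[:1500])                          # train on "colocations"

adjusted = nn.predict(X[1500:])                               # apply to held-out samples
err_before = np.mean(np.abs(satellite_aod[1500:] - aeronet_aod[1500:]))
err_after = np.mean(np.abs(adjusted - aeronet_aod[1500:]))
print(f"mean absolute error before: {err_before:.3f}, after adjustment: {err_after:.3f}")
```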
Remote-Sensing Data Quality

Why so much attention to data quality now?
• In the past, it was difficult to access satellite data.
• Now, within minutes, a user can find and access multiple datasets from various remotely located archives via web services and perform a quick analysis; this is the so-called Data-Intensive Science.
• The new challenge is to quickly figure out which of these many, easily accessible datasets are most appropriate for a particular use.
• However, our remote sensing data are not ready for this challenge: there is no consistent approach to characterizing their quality.
• This is why data quality is a hot topic now.

Why is it so difficult?
• Quality is perceived differently by data providers and data recipients.
• Quality has many different qualitative and quantitative aspects.
• There is no comprehensive framework for the quality of remote sensing Level 2 and higher data.
• There are no preferred methodologies for solving many data quality issues.
• Data quality has had lower priority than building an instrument, launching a rocket, collecting and processing the data, and publishing a paper using those data.
• Each science team has handled quality differently.

Expectations for data quality
• What do users want? Gridded, gap-free data with error bars in each grid cell.
• What do they get instead?
  o Level 2 swaths in satellite projection, with obscure quality flags that mean nothing to users.
  o Level 3 monthly data with a lot of aggregation (not always clearly described) and standard deviation as an uncertainty measure (a fallacy).

Different perspectives
Data providers (demigods looking from above): "We have good data" (MISR, MODIS, MLS, TES, OMI).
User: "I need good new data, and quickly. A new data product could be very good, but if it is not conveniently served and described, it is not good for me, so I am going to use whatever I have and know already."

Data provider vs. user perspective
• Algorithm developers and data providers: solid science plus validation.
• Users: fitness for purpose.
  o Measuring climate change: model validation needs gridded, contiguous data with uncertainties; long-term time series make bias assessment a must, especially for sensor degradation and changes in orbit and spatial sampling.
  o Studying phenomena with multi-sensor data: cross-sensor bias estimates are needed.
  o Realizing societal benefits through applications: near-real-time data for transport and event monitoring; in some cases coverage and timeliness matter more than accuracy.
  o Education (users generally not well versed in the intricacies of quality): only the best products.

Different kinds of reported and perceived data quality
• Pixel-level quality (reported): the algorithm's guess at the usability of a data point (some say it reflects the algorithm's "happiness").
  o Granule-level quality: a statistical roll-up of pixel-level quality.
• Product-level quality (wanted/perceived): how closely the data represent the actual geophysical state.
• Record-level quality: how consistent and reliable the data record is across generations of measurements.
These different quality types are often erroneously assumed to have the same meaning; ensuring data quality requires a different focus and different actions at each level.

General Level 2 pixel-level issues
• How do we extrapolate validation knowledge about selected Level 2 pixels to the whole Level 2 (swath) product?
• How do we harmonize terms and methods for pixel-level quality? For example:
  o AIRS confidence flags (with recommended purpose): 0 = Best (data assimilation), 1 = Good (climatic studies), 2 = Do Not Use.
  o MODIS aerosol quality indicators, over ocean and over land: 3 = Very Good, 2 = Good, 1 = Marginal, 0 = Bad.
  Can the recommendations be matched up?
• Use these flags to stay within the expected error bounds: ocean ±0.03 ± 0.10τ; land ±0.05 ± 0.15τ (a minimal filtering sketch follows).
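A minimal sketch of the pixel-level filtering these flags enable is shown below. It assumes an already-decoded per-pixel confidence value on the MODIS-style 0-3 scale from the table above; in reality MODIS quality assurance is bit-packed, and the arrays here are made up for illustration.

```python
# Minimal sketch of pixel-level quality filtering. Confidence values follow the
# MODIS scale above (0 = Bad ... 3 = Very Good); real MODIS QA is bit-packed,
# and the arrays below are illustrative, not real data.
import numpy as np


def filter_by_confidence(aod, confidence, min_confidence):
    """Keep pixels whose confidence flag is at least min_confidence; mask the rest."""
    return np.where(confidence >= min_confidence, aod, np.nan)


aod_land = np.array([[0.21, 0.35, 1.20],
                     [0.08, 0.55, 0.40]])
qa_land = np.array([[3, 1, 0],
                    [3, 2, 3]])

# Example: over land, keep only "Very Good" (3) retrievals for quantitative use
print(filter_by_confidence(aod_land, qa_land, min_confidence=3))
```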
Data quality vs. quality of service
• A data product can be very good,
• but if it is not conveniently served and described, it is perceived as not being so good.
• User perspective: there might be a better product somewhere, but if I cannot easily find and understand it, I am going to use whatever I have and know already.

QI: spatial completeness (coverage) of Aerosol Optical Depth (AOD)
Spatial coverage (%) of MODIS Aqua and MISR AOD for different latitudinal zones and seasons: owing to its wider swath, MODIS AOD covers more area than MISR, while the seasonal and zonal patterns are rather similar (a toy version of this coverage computation follows).
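A toy version of this coverage indicator is sketched below: the percentage of grid cells in a latitude band that contain a valid retrieval. The gridded field is synthetic and the band boundaries arbitrary; it only illustrates how such a quality indicator can be computed, not how Giovanni computes it.

```python
# Toy spatial-completeness computation on a synthetic 1-degree daily AOD grid:
# percent of cells in each latitude band that contain a valid (non-NaN) retrieval.
import numpy as np


def zonal_coverage(grid, lat_centers, lat_min, lat_max):
    """Percent of non-NaN cells with latitude in [lat_min, lat_max)."""
    band = grid[(lat_centers >= lat_min) & (lat_centers < lat_max), :]
    return 100.0 * np.count_nonzero(~np.isnan(band)) / band.size


rng = np.random.default_rng(1)
aod = np.where(rng.random((180, 360)) < 0.4, np.nan,      # ~40% of cells left empty
               rng.gamma(1.5, 0.15, size=(180, 360)))
lats = np.arange(-89.5, 90.0, 1.0)                        # 1-degree cell centers

for lo, hi in [(-60, -30), (-30, 0), (0, 30), (30, 60)]:
    print(f"{lo:+d} to {hi:+d} deg: {zonal_coverage(aod, lats, lo, hi):.1f}% covered")
```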
Data provider quality indicators vs. user quality indicators
• EPA requirements for air pollution call for very specific quality indicators, e.g., for PM2.5 concentration.
• Satellite-measured aerosols are characterized by aerosol scientists, and Aerosol Optical Depth (AOD) is not the same as PM2.5.
• Are these quality indicators compatible? Can one be mapped onto the other?
• Does a very accurate AOD measurement correspond to an accurate PM2.5 value? Usually not…

Multi-Sensor Data Synergy Advisor (MDSA)
Expand Giovanni with a semantic-web ontology system that captures scientists' knowledge and data quality characteristics, and encode this knowledge so the Advisor can assist users in multi-sensor data analysis and can identify and present the caveats for comparisons. Funding: ESTO.
Same parameter, same location and time, but different provenance gives different results: hence the importance of capturing and using provenance.

Quality presentation: quality indicators
• The table contains links to explanatory web pages.
• Processes are organized as a list.
Note: data provenance and quality presentation should be tailored to the audience. (From the Multi-Sensor Data Synergy Advisor (MDSA) project.)

Sources of data quality information
What do we want from the documentation? The known quality facts about a product, presented in a structured way so that humans (and computers) can easily extract the information, plus links to the data.
• Algorithm Theoretical Basis Documents (ATBDs):
  o more or less structured;
  o usually out of date;
  o represent the algorithm developer's perspective;
  o describe quality control flags but do not address product quality.
• Regular papers:
  o to be published, a paper has to offer something new (a new methodology, angle or result), so by design all papers are different;
  o results are presented differently (usually without links for reliable data access);
  o structured for publication in a specific journal, not standardized;
  o the data version is not always obvious, and findings about an old version usually do not apply to the newest one.
Recommendation: establish a standard (maybe even a journal) for validation papers, with links to data.

Harmonization
To compare and/or merge data from multiple sources, we need to harmonize:
• quality control flags,
• provenance,
• bias adjustment,
… not addressed in this presentation to save time.

Conclusion
• More parameters and data sources have been added to the traditional (generation 3) Giovanni.
• AeroStat, a new generation 4 Giovanni, supports transparent assessment of aerosol data quality and monitoring of aerosol transport.
• The quality of remote sensing data is being addressed in various activities, some of which are reflected in Giovanni.