Giovanni for Air Quality:
Recent development
G. Leptoukh
NASA Goddard Space Flight Center
Goddard Interactive Online
Visualization ANd aNalysis
Infrastructure (Giovanni)
• With a few mouse clicks, easily obtain information on the
atmosphere, ocean and land around the world.
• No need to learn data formats to retrieve & process data.
• Try various parameter combinations measured by
different instruments.
• All the statistical analysis is done via a regular web
browser.
http://giovanni.gsfc.nasa.gov/
Caution: Giovanni is a rapidly evolving data exploration tool!
Giovanni Allows Scientists to Concentrate on the Science

The Old Way:
[Figure: the pre-Giovanni workflow stretched over a Jan–Oct timeline. Pre-science steps (find data; retrieve high-volume data; learn formats and develop readers; extract parameters; perform spatial and other subsetting; identify quality and other flags and constraints; perform filtering/masking; develop analysis and visualization; accept/discard/get more data from satellite, model, and ground-based sources) consume most of the year: read data, extract parameter, subset spatially, filter quality, reformat, reproject, visualize, explore, analyze. Only then DO SCIENCE: exploration, initial analysis, use the best data for the final analysis, derive conclusions, write the paper, submit the paper.]

The Giovanni Way:
[Figure: with Giovanni, the same pre-science steps take minutes and exploration takes days, so most of the time goes to DOING SCIENCE: use the best data for the final analysis, derive conclusions, write the paper, submit the paper.]
Web-based Services:
GES DISC tools allow
scientists to compress the
time needed for pre-science
preliminary tasks:
data discovery, access,
manipulation, visualization,
and basic statistical analysis.
Scientists have more time to do science.
Example: Comprehensive Multi-Sensor Data
Environment for Aerosol Studies
• Missions: Terra, Aqua, Aura, Parasol, CALIPSO, …
• Instruments: MODIS, MISR, OMI, Polder, CALIOP, …
• Models: GOCART
• Ground-based: AERONET, US EPA PM2.5 (via DataFed)
Giovanni now
• More than 40 customized Giovanni portals serving
various missions and projects
• ~ 1500 geophysical parameters/variables
• Data (local and remote via FTP, OPeNDAP, WCS; see the access sketch after this list) from:
o ~ 20 space-based instruments
o ~ 50 models
o EPA and AERONET stations
• Multiple visualization and statistical analysis
functionalities including data intercomparison
• Data lineage
• Subsetted data downloads in multiple formats
• Various maps and plots served via WMS protocol
• Serving output data via WCS, KML
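As a concrete illustration of the remote-access protocols listed above, here is a minimal sketch (not Giovanni's internal code) of how a user script might pull and subset a dataset over OPeNDAP. The endpoint URL and variable name are hypothetical placeholders.

```python
# Minimal OPeNDAP access sketch; URL and variable name are hypothetical.
# Requires the netCDF4 or pydap backend for xarray.
import xarray as xr

OPENDAP_URL = "https://example.gov/opendap/hypothetical/aerosol_daily.nc"

ds = xr.open_dataset(OPENDAP_URL)      # lazily opens the remote dataset (metadata only)
aod = ds["aod_550nm"]                  # hypothetical variable name
# Spatial subsetting: only the requested slab is transferred over the network.
# Assumes ascending lat/lon coordinates.
subset = aod.sel(lat=slice(30, 50), lon=slice(-130, -100))
subset.mean(dim="time").to_netcdf("aod_subset_mean.nc")
```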
Generation 3 Giovanni: old but improved, with more parameters (mostly using Level 3 data)
Giovanni 3 (G3) portals
Giovanni Inventory
Air Quality Giovanni portal
Air Quality Multi-Sensor, Model, and
Ground-Based Data Support via Giovanni
[Figure panels: PM2.5 (EPA DataFed → Giovanni); Deep Blue MODIS Aerosol Optical Depth; the standard MODIS AOT; GOCART AOT.]
Wildfire Visualization
Visualizing California’s Wildfires from Space Using GIOVANNI
23-27 October 2007
Data from NASA’s Aura OMI (Tropospheric NO2 and UV Aerosol Index), Aqua AIRS (Total Column CO) and
Terra MODIS (Aerosol Small Fraction, Cloud Optical Thickness and Aerosol Mass Concentration Over Land)
[Figure panels: Tropospheric NO2 (OMI); UV Aerosol Index (OMI); Total Column CO (AIRS); Aerosol Small Mode Fraction (MODIS); Cloud Optical Thickness (MODIS); Aerosol Mass over Land (MODIS). Originally presented by Gregory Leptoukh at the 2007 Fall AGU Meeting, San Francisco, CA, December 2007.]
Model intercomparison
HTAP Giovanni supports the
Hemispheric Transport of
Air Pollution (HTAP) Model
Intercomparison.
There is potential to
expand it for comparison
with additional remote
sensing data sets.
Giovanni Applications Projects
Different levels of multi-sensor activities
• Archiving data from multiple sensors. Done.
• Harmonizing metadata. Done… more or less.
• Accessing data from remote locations. Done.
• Harmonizing data formats for joint processing (Giovanni). Done.
• Serving multi-sensor data via common protocols. Done.
• Scale harmonization (Giovanni), i.e., regridding. Done (horizontal only); see the sketch after this list.
• Harmonizing visualization (Giovanni, ACP). Done.
• Joint analysis (Giovanni). Done and ongoing.
• Merging similar parameters (Giovanni). Prototype done for Level 3.
• Harmonizing quality. Working on it.
• Harmonizing provenance (MEaSUREs, Giovanni, MDSA). Started.
• Adjusting bias using a Neural Network approach. Done.
• Merging L2 data. Done.
• Fusing complementary geophysical variables. Future.
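To make the "scale harmonization" and Level 3 merging items above concrete, here is a minimal sketch on synthetic data; the block averaging and simple merging rule are illustrative, not Giovanni's operational algorithms.

```python
# Illustrative horizontal regridding (block averaging) and Level 3 merging.
import numpy as np

def block_average(field, factor):
    """Regrid a 2-D field to a coarser grid by averaging factor x factor blocks,
    ignoring missing values (NaN)."""
    ny, nx = field.shape
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))

# Two synthetic 0.5-degree AOD grids (360 x 720); gaps are NaN.
aod_sensor_a = np.random.rand(360, 720)
aod_sensor_b = np.random.rand(360, 720)
aod_sensor_b[np.random.rand(360, 720) < 0.3] = np.nan

# Harmonize both to a common 1-degree grid, then merge: average where both
# sensors report, otherwise fall back to whichever one is available.
a_1deg = block_average(aod_sensor_a, 2)
b_1deg = block_average(aod_sensor_b, 2)
merged = np.where(np.isnan(b_1deg), a_1deg,
                  np.where(np.isnan(a_1deg), b_1deg, 0.5 * (a_1deg + b_1deg)))
```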
Giovanni Data sources and their access protocols
Data source                   Protocol       Data
NASA GES DISC                 local access   AIRS, TRMM, OMI, MLS, HIRDLS
NASA MODIS DAAC               FTP            MODIS
NASA Ocean Color DAAC         FTP            SeaWiFS, MODIS
NASA Langley DAAC             OPeNDAP        CALIPSO, MISR, TES, CERES
NSIDC                         FTP            AMSR-E
NOAA                          FTP            Snow, Ice, NCEP
Univ. of Maryland             FTP            MODIS fire, NDVI
Colorado State Univ.          FTP            CloudSat
CIESIN, Columbia University   FTP            Population data
JPL                           FTP            QuikSCAT
EPA via DataFed               WCS            PM2.5
Lille, France                 FTP            Parasol
ESA                           FTP            MERIS
Juelich, Germany              FTP → WCS      HTAP
DLR, Germany                  WCS            GOME-2
Paris, France                 OPeNDAP        AEROCOM
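For the WCS entries in the table above, a GetCoverage request follows the standard OGC key-value pattern. The sketch below uses a hypothetical endpoint and coverage name, not the actual DataFed or HTAP service parameters.

```python
# Minimal WCS 1.0.0 GetCoverage sketch; endpoint and coverage name are hypothetical.
import requests

WCS_ENDPOINT = "https://example.org/wcs"
params = {
    "service": "WCS",
    "version": "1.0.0",
    "request": "GetCoverage",
    "coverage": "example_pm25_daily",   # hypothetical coverage identifier
    "crs": "EPSG:4326",
    "bbox": "-125,25,-65,50",           # lon_min, lat_min, lon_max, lat_max
    "time": "2007-10-25",
    "resx": "0.25",
    "resy": "0.25",
    "format": "NetCDF",
}
resp = requests.get(WCS_ENDPOINT, params=params, timeout=60)
resp.raise_for_status()
with open("pm25_subset.nc", "wb") as f:
    f.write(resp.content)
```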
Peer-reviewed publications using and
acknowledging Giovanni (as of Nov 1, 2011)
Publications per year: 2004: 3; 2005: 7; 2006: 6; 2007: 27; 2008: 50; 2009: 86; 2010: 115; 2011: 115. Total: 409.
Evolving Giovanni infrastructure
• G1 & G2 (1998–, 2005–): independent instances
• Giovanni-3: harmonized data & inventory; separate instances; configurator
• Agile Giovanni (G4) (2009–): flexible infrastructure; modular; fully interoperable; URL-based; new data types (L2 swaths/profiles, point data)
What is AeroStat?
• AeroStat is a new NASA Giovanni (generation 4)
online visualization and statistical portal.
• It is an online environment for the direct
statistical intercomparison of global aerosol
parameters, in which the provenance and data
quality can be easily accessed by scientists.
• AeroStat also provides a collaborative research
environment where scientists can seamlessly share
AeroStat workflow execution, algorithms, best
practices, known errors and other pertinent
information with the science community.
Motivation
• Different papers suggested different views on the
quality of MODIS and MISR aerosol products.
• Peer-reviewed papers are usually behind the latest
version of the data.
• Difficult to verify/reproduce results from various published papers.
• Difficult to combine consistently adjusted measurements.
• There is a need for an online, shareable environment where data processing and analysis can be done transparently by any user and the results shared amongst all members of the aerosol community.
What can AeroStat do?
• Provide an effective tool for comparing satellite
and ground-based aerosol Level 2 data
• Provide an environment for colocation and
comparison methods with detailed documentation
• Provide aerosol bias adjustment to satellite data based on ground-based measurements
• Explore aerosol phenomena by merging multi-sensor data
• Enable easy sharing of results
Goals
• Provide an easy-to-use collaborative environment for exploring aerosol phenomena using multi-sensor data
• Provide consistent services with multi-sensor aerosol data
• Provide a transparent environment for colocation and comparison methods, with detailed documentation
• Provide easy sharing of results
AeroStat Giovanni Architectural Diagram
AeroStat Giovanni Data
Supported data from Level 2 measurements
• Original Level 2 Products
– AERONET Level 2
– MODIS Terra Level 2
– MODIS Aqua Level 2
– MISR Level 2
• Derived Products
– Satellite Colocated with AERONET
stations: MAPSS Database
– Cross-Satellite Colocations: near-neighbor search algorithm (being integrated); see the sketch below
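The near-neighbor colocation mentioned above can be sketched as follows: keep satellite pixels that fall within a distance radius and time window of a station observation. This is an illustrative sketch with made-up thresholds and array names, not the MAPSS implementation.

```python
# Illustrative near-neighbor spatio-temporal colocation of Level 2 pixels
# with a single ground station observation.
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def colocate(pix_lat, pix_lon, pix_time, stn_lat, stn_lon, stn_time,
             max_km=27.5, max_minutes=30.0):
    """Return indices of satellite pixels within the spatial radius and time
    window (illustrative thresholds) around one station observation.
    Times are in seconds since a common epoch."""
    dist_ok = haversine_km(pix_lat, pix_lon, stn_lat, stn_lon) <= max_km
    time_ok = np.abs(pix_time - stn_time) <= max_minutes * 60.0
    return np.where(dist_ok & time_ok)[0]
```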
Data Quality and Number-of-Pixels Filters
Defaults: Science Team recommended filters
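A minimal sketch, with hypothetical array and flag names, of what these two filter types amount to: mask out cells whose quality flag is below a threshold or that were aggregated from too few pixels.

```python
# Illustrative QA-flag and pixel-count filtering on a synthetic field.
import numpy as np

aod      = np.random.rand(200, 300)                  # synthetic aerosol field
qa_flag  = np.random.randint(0, 4, size=aod.shape)   # 0 = Bad ... 3 = Very Good
n_pixels = np.random.randint(0, 50, size=aod.shape)  # pixels aggregated per cell

MIN_QA     = 3   # e.g., keep only "Very Good" retrievals (illustrative default)
MIN_PIXELS = 5   # discard cells built from too few Level 2 pixels

filtered = np.where((qa_flag >= MIN_QA) & (n_pixels >= MIN_PIXELS), aod, np.nan)
```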
Bias Adjustment Using Neural Network
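The idea behind the neural-network bias adjustment can be sketched as follows: train a small regressor on colocated satellite and AERONET AOD (plus ancillary predictors) and apply it to new retrievals. The synthetic data and generic MLP below are illustrative, not the operational AeroStat scheme.

```python
# Illustrative neural-network bias adjustment on synthetic colocated data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
sat_aod    = rng.uniform(0.0, 1.5, n)                         # satellite retrievals
scat_angle = rng.uniform(80.0, 180.0, n)                      # ancillary predictor (illustrative)
aeronet    = 0.9 * sat_aod + 0.02 + rng.normal(0.0, 0.05, n)  # synthetic "ground truth"

X = np.column_stack([sat_aod, scat_angle])
X_train, X_test, y_train, y_test = train_test_split(X, aeronet, random_state=0)

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16, 16),
                                   max_iter=2000, random_state=0))
model.fit(X_train, y_train)
adjusted = model.predict(X_test)                              # bias-adjusted satellite AOD

print("mean bias before:", float(np.mean(X_test[:, 0] - y_test)))
print("mean bias after :", float(np.mean(adjusted - y_test)))
```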
AeroStat Giovanni Services
• Data colocated with AERONET
– Time Series with QA filtering and bias adjustment
options
– Scatter Plot with QA filtering and bias adjustment
options
• Colocated satellite data only (cross satellite)
– Gridded Lat-Lon Maps for individual parameters
with QA filtering and bias adjustment options
– Merged Multi-sensor gridded Lat-Lon Maps with QA
filtering and bias adjustment options
Time Series (with various filters)
Scatter Plots
[Figure panels: no filters; default (Science Team recommended) filters; with bias adjustment.]
Gridded Maps
Level 2 data merged
Gsocial (Giovanni Social Network)
Gsocial participants can
save results, annotate plots,
share with others, reproduce
their and others’ results (!),
and continue sharing.
Research notebook
Science Application Example
Final Touchups before the AeroStat
public release
• Add bias adjustment for MISR colocated with
AERONET data (ran into some version mixture at
Langley)
• Add bias adjustment (to AERONET) for the “Satellite Only” service
• Routinely process and ingest colocated satellite and AERONET data
• Fold AeroStat Giovanni into mainstream Giovanni
• Add features based on user feedback (e.g., log-log scales for scatter plots)
Remote-sensing Data Quality
Why so much attention to Data
Quality now?
• In the past, it was difficult to access satellite data.
• Now, within minutes, a user can find and access multiple
datasets from various remotely located archives via web
services and perform a quick analysis.
• This is so-called Data-Intensive Science.
• The new challenge is to quickly figure out which of those
multiple and easily accessible data are more appropriate for
a particular use.
• However, our remote sensing data are not ready for this challenge: there is no consistent approach to characterizing the quality of our data.
• This is why data quality is hot now.
Why so difficult?
• Quality is perceived differently by data providers and
data recipients.
• Many different qualitative and quantitative aspects
of quality.
• No comprehensive framework for remote sensing
Level 2 and higher data quality
• No preferred methodologies for solving many data
quality issues
• The data quality aspect has had a lower priority than building an instrument, launching a rocket, collecting/processing data, and publishing papers using these data.
• Each science team handled quality differently.
Expectations for Data Quality
• What do users want?
– Gridded, gap-free data with error bars in each grid cell
• What do they get instead?
– Level 2 swaths in satellite projection with some obscure quality flags that mean nothing to users
– Level 3 monthly data with a lot of aggregation (not always clearly described) and standard deviation as an uncertainty measure (a fallacy)
Different perspectives
Data providers (demigods looking from above: MODIS, MISR, OMI, MLS, TES): “We have good data.”
User: “I need good new data … and quickly. A new data product could be very good, but if it is not being conveniently served and described, it is not good for me… So I am going to use whatever I have and know already.”
Data provider vs. User perspective
• Algorithm developers and Data providers: solid
science + validation
• Users: fitness for purpose
– Measuring Climate Change:
• Model validation: gridded contiguous data with uncertainties
• Long-term time series: bias assessment is a must, especially for sensor degradation and orbit/spatial-sampling changes
– Studying phenomena using multi-sensor data:
• Cross-sensor bias characterization is needed
– Realizing Societal Benefits through Applications:
• Near-real-time data for transport/event monitoring: in some cases, coverage and timeliness may be more important than accuracy
– Educational users (generally not well-versed in the intricacies of quality): only the best products
Different kinds of reported and perceived
data quality
• Pixel-level Quality (reported): algorithmic guess at usability
of data point (some say it reflects the algorithm “happiness”)
– Granule-level Quality: statistical roll-up of Pixel-level Quality
• Product-level Quality (wanted/perceived): how closely the
data represent the actual geophysical state
• Record-level Quality: how consistent and reliable the data
record is across generations of measurements
Different quality types are often erroneously assumed to have the same meaning. Ensuring data quality requires a different focus and different actions at each of these levels.
General Level 2 Pixel-Level Issues
• How to extrapolate validation knowledge about selected
Level 2 pixels to the Level 2 (swath) product?
• How to harmonize terms and methods for pixel-level
quality?
MODIS Aerosols Confidence Flags (Ocean and Land): 3 = Very Good, 2 = Good, 1 = Marginal, 0 = Bad
AIRS Quality Indicators (by purpose): 0 = Best (Data Assimilation), 1 = Good (Climatic Studies), 2 = Do Not Use
Match up the recommendations? Use these flags in order to stay within the expected error bounds: Ocean ±0.03 ± 0.10 τ; Land ±0.05 ± 0.15 τ.
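Reading the bounds above as an envelope of ±(0.03 + 0.10τ) over ocean and ±(0.05 + 0.15τ) over land, a minimal sketch of checking retrievals against them (with AERONET AOD as the reference τ) could look like this; the array values are made up.

```python
# Illustrative expected-error-envelope check for MODIS-style AOD bounds.
import numpy as np

def within_expected_error(sat_aod, aeronet_aod, surface="land"):
    """True where the satellite AOD falls inside the expected-error envelope
    defined relative to the colocated AERONET AOD."""
    if surface == "ocean":
        envelope = 0.03 + 0.10 * aeronet_aod
    else:
        envelope = 0.05 + 0.15 * aeronet_aod
    return np.abs(sat_aod - aeronet_aod) <= envelope

sat     = np.array([0.12, 0.45, 0.80])
aeronet = np.array([0.10, 0.50, 0.60])
print(within_expected_error(sat, aeronet, surface="land"))   # [ True  True False]
```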
Data Quality vs. Quality of Service
• A data product could be very good,
• but if it is not being conveniently served and described, it is perceived as not being so good…
User perspective:
• There might be a better product somewhere, but if I cannot easily find it and understand it, I am going to use whatever I have and know already.
QI: Spatial completeness (coverage):
Aerosol Optical Depth (AOD)
[Figure: spatial coverage (%) of MODIS Aqua and MISR AOD for different latitudinal zones and seasons.]
Due to its wider swath, MODIS AOD covers more area than MISR; the seasonal and zonal patterns are rather similar.
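A minimal sketch of how such a spatial-coverage indicator can be computed from a gridded AOD field: the percentage of non-missing grid cells per latitude zone. The grid, zones, and data here are synthetic.

```python
# Illustrative spatial-coverage (completeness) indicator by latitude zone.
import numpy as np

lat = np.arange(-89.5, 90.0, 1.0)               # 1-degree grid, 180 x 360
aod = np.random.rand(180, 360)
aod[np.random.rand(180, 360) < 0.4] = np.nan    # synthetic data gaps

zones = {"60S-30S": (-60, -30), "30S-30N": (-30, 30), "30N-60N": (30, 60)}
for name, (lo, hi) in zones.items():
    rows = (lat >= lo) & (lat < hi)
    zone = aod[rows, :]
    coverage = 100.0 * np.isfinite(zone).sum() / zone.size
    print(f"{name}: {coverage:.1f}% coverage")
```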
Data provider quality indicators vs. user QI
• EPA requirements for air pollution:
– Very specific Quality Indicators, e.g., PM2.5
concentration
• Satellite-measured aerosols are characterized by aerosol scientists; Aerosol Optical Depth (AOD) is not the same as PM2.5.
• Are these quality indicators compatible?
• Can one be mapped to the other?
• Does a very accurate AOD measurement correspond to an accurate PM2.5 value? Usually not…
Multi-Sensor Data Synergy Advisor (MDSA)
Expand Giovanni to include a semantic web ontology system that captures scientists’ knowledge and data quality characteristics, and encode this knowledge so the Advisor can assist users in multi-sensor data analysis. Identify and present the caveats for comparisons. (Funding: ESTO)
Same parameter, same location and time, different provenance, different results: hence the importance of capturing and using provenance.
Quality presentation (QI):
• The table contains links to explanatory web pages
• Processes are organized as a list
Note: data provenance and quality presentation should be tailored to the audience.
(From the Multi-Sensor Data Synergy Advisor (MDSA) project.)
Sources of data quality information
What do we want to get from the documentation?
The known quality facts about a product presented in a structured way, so
humans (and computers) can easily extract this information + links to data.
• Algorithm Theoretical Basis Documents (ATBDs):
– More or less structured
– Usually out of date
– Represent the algorithm developer’s perspective
– Describe quality control flags but do not address product quality aspects
• Regular papers:
– To be published, a paper has to offer something new, e.g., a new methodology, a new angle, a new result; therefore, by design, all papers are different (to avoid rejection)
– Results are presented differently (usually without links for reliable data access)
– Structured for publication in a specific journal, not standardized
– The data version is not always obvious, and findings about an old version are usually not applicable to the newest version
Recommendation:
– Establish a standard (maybe even a journal) for validation papers, with links to data
Harmonization
To be able to compare and/or merge data from
multiple sources, we need to harmonize:
• Quality Control flags
• Provenance
• Bias adjustment
….not addressed in this presentation to save time…
Conclusion
• More parameters and data sources added to
the traditional (generation 3) Giovanni
• New generation 4 Giovanni for transparent assessment of aerosol data quality and monitoring of aerosol transport → AeroStat
• Quality of remote sensing data addressed in
various activities – some reflected in Giovanni