Shawn McClure - VIEWS - Visibility Information Exchange Web System

3SAQS Technical Workshop
October 31 – November 1, 2013
Data Warehouse
Status and Planning Update
Zac Adelman (UNC-IE)
Shawn McClure (CSU-CIRA)
Tom Moore (WGA-WRAP)
Summary of Past Quarter Activities
• Researched and experimented with large data transfer technologies (iRODS, Globus Connect, etc.)
• Configured a large dual RAID array on the primary file server (~20TB) and designed a third RAID array to bring
the total storage capacity to 50TB+
• Imported the WestJump source data files onto the primary file server and organized them into a uniform
folder structure (meteorology, emissions, results)
• Created an FTP site on the primary file server for facilitating direct, basic access to the source data files
• Made available the current inventory of source data files on the new FTP site
• Began the design of the content, format, and coding protocols for submitting model results and other data to
the TSDW
• Began the design of the schema and code infrastructure for the “project overview and tracking” system
• Continued to refine the database, software, and website infrastructure supporting the data warehouse
• Continued to refine various pre-processing components
o XML Generator for metadata
o Boundary Conditions Generator
o CAMx Post Processing Utility
o RDBMS data import system
• Refined the logical and physical file system design
• Refined the data verification and validation system
Operational Website Components
• User Login Form
• User Registration/Modification Form
• User Profile/Account Form
• User Feedback Form
• Dataset Request Form
• Database Query Wizard
• Raw Data Download
• Interactive Charts
• Dynamic Contour Maps
• Site Metadata Reports
• Monitoring Site Metadata Browser
• File Explorer
• FTP site
• Authentication and Authorization System
Possible Future Website Components
• Modeled Emissions Summary Tool
• Modeled-to-Observed Data Comparison Tool
• Air Quality Summary Reports
o Visibility
o Deposition
o Ozone
o Other
• Model Data Mapping Tool
• Source Apportionment Tool
• Various Unpublished Monitoring Data Tools
• Backend Web Services and Processing Components
Summary of Coming Activities
• Conduct additional use case tests
• Finalize the large data transfer system
• Import preexisting/legacy air quality studies and results
• Commence production-level data warehouse operations
(hosting, data analysis and processing, maintenance, et
cetera)
• Design visualization and analysis tools for modeling results
and performance evaluation
• Design the “project overview and tracking” interface for
the TSDW website
TSDW Architecture Diagram - Overview
[Figure: layered architecture diagram. Recoverable labels, top to bottom; grouping is approximate.]
• Users and Providers: NPS, BLM, USFS, EPA, States, Other Data Systems (IRMA, NRIS, AQS)
• External TSDW Interface: TSDW Website, TSDW FTP Site, Web Services, Data Services (Standard API, Standard HTTP API, JSON, XML, OGC)
• TSDW Software Libraries: Air Quality-Specific Software Libraries, Third Party Software Libraries, Generalized Software Libraries
• Data Access Layer
• TSDW Data Management: Data Files, RDBMS, Spatial DB
• Data Acquisition and Import System
• Source Data
TSDW Data Flow Diagram - Overview
[Figure: data flow diagram. Recoverable labels, in flow order; grouping is approximate.]
• Data Sources: State & Local Agencies, EPA, Mexico, Canada, etc.
• Meteorological Inputs: Weather Observations, Landuse/Landcover, Initial Conditions, Physics Options
• Emissions Inventories (Source Categories): Point & Area Sources, Oil and Gas, Biogenic, Fire (anthro, natural), etc.
• Data Provider Processing: Meteorological Models (e.g. WRF, MM5), Met Data Processing (e.g. MCIP2), Emissions Modeling (e.g. SMOKE), BEIS, MOVES, Model-Ready Processing (e.g. reformatting, regridding)
• Model Inputs: Land Use & Cover, Boundary Conditions, Initial Conditions, Photolysis Rates
• Monitoring Data: 3SAQS, AQS, IMPROVE, CASTNet
• Three State Data Warehouse: File Server (Model-Ready Input Data, Gridded Model Results), Database Server (DBMS-Ready Model Results), Web Services, Website
• Air Quality Modelers: Photochemical Grid Modeling (CMAQ, CAMx)
• Planners, Stakeholders, and Users: Products, Reports, and Analyses; Oil and Gas Permits; Recommendations
TSDW Use Cases
Definition of "Use Case": A list of steps defining the interactions between a user and a
system to achieve a specific goal. The "user" can be a human or an external system,
depending on context.
Scopes of Use Cases: The subset of users to which the functionality of a given use case is
made available
• Internal: The TSDW administration and development team
• External: A subset of external users that have been granted a specific role
• Public: The general public - anyone who visits the TSDW website
Potential User Roles:
• Administrators
• Project Managers
• Project Team Members
• Stakeholders
• Data Providers
• Planners
• Public
Use Case Description
Obtain and Manage Model Input Data (Scope: Internal)
1. Obtain model input data from data provider(s)
2. Copy model input data files to file server
3. Organize model input data on the file server
a. File and folder naming convention
b. Physical file system organization (what developers see)
c. Logical file system organization (what the user sees)
d. Dataset partitioning (temporal, spatial, functional, etc.)
4. Perform periodic backup of "active" model input data
5. Perform periodic archival of "inactive" model input data
6. Track and manage the versioning of the model input data
Use Case Description
Harvest File Metadata Using the XML Metadata Generator (Scope: Internal)
1. An administrator locates the desired root folder in the file system
2. An administrator executes the XML Generator program to produce XML files
containing file metadata
3. (Ideally, the above two tasks could be automatically run as a "cron" task on a regular,
periodic basis, rather than as a two-step manual process.)
4. The File Indexing Utility (FIU) processes the newly-generated files to extract the
relevant file metadata
5. The FIU updates the RDBMS with the file metadata
6. The new file metadata is automatically reflected in the TSDW File Explorer Tool
Dependencies:
· The XML File Metadata Generator program
· The File Indexing Utility (FIU)
· The appropriate RDBMS schema, SQL scripts, and software libraries for managing
source file metadata
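A minimal sketch of the harvesting steps above, written in Python. It is not the actual XML Generator or FIU: the element and attribute names (files, file, path, size_bytes, modified_utc) and both function names are assumptions made for illustration.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_file_metadata_xml(root_folder):
    """Walk root_folder and return an XML element describing every file."""
    root = ET.Element("files", {"root": os.path.abspath(root_folder)})
    for dirpath, dirnames, filenames in os.walk(root_folder):
        dirnames.sort()  # visit subfolders in a deterministic order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            ET.SubElement(root, "file", {
                "path": os.path.relpath(full_path, root_folder),
                "size_bytes": str(info.st_size),
                "modified_utc": modified.isoformat(),
            })
    return root


def extract_file_records(xml_root):
    """Stand-in for the FIU step: turn the XML into rows for an RDBMS."""
    return [
        (f.get("path"), int(f.get("size_bytes")), f.get("modified_utc"))
        for f in xml_root.iter("file")
    ]
```

Run from a scheduled task, build_file_metadata_xml would cover steps 1 and 2, and the rows returned by extract_file_records are what a database insert in step 5 would consume.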
Use Case Description
Download Model Input Data from TSDW, Online Method (Scope: External)
1. User logs into the TSDW website
2. User fills out the Dataset Request form
3. The user is redirected to the Dataset Request confirmation message/page
4. The DR form is passed to the Dataset Packaging System (DPS)
5. The DPS registers metadata about the request into the RDBMS
6. The DPS locates the physical files that are needed to fulfill the order
7. The DPS assembles, organizes, and compresses the component files into a downloadable "package"
8. The DPS creates a unique "PackageID" that will be linked with this package throughout its lifecycle
9. The DPS registers metadata about the package (including the "PackageID") into the RDBMS
10. The DPS notifies the requesting user of the package's availability
11. The user logs back into the TSDW website (if necessary)
12. The user initiates a session of the Dataset Transfer System (DTS) to download the files
13. The DTS registers metadata about the package "receipt" into the RDBMS
14. The DIS notifies the appropriate TSDW administrator(s) of the download
Dependencies:
· Dataset Request Form
· Dataset Request confirmation message/page
· Dataset Packaging System (DPS) (could be one-and-the-same with iRODS or Globus)
· Appropriate RDBMS schema and SQL scripts/commands for managing Dataset Request metadata
· Appropriate RDBMS schema for associating Dataset Requests with Users and Projects
· A high volume data transfer program such as iRODS or Globus Connect Server
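The packaging and receipt steps above (7 through 9, and 13) could be sketched as follows. This is an illustration only, not the DPS or DTS: the packages table, its columns, the status values, and the use of a UUID as the PackageID are all assumptions, and SQLite stands in for the RDBMS.

```python
import os
import sqlite3
import tarfile
import uuid


def create_package(conn, user_name, file_paths, output_dir):
    """Compress the requested files into one archive and register it."""
    package_id = uuid.uuid4().hex  # hypothetical PackageID format
    archive_path = os.path.join(output_dir, f"package_{package_id}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in file_paths:
            # Flat layout for brevity; a real package would keep the folders.
            archive.add(path, arcname=os.path.basename(path))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packages ("
        "package_id TEXT PRIMARY KEY, user_name TEXT, "
        "archive_path TEXT, status TEXT)"
    )
    conn.execute(
        "INSERT INTO packages VALUES (?, ?, ?, ?)",
        (package_id, user_name, archive_path, "available"),
    )
    conn.commit()
    return package_id


def record_download(conn, package_id):
    """Mark a package as received; return False for an unknown PackageID."""
    cur = conn.execute(
        "UPDATE packages SET status = 'downloaded' WHERE package_id = ?",
        (package_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```

Keeping the PackageID as the primary key is what lets a later model results upload refer back to the exact input data that was delivered.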
Use Case Description
"Download" Model Input Data from TSDW, Offline Method (Scope: External)
1. User logs into the TSDW website
2. User fills out the Dataset Request form
3. The DR form is passed to the Dataset Packaging System (DPS)
4. The DPS registers metadata about the request into the RDBMS
5. The DPS locates the physical files that are needed to fulfill the order
6. The DPS creates a unique "PackageID" that will be linked with this package throughout its lifecycle
7. The DPS registers metadata about the package (including the "PackageID") into the RDBMS
8. The DPS notifies the requesting user of the order receipt and future hard drive shipment
9. The DPS sends a list of the files that comprise the order to a TSDW administrator
10. A TSDW administrator copies the selected files onto a hard disk drive (HDD) or drives
11. A TSDW administrator mails the drive(s) to the requesting user
12. A TSDW administrator records the shipment in the RDBMS
Dependencies:
· Dataset Request Form
· Dataset Request confirmation message/page
· Dataset Packaging System (DPS)
· Appropriate RDBMS schema and SQL scripts/commands for managing Dataset Request metadata
· Appropriate RDBMS schema for associating Dataset Requests with Users and Projects
· A manual process for copying data files onto hard disks and mailing them to users
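Steps 9 and 10 above leave the administrator with a file list and one or more drives to fill. A possible aid, sketched here under assumptions, is to plan which files go on which drive. The first-fit decreasing heuristic and the plan_drives name are illustrative choices, not part of the TSDW design.

```python
def plan_drives(file_sizes, drive_capacity):
    """Assign files to drives with first-fit decreasing.

    file_sizes maps file name to size; sizes and drive_capacity must use
    the same unit. Returns one list of file names per drive.
    """
    drives = []  # each drive: {"free": remaining capacity, "files": [names]}
    largest_first = sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in largest_first:
        if size > drive_capacity:
            raise ValueError(f"{name} does not fit on a single drive")
        for drive in drives:
            if drive["free"] >= size:
                drive["free"] -= size
                drive["files"].append(name)
                break
        else:
            # No existing drive has room, so start a new one.
            drives.append({"free": drive_capacity - size, "files": [name]})
    return [drive["files"] for drive in drives]
```

The heuristic does not guarantee the minimum number of drives, but it is simple and usually close, which may be enough for a manual shipping process.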
Use Case Description
Download Boundary Conditions Generator (Scope: External)
1. User logs into the TSDW website
2. User navigates to the Modeling Utilities section of the website
3. User fills out the Boundary Conditions Generator (BCG) download form
4. The BCG download form is passed to the Utility Tracking System (UTS)
5. The UTS extracts information from the metadata file associated with the current BCG
6. The UTS associates this metadata with the appropriate User record in the RDBMS
7. The UTS redirects the user to a download link for the BCG
8. The user downloads the BCG and any associated instructions and configuration files
9. The DIS notifies the appropriate TSDW administrator(s) of the download
Dependencies:
· Boundary Conditions Generator (BCG) program
· BCG user guide
· BCG download form
· BCG download confirmation message/page and installation file link
· The appropriate RDBMS schema, SQL scripts, and software libraries for managing BCG
download metadata
Use Case Description
Download the CAMx Post-Processing Utility (Scope: External)
1. User logs into the TSDW website
2. User navigates to the Modeling Utilities section of the website
3. User fills out the CAMx Post-Processing Utility (CPPU) download form
4. The CPPU download form is passed to the Utility Tracking System (UTS)
5. The UTS extracts information from the metadata file associated with the current CPPU
6. The UTS associates this metadata with the appropriate User record in the RDBMS
7. The UTS redirects the user to a download link for the CPPU
8. The user downloads the CPPU and any associated instructions and configuration files
9. The DIS notifies the appropriate TSDW administrator(s) of the download
Dependencies:
· CAMx Post-Processing Utility (CPPU) program
· CPPU user guide
· CPPU download form
· CPPU download confirmation message/page and installation file link
· The appropriate RDBMS schema, SQL scripts, and software libraries for managing
CPPU download metadata
Use Case Description
Upload Model Results (Scope: External)
1. User logs into the TSDW website
2. User navigates to the Modeling Results Upload section of the website
3. User fills out the Modeling Results Upload form
a. User provides a standard description of the model results
b. User provides the "Package ID" of the model input data used
c. User provides the Boundary Conditions Generator "Version ID", if relevant
d. User provides the CAMx Post-Processing Utility "Version ID", if relevant
e. User selects the files to upload
f. User clicks the "Submit" button on the form
4. The Modeling Results Upload form is passed to the Data Import System (DIS)
5. The data files are uploaded and cataloged by the DIS
6. The DIS creates a unique "DatasetID" that will be linked to this upload throughout its lifecycle
7. The DIS registers metadata about the upload (including the "DatasetID") into the RDBMS
8. The DIS notifies the uploading user of the upload success or failure (generally, its "status")
9. The DIS places the file(s) into the appropriate location(s) on the TSDW file system
10. The DIS notifies the appropriate TSDW administrator(s) of the upload
Dependencies:
· Modeling Results Upload (MRU) form
· MRU system
· Appropriate RDBMS schema and SQL scripts/commands for managing MRU metadata
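The form fields in step 3 and the DatasetID and status in steps 6 and 8 above could be modeled roughly as below. The class, field, and function names are hypothetical, as is the choice of a UUID for the DatasetID. Which fields are required is also an assumption, based on the two version IDs being the only ones marked "if relevant".

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelResultsUpload:
    description: str
    package_id: str                         # PackageID of the input data used
    bcg_version_id: Optional[str] = None    # only if the BCG was used
    cppu_version_id: Optional[str] = None   # only if the CPPU was used
    file_names: List[str] = field(default_factory=list)

    def validation_errors(self):
        """Return a list of problems; an empty list means the form is valid."""
        errors = []
        if not self.description.strip():
            errors.append("description is required")
        if not self.package_id.strip():
            errors.append("package_id is required")
        if not self.file_names:
            errors.append("at least one file must be selected")
        return errors


def register_upload(upload):
    """Return (dataset_id, status); dataset_id is None when validation fails."""
    errors = upload.validation_errors()
    if errors:
        return None, "failed: " + "; ".join(errors)
    return uuid.uuid4().hex, "succeeded"
```

Reporting every validation problem at once, rather than stopping at the first, gives the uploading user a complete status message in a single notification.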
Use Case Description
Import Database-Ready Model Results (Scope: Internal)
1. An administrator locates the newly uploaded model results (which have been generated by the
CPPU and uploaded to the TSDW)
2. An administrator executes the appropriate scripts/commands using the Data Import System
(DIS)
3. The DIS reads and imports the database-ready model results into the RDBMS
a. The DIS verifies that all the necessary metadata is present in the RDBMS
b. The DIS transforms the data into the appropriate schema for import
c. The DIS maps source codes and names to internal codes and names, as needed
d. The DIS imports the data from the source file(s) into the RDBMS
e. The DIS makes/updates the appropriate metadata records in the RDBMS for tracking the
imported model Dataset
f. The imported model results become automatically available via the relevant tools on the
TSDW website
Dependencies:
· The CAMx Post-Processing Utility (CPPU) for generating the database-ready model results
· The Data Import System (DIS)
· Appropriate RDBMS schema and SQL scripts/commands for managing Model Results metadata
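Steps 3b through 3d above (transform, map codes, import) might look like the following sketch. The CSV layout (site, species, value), the CODE_MAP contents, and the model_results table are invented for illustration, and SQLite stands in for the RDBMS. The actual CPPU output format is not described here.

```python
import csv
import io
import sqlite3

# Hypothetical mapping from source species codes to internal parameter codes.
CODE_MAP = {"O3": "ozone", "PM25": "pm2_5"}


def import_results(conn, dataset_id, csv_text):
    """Load rows of (site, species, value) into the database; return row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_results ("
        "dataset_id TEXT, site TEXT, parameter TEXT, value REAL)"
    )
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        source_code = record["species"]
        if source_code not in CODE_MAP:
            # Refuse the whole file rather than import partially mapped data.
            raise ValueError(f"unmapped source code: {source_code}")
        rows.append((dataset_id, record["site"], CODE_MAP[source_code],
                     float(record["value"])))
    with conn:  # commit all rows together, or roll back on error
        conn.executemany("INSERT INTO model_results VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

All rows are validated before any insert, so a file with an unknown code leaves the table unchanged.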
Use Case Description
Visualize and Analyze Monitoring Data (Scope: External)
1. User logs into the TSDW website
2. The user chooses an appropriate visualization and/or analysis tool to use
3. Using the tool, the user specifies spatial, temporal, and other dimensional filters for
the data as well as display and formatting options
4. The tool displays monitoring data in various output products, such as:
a. Data summary tables
b. Bar charts
c. Line charts
d. Pie charts
e. Contour maps
Dependencies:
· An appropriate collection of monitoring data
· Specific design specifications for monitoring data output products
· An appropriate collection of online visualization tools and technologies
Use Case Description
Visualize and Analyze Model Results (Scope: External)
1. The user logs into the TSDW website
2. The user chooses an appropriate visualization and analysis tool to use
3. Using the tool, the user specifies spatial, temporal, and other dimensional filters for the data as
well as display and formatting options
4. The tool displays model performance and evaluation results in various output products, such as:
a. Normalized mean error and bias
b. Mean normalized error and bias
c. Root mean square error
d. Correlation coefficients
e. Soccer plots
f. Box and whisker plots
g. Bugle plots
h. Spatial statistical plots
i. Spatial concentration plots with observation overlays
Dependencies:
· An appropriate collection of model results data
· Specific design specifications for model results output products
· An appropriate collection of online visualization tools and technologies
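The statistics named in items 4a through 4d above can be computed from paired modeled and observed values. The sketch below uses the commonly cited definitions, which are assumed here because the slides do not spell them out. With M modeled and O observed: NMB = sum(M - O) / sum(O), NME = sum|M - O| / sum(O), MNB = mean((M - O) / O), MNE = mean(|M - O| / O), RMSE = sqrt(mean((M - O)^2)), and r is the Pearson correlation. The function name and dictionary keys are illustrative.

```python
import math


def performance_stats(modeled, observed):
    """Return common model evaluation statistics as fractions (not percent).

    Observed values must be non-zero, and neither series may be constant,
    or the divisions below will fail.
    """
    if len(modeled) != len(observed) or not modeled:
        raise ValueError("need paired, non-empty series")
    n = len(modeled)
    diffs = [m - o for m, o in zip(modeled, observed)]
    obs_sum = sum(observed)
    mean_m = sum(modeled) / n
    mean_o = obs_sum / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(modeled, observed))
    var_m = sum((m - mean_m) ** 2 for m in modeled)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return {
        "nmb": sum(diffs) / obs_sum,
        "nme": sum(abs(d) for d in diffs) / obs_sum,
        "mnb": sum(d / o for d, o in zip(diffs, observed)) / n,
        "mne": sum(abs(d) / o for d, o in zip(diffs, observed)) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
        "r": cov / math.sqrt(var_m * var_o),
    }
```

For example, performance_stats([12, 18, 33], [10, 20, 30]) gives a normalized mean bias of 0.05, that is, the model overpredicts by 5% in aggregate.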
Use Case Description
View Project Data and Metadata (Scope: External)
1. A user logs into the TSDW website
2. The user navigates to the Projects and Studies section of the TSDW website
3. The user views metadata associated with the projects that he/she has permission to view
a. Name, purpose, description
b. Contact information: project manager(s), contractors, etc.
c. Associated datasets: Model input data downloaded, model results uploaded, etc.
d. Analysis products: Charts, graphs, summaries, etc.
4. The user views data associated with the projects that he/she has permission to view
a. Model input data
i. Meteorological inputs
ii. Emissions inputs
iii. Initial and Boundary Conditions
iv. Ancillary inputs (land use, land cover, photolysis)
b. Model configuration metadata
c. Model results
i. Gridded results
ii. Observation-paired results
d. Monitoring data
Dependencies:
· Appropriate RDBMS schema and SQL scripts/commands for managing Project metadata
o Projects
o Users
o Downloaded/Uploaded Datasets
o Documents
o Analysis products
· An online user interface for the Projects and Studies section of the TSDW website
Use Case Summary
• Obtain and Manage Model Input Data (Scope: Internal)
• Harvest File Metadata Using the XML Metadata Generator (Scope: Internal)
• Download Model Input Data from TSDW, Online Method (Scope: External)
• "Download" Model Input Data from TSDW, Offline Method (Scope: External)
• Download Boundary Conditions Generator (Scope: External)
• Download the CAMx Post-Processing Utility (Scope: External)
• Upload Model Results (Scope: External)
• Import Database-Ready Model Results (Scope: Internal)
• Visualize and Analyze Monitoring Data (Scope: External)
• Visualize and Analyze Model Results (Scope: External)
• View Project Data and Metadata (Scope: External)
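The summary above pairs each use case with a scope, which maps naturally onto a lookup table. In this sketch the use case names and scopes come from the list. The visibility rule (each audience may also use everything open to a wider audience) is an assumption, as are the names Scope, VISIBLE_SCOPES, and use_cases_for.

```python
from enum import Enum


class Scope(Enum):
    INTERNAL = "Internal"
    EXTERNAL = "External"
    PUBLIC = "Public"


# Use cases and scopes as listed in the summary above.
USE_CASES = {
    "Obtain and Manage Model Input Data": Scope.INTERNAL,
    "Harvest File Metadata Using the XML Metadata Generator": Scope.INTERNAL,
    "Download Model Input Data from TSDW, Online Method": Scope.EXTERNAL,
    '"Download" Model Input Data from TSDW, Offline Method': Scope.EXTERNAL,
    "Download Boundary Conditions Generator": Scope.EXTERNAL,
    "Download the CAMx Post-Processing Utility": Scope.EXTERNAL,
    "Upload Model Results": Scope.EXTERNAL,
    "Import Database-Ready Model Results": Scope.INTERNAL,
    "Visualize and Analyze Monitoring Data": Scope.EXTERNAL,
    "Visualize and Analyze Model Results": Scope.EXTERNAL,
    "View Project Data and Metadata": Scope.EXTERNAL,
}

# Assumption: each audience may also use everything open to a wider audience.
VISIBLE_SCOPES = {
    Scope.INTERNAL: {Scope.INTERNAL, Scope.EXTERNAL, Scope.PUBLIC},
    Scope.EXTERNAL: {Scope.EXTERNAL, Scope.PUBLIC},
    Scope.PUBLIC: {Scope.PUBLIC},
}


def use_cases_for(user_scope):
    """List the use cases available to a user with the given scope."""
    allowed = VISIBLE_SCOPES[user_scope]
    return [name for name, scope in USE_CASES.items() if scope in allowed]
```

Note that none of the eleven use cases is scoped Public, so under this rule an anonymous visitor would see an empty list until public use cases are defined.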
Thanks.
Review of the 3SDW Overall System Ecosystem and Architecture
[Figure: 3SDW ecosystem and architecture diagram. Recoverable labels; grouping is approximate.]
• Guidance, Requirements, Feedback, Funding: NPS, WGA
• Architecture, Design, Implementation, Management, and Operation: CIRA
• Monitored data: Aerosol, Deposition, Gaseous, Met (AQS, VIEWS)
• Modeled data: Emissions, Air Quality, Results (WestJump, future modeling, etc.)
• Other content: Raw Data, Documents, Tools
• Users: Modelers, Planners, Managers
• Acquisition: Identification, Acquisition, Pre- and Postprocessing, Extraction
• Integration: Verification, Validation, QA/QC, Mapping, Flagging, Transformation
• Management: Storage, Backup, Restore, Security, Summarizing, Statistics
• Distribution: Searching, Querying, Filtering, Aggregating, Formatting, Packaging
• Presentation: Charting, Graphing, Mapping, Analyzing
User Login Form
User Registration/Modification Form
User Profile/Account Form
User Feedback Form
Dataset Request Form
Raw Data Download (Query Wizard)
Time Series Charts (Query Wizard)
Dynamic Contour Maps (Query Wizard)
Site Metadata Report (Query Wizard)
Monitoring Site Browser
Modeled Emissions Summary Tool
Modeled-to-Obs Comparison Tool
Air Quality Summary Reports
Model Data Mapping Tool
Future Online Visualization Tools
First TSDW Modeling Use Case
Report and Results
First Use Case - Beta Test Steps
• Testers visited the TSDW website and registered with the system to create
an account
• Testers visited the Data Request web page and entered their requests for
the WestJump Base08b dataset
• Each request was stored in the database
• The system determined whether or not each request could be
automatically filled or had to be manually assembled
• The system sent emails to the appropriate TSDW team members to notify
them of the data requests
• TSDW team members assembled the dataset requests (copied the relevant
data files onto hard drives)
• The datasets (hard drives) were delivered to the beta testers
• The system updated the dataset requests to reflect their “filled” status
First Use Case - Beta Test Steps (cont’d)
• Using the delivered datasets, testers ran the models and
generated results
• Testers returned the model output results to the 3SDW
• The test results were assessed by TSDW team members
• Testing outcomes were summarized for the May 3-State AQ
Study Technical Workshop
• The TSDW team refines the dataset ordering, download,
packaging, and delivery system according to lessons learned
• The TSDW team develops the next Use Case testing scenario(s)
Summer (June – October) 2013 3SAQS
Technical Work Review
Data Warehouse Activities
Summary of Coming Activities
• Implement the collaborative components of the
warehouse
• Implement the ongoing news and updates section
• 3SDW on-line for NEPA air quality analysis projects by end
of October
• Out-bound data delivery and in-bound data ingestion for
NEPA and other air quality studies
• Data warehouse operations (hosting, data analysis and
processing, maintenance, et cetera)
• Plans for storage/access/visualization for modeling results
and evaluation tools
• Store UT BLM ARMS and other studies’ data in 3SDW after
evaluation using protocols
Testing and Refinement Help
• All users, collaborators, and partners can help with testing
• Please report bugs – don’t endure them
• Use the website Feedback form
• Send direct email to team members
• Provide as much information as possible up-front
• Stay abreast of ongoing additions and updates
• Be an active part of the design process - make suggestions
for features and refinements
• Don’t assume it can’t be done
• Don’t assume it can be done