FederatedDataWarehouse2

Federated PM and Haze Data Warehouse Project
a sub-project of
(enter your sticker & logo here)
St. Louis Midwest Supersite Project
RPO: Regional Planning Organization
SupSite: EPA Supersites
NARSTO: NARSTO PM
EPA: EPA Division1, Division2, Division2
Me: Me and my dog for our aerosol project
Nov 20, 2001, RBH
PM/Haze Data Flow in Support of AQ Management
(Diagram: Shared PM/Haze Data exchanged among Federal Land Managers (FLM), Regional Planning Orgs (RPO), EPA Regulatory & Research, SuperSite, NARSTO, Industry, Academic, and Other (private, academic) participants)
• PM and haze data are used in many parts of AQ management, mostly in the form of reports
• Numerous organizations need data relevant to PM/Haze
• The pertinent data (ambient, emission) come in great variety and from many different sources
• Most interested parties (stakeholders) are both producers and consumers of PM and haze data
• To produce relevant reports, the data need to be ‘processed’ (integrated, filtered, aggregated)
• There is a general willingness to share data, but the resistance to data flow and processing is too high
Scientific and Administrative Rationale for Resource Sharing
Scientific Rationale:
• Regional haze and its precursors have a 1,000-10,000 km airshed.
• Smoke, dust, haze: data integration
• A substantial fraction of haze originates from natural sources or from out-of-jurisdiction man-made sources
• Cross-RPO data and knowledge sharing yields better operational and scientific support to AQ management
Management Rationale:
• Haze control within some RPOs cannot yield
• Data sharing saves money and ….
A Strategy for the Federated PM/Haze Data Warehouse
• Negotiate with the data providers to ‘open up’ their data servers for limited, controlled access in accordance with a clear ‘access contract’ with the Federated Warehouse
• Design an interface to the warehoused datasets that offers simple data access and satisfies the data needs of most integrating users (an oxymoron?)
• Facilitate the development of shared value-adding processes (analysis tools, methods) that refine the raw data into useful knowledge
Three-Tier Federated Data Warehouse Architecture
(Note: In this context, ‘Federated’ differs from ‘Federal’ in the direction of the driving force. ‘Federated’ is meant to indicate a driving force for sharing from the ‘bottom up’, i.e., from the members, not one dictated from ‘above’ by the Feds.)
1. Provider Tier: Back-end servers containing heterogeneous data, maintained by the federation members
2. Proxy Tier: Retrieves designated Provider data and homogenizes it into common, uniform Datasets
3. User Tier: Accesses the Proxy Server and uses the uniform data for presentation, integration or processing
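The three tiers can be sketched in a few lines: two providers keep their own native schemas, the proxy maps both into one uniform record type, and the user code only ever sees that uniform type. This is a minimal sketch under assumptions: the provider layouts, the `UniformRecord` fields, and the functions `proxy_fetch` and `present` are hypothetical names for illustration, not the prototype's code.

```python
from dataclasses import dataclass

# Provider tier: two hypothetical member servers with different native schemas.
PROVIDER_1_ROWS = [("SITE1", "2001-07-01", 12.0)]  # (site, ISO date, PM2.5 value)
PROVIDER_2_ROWS = [{"Loc": "SITE2", "Day": "20010701", "PM2_5": 9.0}]

@dataclass(frozen=True)
class UniformRecord:
    """Common record produced by the Proxy tier (hypothetical schema)."""
    location: str
    date: str  # ISO yyyy-mm-dd
    parameter: str
    value: float

def proxy_fetch() -> list[UniformRecord]:
    """Proxy tier: retrieve designated provider data and homogenize it."""
    out = [UniformRecord(site, date, "PM25", v) for site, date, v in PROVIDER_1_ROWS]
    for row in PROVIDER_2_ROWS:
        d = row["Day"]  # provider 2 stores dates as yyyymmdd
        out.append(UniformRecord(row["Loc"], f"{d[:4]}-{d[4:6]}-{d[6:]}", "PM25", row["PM2_5"]))
    return out

def present(records: list[UniformRecord]) -> str:
    """User tier: works only with uniform records, never native schemas."""
    return "\n".join(f"{r.location}\t{r.date}\t{r.parameter}\t{r.value}" for r in records)

print(present(proxy_fetch()))
```

The design point is that only `proxy_fetch` knows about provider schemas; adding a member server changes the proxy, not the user code.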
(Diagram: Federated Data Warehouse. User Tier: data presentation, processing. Proxy Tier: data homogenization, transformation. Provider Tier: heterogeneous data in distributed SQL Servers.)
Federated Data Warehouse Interactions
• The Provider servers interact only with the Proxy Server, in accordance with the Federation Contract
  – The contract sets the rules of interaction (accessible data subsets, types of queries)
  – Strong server security measures are enforced, e.g., through the Secure Sockets Layer (SSL)
• The data User interacts only with the generic Proxy Server, through a flexible Web Services interface
  – Generic data queries, applicable to all data in the Warehouse (e.g., a data sub-cube by space, time, parameter)
  – The data query is addressed to the Web Service provided by the Proxy Server
  – Uniform, self-describing data packages are passed to the user for presentation or further processing
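A proxy-side handler for these interactions might first check the request against the Federation Contract, then cut the requested sub-cube by space, time, and parameter, and finally wrap the result with its own metadata. This is a hedged sketch: the `CONTRACT` table, `SubCubeQuery`, `handle`, and the package layout are hypothetical, and the real prototype ran as a Web Service rather than a local function.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubCubeQuery:
    """Generic query: a data sub-cube selected by space, time, and parameter."""
    parameter: str
    locations: frozenset[str]
    start: str  # ISO dates compare correctly as strings
    end: str

# Hypothetical Federation Contract: parameters each provider agrees to expose.
CONTRACT = {"provider1": {"PM25", "SO4"}}

# Hypothetical provider data: (provider, location, date, parameter, value)
DATA = [
    ("provider1", "SITE1", "2001-07-01", "PM25", 12.0),
    ("provider1", "SITE1", "2001-07-01", "SO4", 4.0),
    ("provider1", "SITE1", "2001-07-01", "OC", 2.5),  # not covered by the contract
    ("provider1", "SITE2", "2001-08-15", "PM25", 9.0),
]

def handle(query: SubCubeQuery) -> dict:
    """Enforce the contract, slice the sub-cube, return a self-describing package."""
    allowed = {p for p, params in CONTRACT.items() if query.parameter in params}
    if not allowed:
        raise PermissionError(f"{query.parameter} is not covered by the contract")
    rows = [
        {"location": loc, "date": date, "value": val}
        for prov, loc, date, param, val in DATA
        if prov in allowed and param == query.parameter
        and loc in query.locations and query.start <= date <= query.end
    ]
    # Metadata travels with the data so the user needs no out-of-band schema.
    return {"parameter": query.parameter, "fields": ["location", "date", "value"], "rows": rows}

pkg = handle(SubCubeQuery("PM25", frozenset({"SITE1", "SITE2"}), "2001-07-01", "2001-07-31"))
print(pkg["rows"])  # [{'location': 'SITE1', 'date': '2001-07-01', 'value': 12.0}]
```

A query for "OC" raises `PermissionError` because no provider's contract exposes it, which mirrors the rule that the contract sets the accessible data subsets.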
(Diagram: Federated Data Warehouse. User Tier, data consumption: presentation, processing, integration (data access & use). The users reach the Proxy Server through a Web Service with uniform query & data. Proxy Tier, data homogenization, etc.: SQLDataAdapter1, SQLDataAdapter2, and CustomDataAdapter connect to the Member Servers SQLServer1, SQLServer2, and LegacyServer. Provider Tier: heterogeneous data behind the Fire Wall and the Federation Contract.)
Live Demo of the Data Warehouse Prototype
http://capita.wustl.edu/DSViewer/DSviewer.aspx
Currently, online data are accessible from the CIRA (IMPROVE) and CAPITA SQL servers.
Uniform data query regardless of the native schema: query by parameter, location, time, method.
The hidden DataAdapter:
- accepts the uniform query
- accesses the data server
- transforms the original data into uniform data
- delivers uniform DataSets
A rudimentary viewer displays the data in a table for browsing.
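The four DataAdapter steps can be mirrored in Python with one adapter per kind of member server: a SQL adapter that maps the uniform query onto a native table, and a custom adapter for a legacy text feed. This is an analogue under assumptions, not the prototype's code: the prototype's class names suggest .NET, while here an in-memory `sqlite3` database stands in for a SQL server, and the table `Obs`, its columns, the legacy line format, and the `DataAdapter` protocol are all hypothetical.

```python
import sqlite3
from typing import Protocol

class DataAdapter(Protocol):
    """Hypothetical interface: uniform query in, uniform rows out."""
    def fetch(self, parameter: str, location: str) -> list[tuple[str, str, float]]: ...

class SQLDataAdapter:
    """Adapter for a SQL server whose native schema differs from the uniform one."""
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def fetch(self, parameter: str, location: str) -> list[tuple[str, str, float]]:
        # Accept the uniform query and translate it to the native column names.
        cur = self.conn.execute(
            "SELECT SiteCode, ObsDate, Conc FROM Obs WHERE Species = ? AND SiteCode = ?",
            (parameter, location),
        )
        # Transform native rows into uniform (location, date, value) tuples.
        return [(site, date, float(conc)) for site, date, conc in cur.fetchall()]

class CustomDataAdapter:
    """Adapter for a legacy server that delivers comma-separated text lines."""
    def __init__(self, lines: list[str]):
        self.lines = lines

    def fetch(self, parameter: str, location: str) -> list[tuple[str, str, float]]:
        out = []
        for line in self.lines:
            date, site, param, value = line.split(",")  # native field order
            if param == parameter and site == location:
                out.append((site, date, float(value)))
        return out

# Hypothetical member servers with illustrative values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Obs (SiteCode TEXT, ObsDate TEXT, Species TEXT, Conc REAL)")
conn.execute("INSERT INTO Obs VALUES ('SITE1', '2001-07-01', 'PM25', 12.0)")
legacy = ["2001-07-01,SITE1,PM25,11.5"]

# The proxy sends the same uniform query to every adapter and merges the results.
adapters: list[DataAdapter] = [SQLDataAdapter(conn), CustomDataAdapter(legacy)]
uniform = [row for a in adapters for row in a.fetch("PM25", "SITE1")]
print(uniform)  # [('SITE1', '2001-07-01', 12.0), ('SITE1', '2001-07-01', 11.5)]
```

Keeping the adapters behind one interface is what lets the query stay uniform "regardless of the native schema": a new server type needs only a new adapter.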
‘Global’ and ‘Local’ AQ Analysis
• AQ data analysis needs to be performed at both global and local levels
• ‘Global’ refers to regional, national, and global analysis; it establishes the larger-scale context
• ‘Local’ analysis focuses on specific, detailed local features
• Both global and local analyses are needed for full understanding
• Global-local interaction (information flow) needs to be established for effective management
National and Local AQ Analysis
Integration for Global-Local Activities
Global and local activities are both needed – e.g. ‘think global, act local’
‘Global’ and ‘Local’ here refer to relative, not absolute, scale
Global Activity => Local Benefit
• Global data, tools => Improved local productivity
• Global data analysis => Spatial context; initial analysis
• Analysis guidance => Standardized analysis, reporting
Local Activity => Global Benefit
• Local data, tools => Improved global productivity
• Local data analysis => Elucidate, expand initial analysis
• Identify relevant issues => Responsive, relevant global work
Data Re-Use and Synergy
• Data producers maintain their own workspace and resources (data, reports, comments).
• Part of the resources is shared by creating common virtual resources.
• Web-based integration of the resources can span several dimensions:
  – Spatial scale: local-global data sharing
  – Data content: combination of data generated internally and externally
(Diagram: Local and Global users each keep their own content and contribute a shared part of their resources to the Virtual Shared Resources: data, knowledge, tools, methods.)
• The main benefits of sharing are data re-use, data complementing, and synergy.
• The goal of the system is to have the benefits of sharing outweigh the costs.
Federated Data Warehouse Features
• Data reside in their respective home environments, where they can mature. ‘Uprooted’ data in separate databases are not easily updated, maintained, or enriched.
• Abstract (universal) query/retrieval facilitates integration and comparison along the key dimensions (space, time, parameter, method)
• The open data query based on Web Services promotes the building of further value chains: Data Viewers, Data Integration Programs, Automatic Report Generators, etc.
• Data access through the Proxy Server protects the data providers and the data users from security breaches and excessive detail
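Because every dataset arrives in the same uniform form, comparing datasets along the key dimensions reduces to grouping on space, time, and parameter and then contrasting the method. The sketch below is a hypothetical value-chain component of that kind: the record layout, the method labels "filter" and "continuous", and all values are illustrative assumptions.

```python
# Hypothetical uniform records: (location, date, parameter, method, value)
records = [
    ("SITE1", "2001-07-01", "PM25", "filter", 12.0),
    ("SITE1", "2001-07-01", "PM25", "continuous", 13.5),
    ("SITE1", "2001-07-02", "PM25", "filter", 18.0),
    ("SITE2", "2001-07-01", "PM25", "continuous", 9.0),
]

def compare_methods(records, method_a, method_b):
    """Pair values that agree on space, time, and parameter but differ in method.

    Returns {(location, date, parameter): value_b - value_a} for keys that
    have observations from both methods.
    """
    by_key = {}
    for loc, date, param, method, value in records:
        by_key.setdefault((loc, date, param), {})[method] = value
    return {
        key: vals[method_b] - vals[method_a]
        for key, vals in by_key.items()
        if method_a in vals and method_b in vals
    }

diffs = compare_methods(records, "filter", "continuous")
print(diffs)  # {('SITE1', '2001-07-01', 'PM25'): 1.5}
```

Only the key with observations from both methods survives; an Automatic Report Generator could consume such a dictionary directly.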