DvoyNetworkIdeas

Dvoy Networking Ideas
OpenGIS Web Services
• Mission: Definition and specification of geospatial web services.
• A Web service is an application that can be published, located, and dynamically invoked across the Web.
• Applications and other Web services can discover and invoke the service.
• The sponsors of the Web services initiative include
– Federal Geographic Data Committee
– Natural Resources Canada
– Lockheed Martin
– National Aeronautics and Space Administration
– U.S. Army Corps of Engineers Engineer Research and Development Center
– U.S. Environmental Protection Agency EMPACT Program
– U.S. Geological Survey
– U.S. National Imagery and Mapping Agency
• Phase I - February 2002
– Common Architecture: OGC Services Model, OGC Registry Services, and Sensor Model Language.
– Web Mapping: Map Server (raster), Feature Server (vector), Coverage Server (image), Coverage Portrayal Services.
– Sensor Web: OpenGIS Sensor Collection Service for accessing data from a variety of land, water, air and other sensors.
D-DADS Architecture
The D-DADS Components
• Data Providers supply primary data to system, through SQL or other data
servers.
• Standardized Description & Format populates and describes the data
cubes and other data types using standard metadata
• Data Access and Manipulation tools for providing a unified interface to
data cubes, GIS data layers, etc. for accessing and processing (filtering,
aggregating, fusing) data and integrating data into virtual data cubes
• Users are the analysts who access the D-DADS and produce knowledge from
the data
The multidimensional data access and manipulation
component of D-DADS will be implemented using OLAP.
Interoperability
One requirement for an effective distributed environmental
data system is interoperability, defined as,
“the ability to freely exchange all kinds of spatial
information about the Earth and about objects and
phenomena on, above, and below the Earth’s surface;
and to cooperatively, over networks, run software
capable of manipulating such information.” (Buehler &
McKee, 1996)
Such a system has two key elements:
• Exchange of meaningful information
• Cooperative and distributed data management
On-line Analytical Processing: OLAP
• A multidimensional data model making it easy to select, navigate, integrate and explore the data.
• An analytical query language providing power to filter, aggregate and merge data as well as explore complex data relationships.
• Ability to create calculated variables from expressions based on other variables in the database.
• Pre-calculation of frequently queried aggregated values, e.g. monthly averages, enables fast response time to ad hoc queries.
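As a minimal sketch of the last two points, assuming a hypothetical in-memory list of (station, date, parameter, value) records rather than a real OLAP server, the following pre-calculates monthly averages once and then derives a calculated variable from two stored ones. All station names, parameter names and values are illustrative only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact records: (station, date, parameter, value).
RECORDS = [
    ("STL", date(2001, 4, 1), "PM25", 10.0),
    ("STL", date(2001, 4, 16), "PM25", 20.0),
    ("STL", date(2001, 5, 1), "PM25", 12.0),
    ("STL", date(2001, 4, 1), "PM10", 30.0),
    ("STL", date(2001, 4, 16), "PM10", 50.0),
]


def precalculate_monthly_means(records):
    """Aggregate once, so later ad hoc queries become dictionary lookups."""
    sums = defaultdict(lambda: [0.0, 0])
    for station, day, parameter, value in records:
        key = (station, day.year, day.month, parameter)
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


MONTHLY = precalculate_monthly_means(RECORDS)


def fine_fraction(station, year, month):
    """Calculated variable: ratio of two stored monthly means (PM25 / PM10)."""
    fine = MONTHLY[(station, year, month, "PM25")]
    coarse = MONTHLY[(station, year, month, "PM10")]
    return fine / coarse
```

A query such as `MONTHLY[("STL", 2001, 4, "PM25")]` then reads the pre-computed value instead of scanning the raw records.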
User Interaction with D-DADS
[Diagram: Query; Distributed Database; XML data; Data View (Table, Map, etc.)]
Metadata Standardization
Metadata standards for describing air quality data are
currently being actively pursued by several
organizations, including:
• The Supersite Data Management Workgroup
• NARSTO
• FGDC
Potential D-DADS Nodes
The following organizations are potential nodes in a
distributed data analysis and dissemination system:
• CAPITA
• NPS-CIRA
• EPA Supersites
- California
- Texas
- St. Louis
Summary
In the past, data analysis has been hampered by data flow
resistances. However, the tools and framework to
overcome each of these resistances now exist, including:
• World Wide Web
• XML
• OLAP
• OpenGIS
• Metadata standards
Incorporating these tools will initiate a distributed data
analysis and dissemination system.
‘Global’ and ‘Local’ AQ Analysis
• AQ data analysis needs to be performed at both global and local levels
• ‘Global’ refers to regional, national, and global analysis. It establishes the larger-scale context.
• ‘Local’ analysis focuses on the specific and detailed local features
• Both global and local analyses are needed for full understanding.
• Global-local interaction (information flow) needs to be established for effective management.
National and Local AQ Analysis
Data Re-Use and Synergy
• Data producers maintain their own workspace and resources (data, reports, comments).
• Part of the resources are shared by creating common virtual resources.
• Web-based integration of the resources can be across several dimensions:
Spatial scale: Local – global data sharing
Data content: Combination of data generated internally and externally
[Diagram: Local and Global Users each contribute the shared part of their resources (Content) to the Virtual Shared Resources: Data, Knowledge, Tools, Methods]
• The main benefits of sharing are data re-use, data complementing and synergy.
• The goal of the system is to have the benefits of sharing outweigh the costs.
Integration for Global-Local Activities
Global and local activities are both needed – e.g. ‘think global, act local’
‘Global’ and ‘Local’ here refer to relative, not absolute spatial scale
Global Activity => Local Benefit
Global data, tools => Improved local productivity
Global data analysis => Spatial context; initial analysis
Analysis guidance => Standardized analysis, reporting
Local Activity => Global Benefit
Local data, tools => Improved global productivity
Local data analysis => Elucidate, expand initial analysis
Identify relevant issues => Responsive, relevant global analysis
Content Integration for Multiple Uses (Reports)
Data from multiple measurements are shared by their providers or custodians
Data are integrated, filtered, aggregated and fused in the process of analysis
Reports use the analysis for Status and Trends; Exposure Assessment; Compliance …
The creation of the needed reports requires data sharing and integration from multiple sources.
Federated Data Warehouse Features
• As much as possible, data should reside in their respective home environment. ‘Uprooted’ data in decoupled databases tend to decay, i.e. cannot be easily updated, maintained, enriched.
• Data Providers would need to ‘open up’ their SQL data servers for limited data subsets and queries, in accordance with a ‘contract’. However, the data structures of the Providers will not need to be changed.
• Data from the providers will be transferred to the ‘federated data warehouse’ through (1) on-line DataAdapters, (2) manual web submission and (3) semi-automated transfer from the NARSTO archive.
• Retrieval of uniform data from the data warehouse facilitates integration and comparison along the key dimensions (space, time, parameter, method)
• The open architecture data warehouse (see Web Services) promotes the building of further value chains: Data Viewers, Data Integration Programs, Automatic Report Generators, etc.
DVoy: Components and Data Flow
[Diagram: components and data flow]
Presentation Services: Navigation Service; Viewer Layers (Time Chart, Layered Map); Cursor; FocusCube, GlobCursor
Catalog Service: DataSet Records (Provider Descript., Service Descript., Measure Access); Selected Measure Record
Data Services: Data Wrapping of Legacy Data; Data Delivery WebService (Measure, Granule; DataToView, DataForCursorAndView)
Interactions: Publish (DataSet), Find (Measure), Bind (Measure, FocusCube)
Data provided by each dimension of a View (e.g. Dim1: Lon, Dim2: Lat):
Data Focus Range
Dim1.Type, Dim1.Min, Dim1.Max
Dim2.Type, Dim2.Min, Dim2.Max
….
Rendering
Current Dim.Types: Latitude, Longitude, DateTime, Elevation
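The per-dimension Type/Min/Max triple can be pictured as a small data structure. The sketch below is only an illustration under assumed names (`DimRange`, `in_focus` and the example bounds are hypothetical, not taken from the Dvoy code); it keeps one range per dimension and tests whether a point falls inside the data focus range.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Union

Bound = Union[float, datetime]

# The four dimension types listed on the slide.
DIM_TYPES = {"Latitude", "Longitude", "DateTime", "Elevation"}


@dataclass(frozen=True)
class DimRange:
    """One dimension of a view: its type and the focus interval [dim_min, dim_max]."""
    dim_type: str
    dim_min: Bound
    dim_max: Bound

    def __post_init__(self):
        if self.dim_type not in DIM_TYPES:
            raise ValueError(f"unknown dimension type: {self.dim_type}")
        if self.dim_min > self.dim_max:
            raise ValueError("dim_min must not exceed dim_max")

    def contains(self, value: Bound) -> bool:
        return self.dim_min <= value <= self.dim_max


def in_focus(focus: Dict[str, DimRange], point: Dict[str, Bound]) -> bool:
    """True when the point lies inside the range of every dimension of the view."""
    return all(rng.contains(point[name]) for name, rng in focus.items())


# Hypothetical map view covering a longitude/latitude box.
focus = {
    "Dim1": DimRange("Longitude", -125.0, -65.0),
    "Dim2": DimRange("Latitude", 25.0, 50.0),
}
```

Adding a time axis would be a third entry with type `DateTime` and `datetime` bounds; the containment test stays the same.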
Federated Data Services Architecture
[Diagram: Data Warehouse Tier and Data View & Process Tier, linked through the Broker]
Data Warehouse Tier: XDim Data (SQL Table, OLAP Cube) via XML Web Services; GIS Data (Satellite, Vector) via OpenGIS Services; Text Data (Web Page, Text Data) via HTTP Services
Broker: Connection Manager, Cursor-Query Manager, Data Access Manager, Data View Manager
Data View & Process Tier: Layered Map, Cursor, Time Chart, Scatter Chart, Text, Table
Distributed data of multiple types (spatial, temporal, text)
The Broker handles the views, connections, data access, cursor
Data are rendered by linked Data Views (map, time, text)
Dvoy Federated Information System
• Dvoy offers a homogeneous, read-only access
mechanism to a dynamically changing collection of
heterogeneous, autonomous and distributed information
sources.
• Data access uses a global multidimensional schema
consisting of spatial, temporal and parameter dimensions
• The uniform global schema is suitable for data browsing
and online analytical processing, OLAP
• The limited global query capabilities yield slices along the
spatial, temporal and parameter dimensions of the
multidimensional data cubes.
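To illustrate what a slice along the global dimensions might look like, here is a minimal sketch that assumes the cube is a plain dictionary keyed by (lat, lon, day, parameter). The cell values, the second parameter name and the `slice_cube` function are hypothetical; the point is only that a read-only query fixes some dimensions and leaves the others free.

```python
from datetime import date

# Hypothetical cube cells keyed by the global schema:
# (latitude, longitude, day, parameter) -> value
CUBE = {
    (38.6, -90.2, date(2001, 4, 16), "SOILf"): 1.2,
    (38.6, -90.2, date(2001, 4, 17), "SOILf"): 0.8,
    (34.1, -118.2, date(2001, 4, 16), "SOILf"): 2.5,
    (38.6, -90.2, date(2001, 4, 16), "SO4f"): 3.1,
}


def slice_cube(cube, lat=None, lon=None, day=None, parameter=None):
    """Return the cells matching every fixed dimension; None leaves a dimension free."""
    wanted = (lat, lon, day, parameter)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


# Spatial slice: one day and one parameter, all locations.
spatial_slice = slice_cube(CUBE, day=date(2001, 4, 16), parameter="SOILf")

# Temporal slice: one location and one parameter, all days.
time_series = slice_cube(CUBE, lat=38.6, lon=-90.2, parameter="SOILf")
```

The spatial slice would feed a map view and the temporal slice a time chart, which is why the limited query set is still enough for browsing.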
Architecture of DATAFED Federated Data System
After Busse et al., 1999
• The main software components of Dvoy are wrappers, which encapsulate sources and remove technical heterogeneity, and mediators, which resolve the logical heterogeneity.
• Wrapper classes are available for geo-spatial (incl. satellite) images, SQL servers, text files, etc. The mediator classes are implemented as web services for uniform data access to n-dimensional data.
Integration Architecture (Ullman, 1997)
• Heterogeneous sources are wrapped by software that translates between the source’s local language, model and concepts and the shared global concepts
• Mediators obtain information from one or more components (wrappers or other mediators) and pass it on to other mediators or to external users.
• In a sense, a mediator is a view of the data found in one or more sources; it does not hold the data but it acts as if it did. The job of the mediator is to go to the sources and provide an answer to the query.
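A small sketch of the wrapper/mediator pattern described above, under stated assumptions: the two sources (an in-memory SQLite table and a CSV text), their column names, and the class names are all invented for illustration and are not the Dvoy wrapper classes. Each wrapper translates its source into the same record shape, and the mediator holds no data of its own.

```python
import csv
import io
import sqlite3


class SqlWrapper:
    """Wraps a SQL source and translates its rows into the shared record shape."""

    def __init__(self, connection):
        self.connection = connection

    def query(self, parameter):
        rows = self.connection.execute(
            "SELECT site, obs_day, val FROM obs WHERE param = ?", (parameter,)
        )
        return [
            {"site": s, "date": d, "parameter": parameter, "value": v}
            for s, d, v in rows
        ]


class CsvWrapper:
    """Wraps a text file with one column per parameter."""

    def __init__(self, text):
        self.text = text

    def query(self, parameter):
        reader = csv.DictReader(io.StringIO(self.text))
        return [
            {"site": r["Station"], "date": r["Date"],
             "parameter": parameter, "value": float(r[parameter])}
            for r in reader
            if r.get(parameter)
        ]


class Mediator:
    """Holds no data; answers a query by going to its components."""

    def __init__(self, components):
        self.components = components  # wrappers or other mediators

    def query(self, parameter):
        results = []
        for component in self.components:
            results.extend(component.query(parameter))
        return sorted(results, key=lambda rec: (rec["date"], rec["site"]))


# Two hypothetical heterogeneous sources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (site TEXT, obs_day TEXT, param TEXT, val REAL)")
conn.execute("INSERT INTO obs VALUES ('STL', '2001-04-16', 'SOILf', 1.2)")
csv_text = "Station,Date,SOILf\nLAX,2001-04-15,2.5\n"

mediator = Mediator([SqlWrapper(conn), CsvWrapper(csv_text)])
answer = mediator.query("SOILf")
```

Because wrappers and mediators share the same `query` method, a mediator can itself be a component of another mediator, as the slide notes.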
Federated PM and Haze Data Warehouse Project
a sub- project of
(enter your sticker & logo here )
St. Louis Midwest Supersite Project
RPO
Regional Planning Organization
SupSite
EPA Supersites
NARSTO
NARSTO PM
EPA
EPA Division1, Division2, Division2
Me
Me and my dog for our aerosol project
Nov 20, 2001, RBH
PM/Haze Data Flow in Support of AQ Management
[Diagram: Shared PM/Haze Data at the center, exchanged with FLM (Federal Land Managers), RPO (Regional Planning Orgs), EPA (EPA Regul. & Research), SuperSite, NARSTO, and Other: Private (Industry), Academic]
• PM and haze data are used for many parts of AQ management, mostly in the form of Reports
• There are numerous organizations in need of data relevant to PM/Haze
• The variety of pertinent (ambient, emission) data come from many different sources
• Most interested parties (stakeholders) are both producers and consumers of PM and haze data
• To produce relevant reports, the data need to be ‘processed’ (integrated, filtered, aggregated)
• There is a general willingness to share data but the resistances to data flow and processing are too high
Scientific and Administrative Rationale for Resource Sharing
Scientific Rationale:
• Regional haze and its precursors have a 1000-10000 km airshed.
• (Smoke, Dust, Haze) – Data integration
• Substantial fraction of haze originates from natural sources or from out-of-jurisdiction man-made sources
• Cross-RPO data and knowledge sharing yields better operational and science support to AQ management
Management Rationale:
• Haze control within some RPOs cannot yield
• Data sharing saves money and ….
A Strategy for the Federated PM/Haze Data Warehouse
• Negotiate with the data providers to ‘open up’ their data servers for limited, controlled access in accordance with a clear ‘access contract’ with the Federated Warehouse
• Design an interface to the warehoused datasets that has simple data access and satisfies the data needs of most integrating users (oxymoron?)
• Facilitate the development of shared value-adding processes (analysis tools, methods) that refine the raw data to useful knowledge
Three-Tier Federated Data Warehouse Architecture
(Note: In this context, ‘Federated’ differs from ‘Federal’ in the direction of the driving force. ‘Federated’ is meant to indicate a driving force for sharing from the ‘bottom up’, i.e. from the members, not dictated from ‘above’ by the Feds.)
1. Provider Tier: Back-end servers containing heterogeneous data, maintained by the federation members
2. Proxy Tier: Retrieves designated Provider data and homogenizes it into common, uniform Datasets
3. User Tier: Accesses the Proxy Server and uses the uniform data for presentation, integration or processing
[Diagram: Federated Data Warehouse]
User Tier: Data presentation, processing
Proxy Tier: Data homogenization, transformation
Provider Tier: Heterogeneous data in distributed SQL Servers
Federated Data Warehouse Interactions
• The Provider servers interact only with the Proxy Server in accordance with the Federation Contract
– The contract sets the rules of interaction (accessible data subsets, types of queries)
– Strong server security measures are enforced, e.g. through Secure Sockets Layer
• The data User interacts only with the generic Proxy Server using a flexible Web Services interface
– Generic data queries, applicable to all data in the Warehouse (e.g. data sub-cube by space, time, parameter)
– The data query is addressed to the Web Service provided by the Proxy Server
– Uniform, self-describing data packages are passed to the user for presentation or further processing
[Diagram: Federated Data Warehouse]
Provider Tier (Heterogeneous Data): Member Servers SQLServer1, SQLServer2, LegacyServer
Proxy Tier (Data Homogenization, etc.): Proxy Server with SQLDataAdapter1, SQLDataAdapter2, CustomDataAdapter
User Tier (Data Consumption): Data Access & Use for Presentation, Processing, Integration
Between Provider and Proxy Tiers: Firewall, Federation Contract
Between Proxy and User Tiers: Web Service, Uniform Query & Data
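The interaction rules above can be sketched as a proxy that checks each request against a contract before calling a provider's adapter. Everything named here (`FederationContract`, `ProxyServer`, the `DEMO` dataset, the `subcube` query type and the adapter function) is a hypothetical illustration, not the project's actual interface, and transport security is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class FederationContract:
    """Hypothetical contract: the datasets and query types a provider exposes."""
    datasets: set = field(default_factory=set)
    query_types: set = field(default_factory=set)

    def permits(self, dataset, query_type):
        return dataset in self.datasets and query_type in self.query_types


class ProxyServer:
    """The only party that talks to provider servers, always via the contract."""

    def __init__(self):
        self.providers = {}  # dataset name -> (contract, adapter callable)

    def register(self, dataset, contract, adapter):
        self.providers[dataset] = (contract, adapter)

    def query(self, dataset, query_type, **criteria):
        contract, adapter = self.providers[dataset]
        if not contract.permits(dataset, query_type):
            raise PermissionError(f"{query_type} on {dataset} is outside the contract")
        # Uniform, self-describing package: the data travel with their description.
        return {
            "dataset": dataset,
            "query_type": query_type,
            "criteria": criteria,
            "records": adapter(**criteria),
        }


def sql_data_adapter_1(parameter=None, **_ignored):
    """Stand-in for an adapter in front of a member SQL server."""
    rows = [("STL", "SOILf", 1.2), ("STL", "SO4f", 3.1)]
    return [
        {"site": s, "parameter": p, "value": v}
        for s, p, v in rows
        if parameter in (None, p)
    ]


proxy = ProxyServer()
proxy.register("DEMO", FederationContract({"DEMO"}, {"subcube"}), sql_data_adapter_1)
package = proxy.query("DEMO", "subcube", parameter="SOILf")
```

A request for a query type outside the contract, for example `proxy.query("DEMO", "raw_sql")`, raises `PermissionError` before any provider is contacted.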
Data Model
Ray Plante, Virtual Obs
• What’s the difference between Data Models and Metadata? Intertwined
– metadatum: a datum with a name or semantic tag that refers to the data
– data model: a description of the relationships between metadata
• structural & logical relationships between compound objects & their components
• operations that can be performed on them (really -
– framework: the architecture/process used to define metadata/data models that enables their ready use in applications
• Formalized data modeling process
– encourages community involvement for defining standard models & metadata
– structure enables easy verification, dissemination, & automated use
– “standard” metadata should point directly to components of the “standard” models
– allow groups to define metadata independent of a “standard” metadata
• Practical Difference?
– data model captures as complete a picture of a concept as possible
– metadata represents the instantiation of a portion of the model’s components
• Data access through a data model (Wrapper Classes for each data model)
Integration for Global-Local Activities
Global and local activities are both needed – e.g. ‘think global, act local’
‘Global’ and ‘Local’ here refer to relative, not absolute scale
Global Activity => Local Benefit
Global data, tools => Improved local productivity
Global data analysis => Spatial context; initial analysis
Analysis guidance => Standardized analysis, reporting
Local Activity => Global Benefit
Local data, tools => Improved global productivity
Local data analysis => Elucidate, expand initial analysis
Identify relevant issues => Responsive, relevant global work
Federated Data System Features
• Data reside in their respective home environment where they can mature. ‘Uprooted’ data in centralized databases are not easily updated, maintained, enriched.
• Abstract (universal) query/retrieval facilitates integration and comparison along the key dimensions (space, time, parameter, method)
• The open data query based on Web Services promotes the building of further value chains: Data Viewers, Data Integration Programs, Automatic Report Generators, etc.
• The data access through the Proxy server protects the data providers and the data users from security breaches and excessive detail
Integration for Global-Local Activities
Global and local activities are both needed – e.g. ‘think global, act local’
‘Global’ and ‘Local’ here refer to relative, not absolute spatial scale
Global Activity => Local Benefit
Global data & analysis => Spatial context; initial analysis
Analysis guidance => Standardized analysis, reporting
Local Activity => Global Benefit
Local data & analysis => Elucidate, expand initial analysis
Identify relevant issues => Responsive, relevant global analysis
Federated Information System
• Providers maintain their own workspace and resources (data, tools, reports)
• Part of the private resources are exposed as shared (federated) resources
• The Federation facilitates finding, accessing and usage of the shared resources
Data sharing federations:
• Open GIS Consortium (GIS data layers)
• NASA SEEDS network (Satellite data)
• NSF Digital Government
• EPA’s National Env. Info Exch. Network
[Diagram: Data Providers/Users keep Private Info and expose Shared Info to the Shared (Federated) Resources (Data, Services, Tools, Methods), which also connect to Other Federations]
Data Federation Concept and the FASTNET Network
Schematic representation of data sharing in
a federated information system.
Based on the premise that providers expose
part of their data (green) to others
Schematics of the value-adding network
proposed for FASTNET
Components embedded in the federated
value network
Data Acquisition and Usage Value Chain
[Diagram: parallel chains Monitor -> Data 1 … Data n, Data m -> Store; the stored data are integrated into IntData1 … IntDatan, which together form the Virtual Int. Data]
Processes of the Information Value Chain
(after Taylor, 1975)
Data -> Organizing (Grouping, Classifying, Formatting, Displaying) -> Information
Information -> Analyzing (Separating, Evaluating, Interpreting, Synthesizing) -> Informing Knowledge
Informing Knowledge -> Judging (Options, Quality, Advantages, Disadvantages) -> Productive Knowledge
Productive Knowledge -> Deciding (Matching goals, Compromising, Bargaining, Deciding) -> Action
Examples:
• Information: CIRA VIEWS
• Informing Knowledge: Langley IDEA
• Productive Knowledge: WG Summary Rpt
• Action: AQ Manager
Data Flow and Processing
CATT: A Community Tool!
Part of an Analysis Value Chain
[Diagram: two input chains feed CATT.
AEROSOL: Aerosol Sensors -> Collection (IMP., EPA) -> Aerosol Data -> Integration (VIEWS) -> Integrated AerData -> CATT-In (CAPITA) -> AerData Cube -> Aggreg. Aerosol -> Next Process.
TRANSPORT: Weather Data -> Assimilate (NWS) -> Gridded Meteor. -> Trajectory (ARL) -> Traject. Data -> CATT-In (CAPITA) -> TrajData Cube -> Aggreg. Traject. -> Next Process.
CATT addresses When? Where? Why? How? (There! / Not There!) and leads to Further Analysis: GIS, Grid Processing, Emission Comparison.]
Fast Aerosol Sensing Tools for Natural Event Tracking FASTNET
Analysts Console
Community Website
Distributed Programming: Interpreted and Compiled
• Web services allow processing of distributed data
– Data are distributed and maintained by their custodians,
– Processing nodes (web-services) are also distributed
– ‘Interpreted’ web-programs for data processing can be created ad hoc by end users
• However, ‘interpreted’ web programs are slow, fragile and uncertain
– Slow due to large data transfers between nodes
– Fragile due to instability of connections
– Uncertain due to failures of data provider and processing nodes
• One solution is to ‘compile’ the data and processing services
– Data compilation transforms the data for fast, effective access (e.g. OLAP)
– Web service compilation combines processes for effective execution
• Interpreted or compiled?
– Interpreted web programs are simpler and up to date but slow, fragile, uncertain
– Compiled versions are more elaborate and latent but also faster and more robust
– Frequently used datasets and processing chains should be compiled and kept current
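One way to picture the trade-off is the toy sketch below. It is an analogy rather than the Dvoy mechanism: the three service functions are local stand-ins for remote web services, and "compilation" is approximated by combining the chain into a single callable whose results are memoized with `functools.lru_cache`. The call counter only exists to make the cost of repeated remote access visible.

```python
from functools import lru_cache, reduce

CALLS = {"access": 0}


def point_access(day):
    """Stand-in for a remote data service; counts calls to make the cost visible."""
    CALLS["access"] += 1
    return [("STL", 1.2), ("LAX", 2.5)]


def to_grid(points):
    """Stand-in for a gridding service."""
    return {site: value for site, value in points}


def render(grid):
    """Stand-in for a rendering service."""
    return ", ".join(f"{site}={value}" for site, value in sorted(grid.items()))


def run_interpreted(chain, argument):
    """Interpreted: resolve and call each service in turn on every request."""
    return reduce(lambda data, service: service(data), chain, argument)


def compile_chain(chain):
    """Compiled: combine the services into one callable and keep its results."""

    @lru_cache(maxsize=None)
    def compiled(argument):
        return run_interpreted(chain, argument)

    return compiled


CHAIN = (point_access, to_grid, render)
compiled_service = compile_chain(CHAIN)
```

Repeated requests to `compiled_service` for the same day reuse the stored result, which is faster but can go stale; calling `compiled_service.cache_clear()` is the sketch's equivalent of keeping the compiled version current.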
Interpreted and Compiled Service
Interpreted Service
• Processes distributed
• Data flow on Internet
Compiled Service
• Processes in the same place
• Data flow within aggregate service
• Controllers, e.g. zoom, can be shared
[Diagram, both panels: Point Access -> Point Grid -> Grid Render and Point Access -> Point Render, combined by PtGrid Overlay; the arrows distinguish Data Flow from Control Flow]
Voyager: The Program
[Diagram: Voyager Core (Data Selection, Data Access, Data Portrayal) surrounded by the Adaptive Abstract I/O Layer: Controls (Ports), Displays (Device Drivers), Data Sources (Wrappers)]
• The Voyager program consists of a stable core and an adaptive input/output section
• The core executes the data selection, access and portrayal tasks
• The adaptive, abstract I/O layer connects the core to evolving web data, flexible displays and a configurable user interface:
– Wrappers encapsulate the heterogeneous external data sources and homogenize the access
Dvoy_Services: Generic Software components
User Interface Module (UIM)
– UIM extracts relevant UI parameters from STATE
– User changes UI parameters
– UIM transmits modified UI parameters to STATE
Service Chain STATE Module
– Contains the state params for all services in the chain
– Has ports for getting/setting state params
Service Adaptor Module
– Gets input data from upstream service
– Gets service params from STATE
– Makes service call
Web Service Module
– Gets service call from Adaptor module
– Executes service
– Returns output data
[Diagram: the Controller and User Interface Module connect through state I/O ports to the Service state; the Adaptor takes Input data and issues Web service calls (Param 1, Param 2) to the Web service, which returns Output data]
PointAccess->Grid->GridRender Service Chain
[Diagram: the GetMapPointData and RenderMapviewPoint Selectors set the Service state through state I/O ports; the GetMapPointData and RenderMapviewPoint Adaptors read the state and issue the Web service calls; each Web service returns Output data]
GetMapPointData
dataset_abbr: IMPROVE
Param_abber: SOILf
datatime: 2001-04-16
sql_filter:
RenderMapviewPoint
dataset_url:
output_format:
out_image_width:
Etc.
• The service chain interpreter makes ONLY 2 sequential calls, stated in the data flow program:
– GetMapPointDataAdaptor
– RenderMapviewPointAdaptor
PointAccess->Grid->GridRender Service Chain
[Diagram: the GetMapPointData, GridMapviewPoint and RenderMapviewGrid Selectors set the Service state through state I/O ports; the three matching Adaptors read the state and issue the Web service calls; each Web service returns Output data]
GetMapPointData
dataset_abbr: IMPROVE
Param_abber: SOILf
datatime: 2001-04-16
sql_filter:
GridMapviewPoint
dataset_url:
output_format:
out_image_width:
Etc.
RenderMapviewGrid
dataset_url:
output_format:
out_image_width:
Etc.
• The service chain interpreter makes ONLY 3 sequential calls, stated in the data flow program:
– GetMapPointDataAdaptor
– GridMapviewPointAdaptor
– RenderMapviewGridAdaptor
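The three sequential calls can be sketched as follows. This is an illustrative stand-in only: the three functions are local placeholders for the web services of the same names, the `cell_size` parameter and the sample points are invented, and the state dictionary plays the role of the STATE module from which each adaptor reads its service parameters.

```python
# Shared state for all services in the chain (parameter values are examples;
# cell_size is a hypothetical parameter added for the sketch).
STATE = {
    "GetMapPointData": {"dataset_abbr": "IMPROVE", "datatime": "2001-04-16"},
    "GridMapviewPoint": {"cell_size": 1.0},
    "RenderMapviewGrid": {"output_format": "text"},
}


def get_map_point_data(upstream, dataset_abbr, datatime):
    """Placeholder data access service: returns (lon, lat, value) points."""
    return [(-90.2, 38.6, 1.2), (-90.7, 38.1, 0.8), (-118.2, 34.1, 2.5)]


def grid_mapview_point(points, cell_size):
    """Placeholder gridding service: average the points falling in each cell."""
    cells = {}
    for lon, lat, value in points:
        key = (lon // cell_size, lat // cell_size)
        cells.setdefault(key, []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}


def render_mapview_grid(grid, output_format):
    """Placeholder rendering service: describe the grid instead of drawing it."""
    return f"{output_format}: {len(grid)} cells"


SERVICES = {
    "GetMapPointData": get_map_point_data,
    "GridMapviewPoint": grid_mapview_point,
    "RenderMapviewGrid": render_mapview_grid,
}


def adaptor(service_name, upstream_data, state):
    """Get input from the upstream service and params from STATE, then make the call."""
    params = state[service_name]
    return SERVICES[service_name](upstream_data, **params)


def run_chain(program, state):
    """Interpreter: one adaptor call per service, in the order the program states."""
    data, calls = None, 0
    for service_name in program:
        data = adaptor(service_name, data, state)
        calls += 1
    return data, calls


PROGRAM = ["GetMapPointData", "GridMapviewPoint", "RenderMapviewGrid"]
result, call_count = run_chain(PROGRAM, STATE)
```

Changing a value in `STATE` (as a user interface module would) and re-running the same program is all that is needed to produce a different view.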
VOYAGER Web Services
[Diagram: Providers -> Voyager Web Services -> Users, resting on Community Coordination and Support Technologies]
Providers: Maintain distributed data; heterogeneous coding, access (XDim Data, SQL Tables, GIS Data, Vector, Web Images)
Voyager Web Services: Homogenize data access; catalog, access, transform data (Publish, Find, Bind; Catalog, Data & Tools; Uniform Access)
Users: Select, Overlay, Explore; multidimensional data (Layered Map, Time Chart, Scatter Chart)
Services Program Execution:
Reverse Polish Notation
Writing the WS program:
- Write the program on the command line of a URL call
- Services are written sequentially using RPN
- Replacements
Connector/Adaptor:
- Reads the service name from the command line and loads its WSDL
- Scans the input WSDL
- The schema walker populates the service input fields from:
- the data on the command line
- the data output of the upstream process
- the catalog for the missing data
Service Execution
For each service:
- Reads the command line, one service at a time
- Passes the service parameters to the above Connector/Adaptor, which prepares the service
- Executes the service
- It also handles the data stack for RPN
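A minimal sketch of such an RPN executor, under several assumptions: the URL layout (`?program=...`), the three service names, and the fixed input counts in `SERVICES` (standing in for what a connector would learn from each service's WSDL) are all hypothetical. Plain tokens are pushed on the data stack; a service token pops its inputs and pushes its output.

```python
from urllib.parse import parse_qs, urlparse


def point_access(dataset, parameter):
    """Placeholder data access service."""
    return [f"{dataset}.{parameter}@STL", f"{dataset}.{parameter}@LAX"]


def grid(points):
    """Placeholder gridding service."""
    return {"cells": len(points)}


def render(grid_data):
    """Placeholder rendering service."""
    return f"map with {grid_data['cells']} cells"


# Service name -> (callable, number of inputs taken from the stack).
SERVICES = {
    "PointAccess": (point_access, 2),
    "Grid": (grid, 1),
    "Render": (render, 1),
}


def run_rpn(url):
    """Read the program from the URL and execute it one token at a time."""
    program = parse_qs(urlparse(url).query)["program"][0].split()
    stack = []
    for token in program:
        if token in SERVICES:
            service, arity = SERVICES[token]
            if len(stack) < arity:
                raise ValueError(f"{token} needs {arity} inputs")
            args = stack[-arity:]      # inputs come from upstream output or literals
            del stack[-arity:]
            stack.append(service(*args))
        else:
            stack.append(token)        # literal data given on the command line
    return stack


final_stack = run_rpn(
    "http://example.org/ws?program=IMPROVE+SOILf+PointAccess+Grid+Render"
)
```

Here the operands `IMPROVE` and `SOILf` precede `PointAccess`, and each later service consumes the output of the one before it, which is the postfix ordering RPN prescribes. Filling missing inputs from a catalog is not modelled.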