Metadata, Provenance and Web Service for Spatial Analysis: The Case of Spatial Weights
Luc Anselin, Sergio Rey, Wenwen Li
GeoDa Center
School of Geographical Sciences and Urban Planning
Arizona State University
Copyright © 2013 by Luc Anselin, All Rights Reserved
• Some Specific Project Goals
• Integrate and sustain a core set of composable, interoperable, manageable, and reusable CyberGIS software elements based on community-driven and open source strategies
• Challenge
• most current spatial analysis/spatial econometrics software written for single CPU
• rethink and rewrite analytical, algorithmic and processing facilities to integrate into a cyberinfrastructure
• address lack of interoperability
• Spatial Econometrics Workbench
• framework for supporting spatial econometric research in a cyberscience era (Anselin and Rey, IJGIS 2012)
• Leverage PySAL and CyberGIS
• Support scientific workflow
• PySAL
• open source library of Python routines for spatial analysis: geocomputation, spatial weights, spatial autocorrelation, spatial econometrics, regionalization
• http://pysal.org
• hosted on GitHub
• PySAL Progress Report
• current version is 1.6 (7th release)
• 3.5 years of on-time bi-annual releases
• 20,000+ downloads (10,000 in 2012)
• recognized in open source scientific community - Anaconda
Anaconda for big data analytics
• Migrating to CyberGIS
• performance = need for parallelization + refined algorithms
• interoperability = provide functionality as web services
• replicability = need for metadata and provenance tracking
• Example: Spatial Weights
• metadata includes spatial data source, type of weights (e.g., contiguity, distance), and any standardization or manipulation (e.g., higher order)
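To make the ingredients above concrete, here is a minimal pure-Python sketch of one weights type (rook contiguity on a regular grid) followed by one standardization (row standardization). It does not use PySAL; the function names `rook_grid` and `row_standardize` are illustrative assumptions, not PySAL API.

```python
def rook_grid(nrows, ncols):
    """Return {cell_id: [neighbor ids]} for rook contiguity on a regular grid.

    Cells are numbered row by row, starting at 0 (an assumption of this sketch).
    """
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            nbrs = []
            if r > 0:
                nbrs.append(i - ncols)  # cell in the previous row
            if c > 0:
                nbrs.append(i - 1)      # cell to the left
            if c < ncols - 1:
                nbrs.append(i + 1)      # cell to the right
            if r < nrows - 1:
                nbrs.append(i + ncols)  # cell in the next row
            neighbors[i] = nbrs
    return neighbors


def row_standardize(neighbors):
    """Give each neighbor of i the weight 1 / (number of neighbors of i)."""
    return {
        i: {j: 1.0 / len(nbrs) for j in nbrs}
        for i, nbrs in neighbors.items()
    }


w = rook_grid(3, 3)
w_std = row_standardize(w)
```

On the 3 x 3 grid, the center cell 4 has neighbors 1, 3, 5 and 7, each with weight 0.25 after row standardization. The data source (the grid), the weights type (rook) and the transformation (row standardization) are exactly the pieces a metadata record would need to capture.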
• Lack of Interoperability
• different implementations
• no standards
• duplication of efforts
• hinders interoperability and workflow chaining
• Example: Weights Formats in PySAL
Example: PySAL spreg
What do we know about south_k6.gwt and south_ep_k20.kwt?
• Conceptual Framework
• separate data source from operations
• data source: polygon or coordinate files with standard metadata (projection, origin, etc.)
• operations: weights metadata
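A minimal sketch of what such a record might look like in Python, keeping the data source and the operations in separate fields. The keys (`input1`, `weight_type`, `transform`, `parameters`) are the ones visible in the request example in these slides; the helper name `make_wmd` is hypothetical, and the actual wmd specification may define more fields than shown here.

```python
import json


def make_wmd(source_type, source_uri, weight_type, transform="O", parameters=None):
    """Build a wmd-like dict: data source under 'input1', operations alongside it."""
    return {
        "input1": {"type": source_type, "uri": source_uri},  # data source
        "weight_type": weight_type,                          # operation
        "transform": transform,                              # standardization code
        "parameters": parameters or {},                      # operation parameters
    }


wmd = make_wmd("shp", "http://toae.org/pub/NAT.shp", "rook",
               parameters={"p": 2, "k": 4})

# The record serializes to JSON and can be read back unchanged.
text = json.dumps(wmd, sort_keys=True)
restored = json.loads(text)
```

Because the record is plain JSON, it can travel with the weights file and be used later to recreate or audit the weights.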
weights vocabulary
weights metadata structure (wmd)
• Web service implementation (OGC WPS)
• wraps PySAL weights module
• (re)creates weights object from information in wmd file
• makes weights object available as a file
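The parse-then-dispatch idea can be sketched as follows. This is not the service's code: `parse_wmd`, `dispatch`, `BUILDERS` and the two stand-in builders are hypothetical names, the `"knn"` type value is an assumption, and the builders only return a description of the work instead of calling PySAL.

```python
import json


def _build_rook(uri, params):
    # Stand-in: a real service would read the file at `uri` and build weights.
    return ("rook", uri)


def _build_knn(uri, params):
    # Stand-in: k defaults to 4 only for the purpose of this sketch.
    return ("knn", uri, params.get("k", 4))


# Registry mapping a weight_type value to the function that handles it.
BUILDERS = {"rook": _build_rook, "knn": _build_knn}


def parse_wmd(text):
    """Parse wmd JSON text and check the keys this sketch relies on."""
    wmd = json.loads(text)
    for key in ("input1", "weight_type"):
        if key not in wmd:
            raise ValueError("wmd is missing required key: %s" % key)
    return wmd


def dispatch(wmd):
    """Pick the builder for wmd['weight_type'] and call it."""
    weight_type = wmd["weight_type"]
    if weight_type not in BUILDERS:
        raise ValueError("unsupported weight_type: %r" % weight_type)
    return BUILDERS[weight_type](wmd["input1"]["uri"], wmd.get("parameters", {}))


result = dispatch(parse_wmd(
    '{"input1": {"type": "shp", "uri": "NAT.shp"}, "weight_type": "rook"}'
))
```

Keeping the registry separate from the parser means a new weights type can be supported by adding one entry, without touching the request handling.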
Workflow (diagram labels: wmd file (json), Weights Parser, PySAL, Dispatcher, Weights, Output, Metadata)
Illustration
• Generate Weights from Shapefile
• NAT.shp available on server
• output format = gal
• Get Request
• http://spatial.gdta.asu.edu/cgibin/wps.cgi?request=Execute&service=WPS&version=1.0.0&identifier=weights_ws&status=false&datainputs=[outputformat=gal;metadata={"input1":{"type":"shp","uri":"http://toae.org/pub/NAT.shp"},"weight_type":"rook","transform":"O", "parameters":{"p":2,"k":4}}]
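A request like the one above can be assembled programmatically. The sketch below uses only the Python standard library and never contacts the server; the helper name `build_execute_url` is hypothetical. Note one difference from the slide: `urlencode` percent-encodes the JSON, brackets and semicolon, whereas the slide shows them unencoded, so whether the service expects one form or the other is an assumption here.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint copied from the slide; this sketch only builds the URL string.
BASE_URL = "http://spatial.gdta.asu.edu/cgibin/wps.cgi"


def build_execute_url(base_url, wmd, outputformat="gal"):
    """Assemble a WPS 1.0.0 Execute GET URL carrying the wmd as JSON."""
    metadata = json.dumps(wmd, separators=(",", ":"))
    datainputs = "[outputformat=%s;metadata=%s]" % (outputformat, metadata)
    query = urlencode([
        ("request", "Execute"),
        ("service", "WPS"),
        ("version", "1.0.0"),
        ("identifier", "weights_ws"),
        ("status", "false"),
        ("datainputs", datainputs),
    ])
    return base_url + "?" + query


wmd = {
    "input1": {"type": "shp", "uri": "http://toae.org/pub/NAT.shp"},
    "weight_type": "rook",
    "transform": "O",
    "parameters": {"p": 2, "k": 4},
}
url = build_execute_url(BASE_URL, wmd)

# Decoding the query string recovers the original datainputs value.
decoded = parse_qs(urlsplit(url).query)["datainputs"][0]
```

Building the URL from a dict avoids hand-editing a long query string and keeps the metadata identical to what is stored in the wmd file.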
metadata input
Server Response
Sample gal output
http://spatial.gdta.asu.edu/wpsoutput/e66df128-14ed-11e3-bde9-0050455c0671.gal
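For readers unfamiliar with the format, a GAL file lists, for each observation, a line with its id and neighbor count followed by a line with the neighbor ids. The sketch below writes and reads a simplified variant whose header line holds only the number of observations and whose ids are integers; both are assumptions of this sketch, and the names `to_gal` and `from_gal` are illustrative, not PySAL API.

```python
def to_gal(neighbors):
    """Serialize {id: [neighbor ids]} to simplified GAL text."""
    lines = [str(len(neighbors))]              # header: number of observations
    for i in sorted(neighbors):
        nbrs = neighbors[i]
        lines.append("%d %d" % (i, len(nbrs)))  # id and neighbor count
        lines.append(" ".join(str(j) for j in nbrs))
    return "\n".join(lines) + "\n"


def from_gal(text):
    """Parse simplified GAL text back into {id: [neighbor ids]}."""
    lines = text.splitlines()
    n = int(lines[0])
    neighbors = {}
    for k in range(n):
        ident, count = lines[1 + 2 * k].split()
        nbrs = [int(j) for j in lines[2 + 2 * k].split()]
        if len(nbrs) != int(count):
            raise ValueError("neighbor count mismatch for id %s" % ident)
        neighbors[int(ident)] = nbrs
    return neighbors


chain = {0: [1], 1: [0, 2], 2: [1]}
gal_text = to_gal(chain)
```

The format stores only who is a neighbor of whom. How the neighbors were derived is not recorded, which is the gap the accompanying wmd file is meant to fill.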
metadata (wmd) file
http://spatial.gdta.asu.edu/wpsoutput/e66df128-14ed-11e3-bde9-0050455c0671.wmd
Performance Evaluation
• How does PySAL scale when the amount of input data increases?
• Is the overhead of web service framework acceptable?
• How does the web service framework scale in handling massive concurrent requests?
Scale-up vs. Scale-out solution
• Scale-up
• High-end computer
• Configuration
• Processor: 2 x 2.93 GHz Quad-Core Intel Xeon
• Memory: 16 GB 1066 MHz DDR3 ECC
• Software: Mac OS X Lion 10.7.4 (11E53)
• Scale-out:
• Web server cluster
Web Server Cluster
• Performance
• experiment using grid layout for N = 10,000 to N = 100,000
• rook contiguity and k nearest neighbors (k = 4)
• input shape files on server in Utah, web service on server at ASU
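As an illustration of the second weights type in this experiment, here is a brute-force k-nearest-neighbor sketch on the centroids of a regular grid. It is not the PySAL implementation: the names `grid_centroids` and `knn` are assumptions, ties in distance are broken by the lower index, and the O(N^2) loop is only practical for small N (a spatial index such as a k-d tree would be the usual choice at N = 100,000).

```python
def grid_centroids(nrows, ncols):
    """Centroids of unit cells on a regular grid, numbered row by row."""
    return [(c + 0.5, r + 0.5) for r in range(nrows) for c in range(ncols)]


def knn(points, k=4):
    """Return {i: [ids of the k nearest other points]} by brute force."""
    neighbors = {}
    for i, (xi, yi) in enumerate(points):
        # Squared distance is enough for ranking; the index breaks ties.
        dists = sorted(
            ((xj - xi) ** 2 + (yj - yi) ** 2, j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        neighbors[i] = [j for _, j in dists[:k]]
    return neighbors


points = grid_centroids(3, 3)
w_knn = knn(points, k=4)
```

Unlike rook contiguity, every observation gets exactly k neighbors, so corner and edge cells reach further into the grid than interior cells do.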
• Experiment 1
• Timing: average over 5 experiments
• web server overhead, data transfer and computation
• explore effect of data size
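The averaging step can be sketched with a small timing helper. The name `average_time` is hypothetical, and `func` stands for whatever is being measured (for example, one full request including transfer and computation); the sketch does not reproduce the actual experimental harness.

```python
import time


def average_time(func, repeats=5):
    """Run func `repeats` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / repeats


# Example with a stand-in workload instead of a real web service call.
calls = []
mean_seconds = average_time(lambda: calls.append(sum(range(1000))))
```

Timing the whole call from the client side captures server overhead, data transfer and computation together; separating them would need timestamps taken on the server as well.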
Time for rook contiguity and KNN weights
• Experiment 2
• Scalability of web service framework
• High-end computer (8 cores)
• Cluster (4 computing nodes, 2 cores each)
• Total processing time
• Speed up
Total processing time
Speed-up
• Experiment 3
• Scalability of the cluster by adding more computing nodes
• Average response time
• 128 concurrent requests
• Dataset: 10,000 polygons
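One way to measure average response time under concurrent load is to issue all requests from a thread pool and time each one. The sketch below is an assumption about how such a client could look, not the setup used in the experiment; `timed_call` and `average_response_time` are hypothetical names, and `func` stands in for a single request to the service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(func):
    """Call func once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def average_response_time(func, n_requests=128):
    """Issue n_requests calls concurrently and return the mean duration."""
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        durations = list(pool.map(timed_call, [func] * n_requests))
    return sum(durations) / len(durations)


# Example with a stand-in request that only records that it ran.
served = []
mean_response = average_response_time(lambda: served.append(1), n_requests=16)
```

Threads are sufficient here because a real request spends most of its time waiting on the network, so the client itself does not become the bottleneck.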
Scalability - cluster
Next Steps
• Towards a Standard
• refine specification: flexible, expandable, deal with edge cases
• improve performance (parallelization)
• implement seek operations on distributed files
• interoperability with other software
Thank you!