Metadata, Provenance and Web Service for Spatial Analysis: The Case of Spatial Weights
Luc Anselin, Sergio Rey, Wenwen Li
GeoDa Center
School of Geographical Sciences and Urban Planning
Arizona State University
Copyright © 2013 by Luc Anselin, All Rights Reserved

Some Specific Project Goals
• integrate and sustain a core set of composable, interoperable, manageable, and reusable CyberGIS software elements based on community-driven and open source strategies

Challenge
• most current spatial analysis/spatial econometrics software is written for a single CPU
• rethink and rewrite analytical, algorithmic and processing facilities to integrate into a cyberinfrastructure
• address lack of interoperability

Spatial Econometrics Workbench
• framework for supporting spatial econometric research in a cyberscience era (Anselin and Rey, IJGIS 2012)
• leverage PySAL and CyberGIS
• support scientific workflow

PySAL
• open source library of Python routines for spatial analysis: geocomputation, spatial weights, spatial autocorrelation, spatial econometrics, regionalization
• http://pysal.org
• hosted on GitHub

PySAL Progress Report
• current version is 1.6 (7th release)
• 3.5 years of on-time bi-annual releases
• 20,000+ downloads (10,000 in 2012)
• recognized in the open source scientific community: Anaconda

[Figure: Anaconda for big data analytics]

Migrating to CyberGIS
• performance = need for parallelization + refined algorithms
• interoperability = provide functionality as web services
• replicability = need for metadata and provenance tracking

Example: Spatial Weights
• metadata includes spatial data source, type of weights (e.g., contiguity, distance), any standardization or manipulation (e.g., higher order)

Lack of Interoperability
• different implementations
• no standards
• duplication of efforts
• hinders interoperability and workflow chaining

Example: Weights Formats in PySAL
[Figure: weights formats in PySAL]

Example: PySAL spreg
• what do we know about south_k6.gwt and south_ep_k20.kwt?

Conceptual Framework
• separate data source from operations
• data source: polygon or coordinate files with standard metadata (projection, origin, etc.)
• operations: weights metadata

[Figure: weights vocabulary]

[Figure: weights metadata structure (wmd)]

Web Service Implementation (OGC WPS)
• wraps PySAL weights module
• (re)creates weights object from information in wmd file
• makes weights object available as a file

Workflow
[Diagram components: wmd file (json), Weights Parser, PySAL Dispatcher, Weights Output, Metadata]

Illustration

Generate Weights from Shapefile
• NAT.shp available on server
• output format = gal

Get Request
• http://spatial.gdta.asu.edu/cgibin/wps.cgi?request=Execute&service=WPS&version=1.0.0&identifier=weights_ws&status=false&datainputs=[outputformat=gal;metadata={"input1":{"type":"shp","uri":"http://toae.org/pub/NAT.shp"},"weight_type":"rook","transform":"O","parameters":{"p":2,"k":4}}]
• the metadata={...} part of datainputs is the metadata input

Server Response
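The Get Request slide shows a WPS Execute call whose `datainputs` value embeds the weights metadata as JSON. The following is a minimal sketch of how such a URL could be assembled with the Python standard library. The endpoint, parameter names and metadata keys are copied from the slide; the helper names `build_wmd` and `build_execute_url` are hypothetical, and percent-encoding the query string is an assumption (the slide shows the URL unencoded). Nothing is sent over the network, since the server may no longer be reachable.

```python
import json
from urllib.parse import urlencode, parse_qs, urlsplit

# Endpoint copied verbatim from the slide; it may no longer be reachable.
WPS_ENDPOINT = "http://spatial.gdta.asu.edu/cgibin/wps.cgi"


def build_wmd(shp_uri, weight_type="rook", transform="O", p=2, k=4):
    """Hypothetical helper: assemble the metadata dict shown on the slide."""
    return {
        "input1": {"type": "shp", "uri": shp_uri},
        "weight_type": weight_type,
        "transform": transform,
        "parameters": {"p": p, "k": k},
    }


def build_execute_url(wmd, output_format="gal", endpoint=WPS_ENDPOINT):
    """Hypothetical helper: build the WPS Execute GET URL from the slide."""
    metadata = json.dumps(wmd, separators=(",", ":"))
    datainputs = "[outputformat={};metadata={}]".format(output_format, metadata)
    # urlencode percent-encodes brackets, braces, quotes and semicolons.
    # This encoding step is an assumption; the slide shows a raw URL.
    query = urlencode({
        "request": "Execute",
        "service": "WPS",
        "version": "1.0.0",
        "identifier": "weights_ws",
        "status": "false",
        "datainputs": datainputs,
    })
    return endpoint + "?" + query


url = build_execute_url(build_wmd("http://toae.org/pub/NAT.shp"))

# Round trip: decoding the query recovers the datainputs string from the slide.
decoded = parse_qs(urlsplit(url).query)["datainputs"][0]
print(decoded)
```

Keeping the metadata in a dict and serializing it at the last step means the same structure can be written to a wmd file or sent as a request parameter without hand-editing JSON.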
Sample gal output
• http://spatial.gdta.asu.edu/wpsoutput/e66df128-14ed-11e3-bde9-0050455c0671.gal

metadata (wmd) file
• http://spatial.gdta.asu.edu/wpsoutput/e66df128-14ed-11e3-bde9-0050455c0671.wmd

Performance Evaluation
• How does PySAL scale when the amount of input data increases?
• Is the overhead of the web service framework acceptable?
• How does the web service framework scale in handling massive concurrent requests?

Scale-up vs. Scale-out Solution
• Scale-up
  • high-end computer
  • configuration
    • Processor: 2 x 2.93 GHz Quad-Core Intel Xeon
    • Memory: 16 GB 1066 MHz DDR3 ECC
    • Software: Mac OS X Lion 10.7.4 (11E53)
• Scale-out
  • web server cluster

[Figure: Web Server Cluster]

Performance
• experiment using grid layout for N = 10,000 to N = 100,000
• rook contiguity and k nearest neighbors (k = 4)
• input shapefiles on server in Utah, web service on server at ASU

Experiment 1
• timing: average over 5 experiments
• web server overhead, data transfer and computation
• explore effect of data size

[Figure: time for rook and KNN contiguity]

Experiment 2
• scalability of web service framework
• high-end computer (8 cores)
• cluster (4 computing nodes, each with 2 cores)
• total processing time
• speed-up

[Figure: Total processing time]

[Figure: Speed-up]

Experiment 3
• scalability of the cluster by adding more computing nodes
• average response time
• 128 concurrent requests
• dataset: 10,000 polygons
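Experiments 2 and 3 report total processing time, speed-up and average response time under concurrent load. The sketch below shows one way these three quantities could be measured. It assumes the conventional definition of speed-up (baseline total time divided by total time of the compared configuration), which the slides do not spell out. The names `run_concurrent`, `speed_up` and `stand_in_request` are hypothetical, worker threads stand in for computing nodes, and the sleeping task is a placeholder for a real weights request, so the timings it prints say nothing about the actual service.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_concurrent(task, n_requests, n_workers):
    """Submit n_requests at once; return (total time, average response time).

    Response time is measured from submission to completion, so it
    includes time spent queued behind other requests.
    """
    def handle(submitted_at):
        task()
        return time.perf_counter() - submitted_at

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(handle, time.perf_counter())
                   for _ in range(n_requests)]
        response_times = [f.result() for f in futures]
    total = time.perf_counter() - wall_start
    return total, mean(response_times)


def speed_up(baseline_total, total):
    """Conventional speed-up: baseline time over compared time (assumed)."""
    return baseline_total / total


def stand_in_request():
    """Placeholder for one weights request; NOT a call to the real service."""
    time.sleep(0.01)


# Illustrative run only: 16 stand-in requests on 1 worker versus 4 workers.
base_total, base_avg = run_concurrent(stand_in_request, 16, 1)
par_total, par_avg = run_concurrent(stand_in_request, 16, 4)
print("speed-up:", round(speed_up(base_total, par_total), 2))
print("average response time (s):", round(base_avg, 3), "->", round(par_avg, 3))
```

To benchmark a real deployment, `stand_in_request` would be replaced by a function that issues one request to the service, and the worker count by the number of client connections used in the experiment.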
[Figure: Scalability - cluster]

Next Steps

Towards a Standard
• refine specification: flexible, expandable, deal with edge cases
• improve performance (parallelization)
• implement seek operations on distributed files
• interoperability with other software

Thank you!