Integrating Geographical Information Systems and Grid Applications Marlon Pierce Contributions: Ahmet Sayar, Galip Aydin, Mehmet Aktas, Harshawardhan Gadgil Community Grids Lab Indiana University Acknowledgements The real work was done by (in alphabetical order). Project web site: Mehmet Aktas Galip Aydin Harshawardhan Gadgil Ahmet Sayar http;//www.crisisgrid.org This work was supported by NASA AIST as part of “SERVOGrid: Complexity Computational Environment” Geographical Information Systems and Grid Applications Pattern Informatics Regularized Dynamic Annealing Hidden Markov Method (RDAHMM) Time series analysis code by Dr. Robert Granat (JPL). Can be applied to GPS and seismic archives. Can be applied to real-time data. Interdependent Energy Infrastructure Simulation System (IEISS) GeoFEST Earthquake forecasting code developed by Prof. John Rundle (UC Davis) and collaborators. Uses seismic archives. Finite element method code developed by Dr. Jay Parker (JPL) and Prof. Greg Lyzenga (JPL/Harvey Mudd College) Uses fault models as input. Virtual California Prof. Rundle’s UC-Davis group Used for forecasting Uses fault and fault friction input GIS Data Grid Work at CGL We decided that the Data Grid components of SERVO is best implemented using standard GIS services. Use Open Geospatial Consortium standards Provide downloadable GIS software to the community as a side effect of SERVO research. We implemented two cornerstone standards as Web Services (WS-I+ approach) Web Feature Service (WFS): data service for storing abstract map features Web Map Service (WMS): generate interactive maps from WFS’s and other WMS’s. Can be used to set up problems by extracting features (faults, seismic events, etc) from user GUIs to drive problems such as the PI code and (in near future) GeoFEST, VC. We also built a GIS compatible UDDI and WS-Context Supports queries Faults, GPS, seismic records Browse capabilities files. We are currently working on these steps Improving WFS performance Integrating WMS with video streaming technologies. Implementing Sensor Web Enablement for streaming, real-time data. GIS and Sensor Grids OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors GML Geography Markup language defines specification of geo-referenced data SensorML and O&M (Observation and Measurements) define meta-data and data structure for sensors Services like Web Map Service, Web Feature Service, Sensor Collection Service define services interfaces to access GIS and sensor information Grid workflow links services that are designed to support streaming input and output messages We are building Grid (Web) service implementations of these specifications for NASA’s SERVOGrid A Screen Shot From the WMS Client WMS uses WFS that uses data sources <gml:featureMember> <fault> <name> Northridge2 </name> <segment> Northridge2 </segment> <author> Wald D. J.</author> <gml:lineStringProperty> <gml:LineString srsName="null"> <gml:coordinates> -118.72,34.243 118.591,34.176 </gml:coordinates> </gml:LineString> </gml:lineStringProperty> </fault> </gml:featureMember> ` WMS le ec tio n Fe a ol tur eC eC oll Ge tF ea e r tu r tu a Fe a Fe et G tur e Client io ct n s ad i l ro ] a R [a-b Railroads WFS Server Hi River [a-d] Bridge [1-5] ry SQL Query ue LQ SQ SQ L gw ay [1 2- Q ue 18 ry ] Interstate Highways Rivers Bridges 90 Automating Pattern Informatics Pattern Informatics (PI) PI is a technique developed at University of California, Davis for analyzing earthquake seismic records to forecast regions with high future seismic activity. They have correctly forecasted the locations of 15 of last 16 earthquakes with magnitude > 5.0 in California. See Tiampo, K. F., Rundle, J. B., McGinnis, S. A., & Klein, W. Pattern dynamics and forecast methods in seismically active regions. Pure Ap. Geophys. 159, 2429-2467 (2002). http://citebase.eprints.org/cgibin/fulltext?format=application/pdf&identifier=oai%3AarXiv.org% 3Acond-mat%2F0102032 PI is being applied other regions of the world, and John has gotten a lot of press. Google “John Rundle UC Davis Pattern Informatics” Pattern Informatics in a Grid Environment PI in a Grid environment: Hotspot forecasts are made using publicly available seismic records. Code location is unimportant, can be a service through remote execution Results need to be stored, shared, modified Grid/Web Services can provide these capabilities Problems: Southern California Earthquake Data Center Advanced National Seismic System (ANSS) catalogs How do we provide programming interfaces (not just user interfaces) to the above catalogs? How do we connect remote data sources directly to the PI code. How do we automate this for the entire planet? Solutions: Use GIS services to provide the input data, plot the output data Web Feature Service for data archives Web Map Service for generating maps Use HPSearch tool to tie together and manage the distributed data sources and code. Example of Data Mining and GIS Grid Databases with NASA, USGS features SERVOGrid Faults WFS1 UDDI Data Mining Grid WFS3 WFS2 NASA WMS WMS handling Client requests SOAP WMS WMS Client Client HTTP Web Map Client WSDL Aggregating WMS Stubs Stubs HTTP SOAP WSDL WSDL WFS + Seismic Rec. WFS + State Bounds “REST” … WMS + OnEarth WMS uses WFS that uses data sources <gml:featureMember> <fault> <name> Northridge2 </name> <segment> Northridge2 </segment> <author> Wald D. J.</author> <gml:lineStringProperty> <gml:LineString srsName="null"> <gml:coordinates> -118.72,34.243 118.591,34.176 </gml:coordinates> </gml:LineString> </gml:lineStringProperty> </fault> </gml:featureMember> ` WMS le ec tio n Fe a ol tur eC eC oll Ge tF ea e r tu r tu a Fe a Fe et G tur e Client io ct n s ad i l ro ] a R [a-b Railroads WFS Server Hi River [a-d] Bridge [1-5] ry SQL Query ue LQ SQ SQ L gw ay [1 2- Q ue 18 ry ] Interstate Highways Rivers Bridges 90 GIS Behind the Scenes The web features are served up by a Web Feature Service. Web Map Service aggregates maps We re-implement Open Geospatial Consortium standards using Web Service Standards. http://grids.ucs.indiana.edu/ptliupages/publications/acm-gis-sayar.pdf. http://grids.ucs.indiana.edu/ptliupages/publications/Geoinformatics05_asayar.pd f. More WFS Info: SOAP messages, WSDL service definitions. Will allow us to separate messages from HTTP transport layer in future. More WMS Info: NASA OnEarth + our own renderings. http://grids.ucs.indiana.edu/ptliupages/publications/gwpap243.pdf More general info, software, demos: http://www.crisisgrid.org Tying It All Together: HPSearch HPSearch is an engine for orchestrating distributed Web Service interactions It uses an event system and supports both file transfers and data streams. Legacy name HPSearch flows can be scripted with JavaScript HPSearch engine binds the flow to a particular set of remote services and executes the script. HPSearch engines are Web Services, can be distributed interoperate for load balancing. Boss/Worker model ProxyWebService: a wrapper class that adds notification and streaming support to a Web Service. More info: http://www.hpsearch.org Data Mining Grid from Grid of Grids Databases with NASA,USGS features SERVOGrid Faults UDDI WFS4 SOAP Filter Pipeline PI Data Mining HPSearch “Workflow” Filter WS-Context Narada Brokering System Services WFS3 GIS Grid Traditional Execution Grid Data can be stored and retrieved from the 3rd part repository (Context Service) WS Context WFS (Tambora) (Gridfarm001) NaradaBroker network: Used by HPSearch engines as well as for data transfer WMS Data Filter HPSearch (Danube) (TRex) Virtual Data flow WMS submits script execution request (URI of script, parameters) HPSearch hosts an AXIS service for remote deployment of scripts PI Code Runner (Danube) Accumulate Data Run PI Code Create Graph Convert RAW -> GML HPSearch (Danube) GML (Danube) Actual Data flow HPSearch controls the Web services Final Output pulled by the WMS HPSearch Engines communicate using NB Messaging infrastructure IEISS GUI FOR OVERLAYING OUTAGE AREA ON A MAP IEISS Summary IEISS simulates power outages resulting from damage to electrical and natural gas grids. GIS Grid integration is similar to earlier PI application. Primary differences: Better support for dynamic GIS service discovery. Better integration of distributed state monitoring (WS-Context). Google map clients as well as modified PI clients. 4-5 - WFS publishes the as GML document into a topic WFS 1-2-3 6 - User and - WMS invokes WMS Client publish IEISS ->results through WMS their Server WSDL WMSFeatureCollection -> Client URL UDDI to interface -> the WFS UDDI for the Registry obtained (“/NISAC/WFS”) in a pub/sub based messaging WFS session -> WMS in Server geospatial features, and WMS Client starts system. a workflow the (creates map overlay) and IEISS receive this GML document. WMS Server -> Context aService. WMS Client (displays it) 7 - On receiving invocation message, IEISS updates the shared state data to be “IEISS_IS_IN_PROGRES”. IEISS runs and produces an ESRI Shape file and then invokes shp2gml tool to convert produced Shape file to GML format. After the conversion IEISS updates shared session state to be “IEISS_COMPLETED”. As the state changes, the Context Service notifies all interested workflow entities such as WMS Client. 9-10 - WFS-L publishes the IEISSWMS output as amakes GML FeatureCollection 8 – On receiving the notification, Client a request to the document topicoutput ‘NISAC/WFS-L’. WMS Server is subscribed to WFS-L for to theNB IEISS this topic and receives the GML file then converts it to map overlay, and the Client displays the new model on the map. Electric Power and Natural Gas data Zoom-in Zoom-out FeatureInfo mode Measure distance mode Clear Distance Drag and Drop mode Refresh to initial map Overlaid Outage Area - I Basic Steps: Select Energy Power AND Natural Gas Data and Update Layer List rendered on the map Click on “Overlay Outage” button See the outage area on the map Overlaid Outage Area - II Basic Steps: Select Energy Power Data and Update Layer List rendered on the map Click on “Overlay Outage” button Use zoom-in mapping tool below to get same outage area in more detail See the outage area on the map Overlaid Outage Area - III Basic Steps: Select Energy Power and Natural Gas Data and Update Layer List rendered on the map Select St. Petersburg from the “Area of Interest” dropdown list. Click on “Overlay Outage” button. See the outage area on the map Getting Info about specific EP Data by clicking on the map Basic Steps: Select Energy Power Data and Update Layer List rendered on the map Select (i) from the mapping tools below. Click on any feature data on the map. See the information for selected feature in pop-up window Google Hybrid Map and Feature Information call to WMS Natural Gas Layer Electric Power Layer Google Map Client Archived Real Time Databases with SERVOGrid Faults Sensor Grid WFS1 WFS2 Google Central HTTP Google Map Client Helper Services UDDI SOAP DoD and Homeland Security can in a crisis combine custom geo-referenced data with that available from hundreds of thousands of computers from Microsoft, Yahoo and Google Just build simple services using Interoperability standards! Real Time GPS and Google Maps Subscribe to live GPS station. Position data from SOPAC is combined with Google map clients. Select and zoom to GPS station location, click icons for more information. Google maps can be integrated with Web Feature Service Archives to filter and browse seismic records. Integrating Archived Web Feature Services and Google Maps Google Maps as Service accessed from our WMS Client Support for Real Time Applications RDAHMM: GPS Time Series Segmentation Slide Courtesy of Robert Granat, JPL GPS displacement (3D) length two years. Divided automatically by HMM into 7 classes. Features: • Dip due to aquifer drainage (days 120250) • Hector Mine earthquake (day 626) • Noisy period at end of time series Complex data with subtle signals is difficult for humans to analyze, leading to gaps in analysis HMM segmentation provides an automatic way to focus attention on the most interesting parts of the time series Towards Real-Time RDAHMM A real-time version of RDHAMM could potentially be used to detect state change events in live data from a GPS station. SCIGN maintains 125+ GPS stations, so trivially parallel RDAHHM clones can monitor state changes in the entire network. HPSearch can help But first we must get the data to RDAHMM. NaradaBrokering: Message Transport for Distributed Services NB is a distributed messaging software system. NB system virtualizes transport links between components. http://www.naradabrokering.or g Supports TCP/IP, parallel TCP/IP, UDP, SSL. See e.g. http://grids.ucs.indiana.edu/ptli upages/publications/AllHands2 005NB-Paper.pdf for transAtlantic parallel tcp/ip timings. Traditional NaradaBrokering Features Multiple protocol transport support In publish-subscribe Paradigm with different Protocols on each link Transport protocols supported include TCP, Parallel TCP streams, UDP, Multicast, SSL, HTTP and HTTPS. Communications through authenticating proxies/firewalls & NATs. Network QoS based Routing Allows Highest performance transport Subscription Formats Subscription can be Strings, Integers, XPath queries, Regular Expressions, SQL and tag=value pairs. Reliable delivery Robust and exactly-once delivery in presence of failures Ordered delivery Producer Order and Total Order over a message type. Time Ordered delivery using Grid-wide NTP based absolute time Recovery and Replay Recovery from failures and disconnects. Replay of events/messages at any time. Buffering services. Security Message-level WS-Security compatible security Message Payload options Compression and Decompression of payloads Fragmentation and Coalescing of payloads Messaging Related Compliance Java Message Service (JMS) 1.0.2b compliant Support for routing P2P JXTA interactions. Grid Feature Support NaradaBrokering enhanced Grid-FTP. Bridge to Globus GT3. Web Services supported Implementations of WS-ReliableMessaging, WS-Reliability and WS-Eventing. Transit Delay (Milliseconds) Mean transit delay for message samples in NaradaBrokering: Different communication hops 9 8 7 6 5 4 3 2 1 0 hop-2 hop-3 hop-5 hop-7 100 1000 Message Payload Size (Bytes) Pentium-3, 1GHz, 256 MB RAM 100 Mbps LAN JRE 1.3 Linux Standard Deviation for message samples in NaradaBrokering Different communication hops - Internal Machines 0.8 hop-2 hop-3 hop-5 hop-7 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 1000 1500 2000 2500 3000 3500 Message Payload Size (Bytes) 4000 4500 5000 Typical use of Grid Messaging Filter or Datamining Sensor Grid Post before Processing Database Archives Post after Processing Narada Brokering Web Feature Service Notify Subscribe HPSearch Manages WS-Context Stores dynamic data WFS (GIS data) GIS Grid Geographical Information System Raw to GML via NaradaBrokering The Scripps Orbit and Permanent Array Center (SOPAC) GPS station network data published in RYO format is converted to ASCII and GML Typical use of Grid Messaging in NASA Sensor Grid Grid Eventing Datamining Grid GIS Grid GIS and Collaboration Tools e-Annotation Player Archieved stream list Archived stream player Real time stream list Annotation/WB player Real time stream player e-Annotation Whiteboard GIS and Collaboration The previous slide illustrates an initial interface for capturing, annotating, and storing/replaying video streams. Still images can be captured and annotated on shared white board. Annotations are stored along with rest of system. Collaborative and Synchronous Annotation & Discussion Collaborative Communication e tiv n ora icatio b lla n Co mmu Co Streaming Servers Master/Coach broker broker archive NaradaBrokering Student broker broker broker e-Annotation Portal Server broker Student Ar Collaborative Communication Collaborative e-Annotation Player Student ch iv e/R etr iev e G L O B A L M M C S m trea TV Live stre am Capture Device Archived Real Time (Live) Stream From TV and Capture Devices Collaborative e-Annotation Whiteboard Archived Streams Stream Annotation Snapshots Instant Messenger Real Time (Live) Stream Player s TV Storage Servers Challenges for Geographical Information System Grids Must address performance issues. Related workshop at GGF 15. HTTP is not an adequate transport mechanism for moving data around. XML representations, compression, etc. Well established techniques from real-time collaboration can be applied to sensors Stream archiving and playback, session management, software multicasting. Applies to both data streams (GPS) and maps (streaming video).