A K-Main Routes Approach to Spatial Network Activity Summarization

Authors: Dev Oliver, Shashi Shekhar, James M. Kang, Renee Bousselaire, Abdussalam Bannur

Outline
• Motivation
• Problem Statement
• Contributions
• Validation
  • Analytical
  • Experimental
  • Case Studies
• Summary and Future Work

Motivation: Crime Analysis (application domain)
• Crime hotspot: area of concentrated crime
  • Street
  • Place
  • Neighborhood
• “Most clustering algorithms will show areas of concentration even when a line is the most appropriate dimension.” – National Institute of Justice**
• Star Tribune, January 26, 2011
** J. E. Eck et al., Mapping Crime: Understanding Hot Spots, US National Institute of Justice (http://www.ncjrs.gov/pdffiles1/nij/209393.pdf), 2005.

Examples of Linear Patterns
• Linear patterns resulting from deforestation in Brazil (http://en.wikipedia.org/wiki/Deforestation_in_Brazil)
• Linear patterns of crime in a major US city

Motivation: Environmental Criminology (scientific domain)
• Spatial theories in Environmental Criminology
  • Routine Activity Theory [1]: crime location related to criminal’s frequently visited areas
  • Crime Pattern Theory [2]: based on spatial model
    • Nodes (e.g. home, work, entertainment), Paths (e.g. routes between nodes), Edges
    • Crime locations close to edges: near criminal’s activity boundaries where residents may not recognize him/her
  • Source: Rossmo, Kim (2000). Geographic Profiling. Boca Raton, FL: CRC Press. http://www.popcenter.org/learning/60steps/index.cfm?stepNum=16
• Network based summarization adds value to Environmental Criminology
  • Assist with large scale verification of real-world data matching theories
  • Opportunities to develop hypotheses for new theory formulation
[1] L. E. Cohen et al., Social change and crime rate trends: A routine activity approach, American Sociological Review, 1979.
[2] P. L. Brantingham et al., Environmental Criminology, Waveland Press, 1990.

Other Domains
• Disaster Relief
• Accident Analysis and Prevention

Key Concepts
• Activity: object of interest located at node or edge
• Summary path: a path chosen by KMR to summarize activities
• Activity coverage: total number of activities of a path or set of paths
• Active node: a node having n ≥ 1 activities or joined by an edge having n ≥ 1 activities (e.g., A, B, C, D, E)
• Inactive node: a node having n = 0 activities and joined by edges all having n = 0 activities (e.g., F)
• Active node ratio: total # active nodes / total # nodes (e.g., 5/6)
• Each edge has a weight of 1
Motivation Problem Contributions Validation Summary

Problem Statement
• Given
  • A spatial network G = (N, E)
  • A set of activities, A, and their locations (e.g. a node or edge)
  • A set of paths, P
  • K (number of routes)
  • Edge weights
• Find: a cardinality k subset P′ of P, i.e., a subset P′ ⊆ P with |P′| = k
• Objective: maximize the activity coverage (AC) by P′
• Constraints: 1 ≤ k ≤ |P|
• Example: P = the set of shortest paths, k = 2, edge weights are 1

Challenges
• Measures of interestingness: activity coverage, average distance, etc.
• Computational complexity
  • Choose(N, 2) paths, given N nodes
  • Exponential number of k subsets of paths

Related Work: Network Summarization by Grouping/Clustering
• Zero or One routes
• Clumping (Okabe), e.g. NT-VCM (Shiode)
• Multiple routes
• Max. Subgraph, e.g. path, tree (Buchin)
• Our Work

Contributions
• K-Main Routes (KMR) algorithm
  • Finds a set of k routes to group activities
  • New design decisions added
    • Network Voronoi Activity assignment
    • Divide and Conquer Summary path recomputation
• Spatial network activity summarization is shown to be NP-complete.
• Analytically demonstrate correctness of design decisions and show cost analysis
• Experimental evaluation of the various algorithms
  • Performance evaluated using synthetic and real world datasets
• Case study comparing KMR with geometry based summarization

K-Main Routes (KMR) Algorithm
• K-Main Routes Algorithm
  Select k paths as initial summary paths
  Repeat
    1. Form k clusters by assigning each activity to its closest summary path
    2. Recompute summary path of each cluster
  Until summary paths do not change
• Design Decisions
  • Inactive node pruning
  • Network Voronoi Activity assignment
  • Divide and Conquer Summary path recomputation
• Example: P = the set of shortest paths, K = 2

Design Decision: Inactive Node Pruning
• Only consider paths between active nodes
• Optimal solution will still be in this set
• Given the set of shortest paths: 20 shortest paths calculated and stored versus 30 (consistent with counting ordered pairs: 5 × 4 among the five active nodes versus 6 × 5 among all six nodes of the Key Concepts example)

Design Decision: Network Voronoi (NV) Activity Assignment
• Goals
  • Form k clusters by assigning each activity to its closest summary path
  • Improve execution time of current assignment strategy
• Example (execution trace)
• In the K-Main Routes Algorithm, Network Voronoi Activity Assignment takes the place of step 1 (form k clusters by assigning each activity to its closest summary path); step 2 (recompute summary path of each cluster) is unchanged

[Figure: execution trace of NV activity assignment over eight frames, on an example network with nodes A to H, activities 1 to 10, and a virtual node X at distance 0. The Open list starts with X, and nodes are closed in the order X, A, E, D, H, B, F, C. A table lists activities 1 to 10 with "Distance from" columns A, E, D, H, AE and DH. Legend: Activity, Active Node, Inactive Node, Virtual Node, Summary Path, Edge weight = 1, Edge weight = 0, Closed Node.]

Network Voronoi Activity Assignment algorithm
Input: Graph G = (N, E), a set of Activities A, a set of k Summary Paths, S
Output: A set of k clusters formed by assigning all a_i ∈ A to one s_i ∈ S, where dist(a_i, s_i) ≤ dist(a_i, s_j) and s_j ∈ S and s_j ≠ s_i

Open ← all nodes ∈ S, Closed ← Ø
Tnodes ← all nodes ∈ S, Tactivities ← activities on s_i ∈ S
repeat
  n_c ← next node ∈ Open
  remove n_c from Open
  Closed ← n_c
  X ← neighbors of n_c
  foreach x_i ∈ X
    if x_i ∉ Tnodes and x_i ∉ Closed
      Tnodes ← x_i
      x_i.prev ← n_c, x_i.dist ← dist(x_i, n_c) + n_c.dist
      x_i.sp ← n_c.sp
    else
      if x_i ∈ Tnodes
        update x_i if new dist < x_i.dist
    if x_i ∉ Open
      Open ← x_i
    Y ← activities on edge {n_c, x_i}
    foreach y_i ∈ Y
      if y_i ∉ Tactivities
        Tactivities ← y_i
        y_i.prev ← n_c
        y_i.dist ← x_i.dist
        y_i.sp ← x_i.sp
      else
        update y_i if new dist < y_i.dist
until all active nodes ∈ Closed
return currentClusters

Design Decision: Divide and Conquer Summary PAth REcomputation
• Goals
  • Recompute the summary path of each cluster
  • Improve execution time of current recomputation strategy
• Example (execution trace)
• K-Main Routes Algorithm with both design decisions
  Select k shortest paths as initial summary paths
  Repeat
    1. Network Voronoi Activity Assignment
    2. Divide and Conquer Summary Path Recomputation
  Until summary paths do not change

Summary Path Recomputation Algorithm
Input: Graph G = (N, E), a set of Clusters, C
Output: A set of summary paths, S, where s_i ∈ S has max coverage for c_i ∈ C and s_i ∈ c_i

nextClusters ← Ø
foreach c_i ∈ C
  X ← active nodes of c_i
  maxP ← Ø
  foreach x_i ∈ X
    foreach x_j ∈ X
      if (i ≠ j)
        cP ← getSP(x_i, x_j)
        if (maxP = Ø)
          maxP ← cP
        if (maxP.activities < cP.activities)
          maxP ← cP
  if (maxP ≠ c_i.summaryPath)
    nextClusters ← maxP
  else
    nextClusters ← c_i.summaryPath
return nextClusters

[Figure: example network with nodes A to H and activities 1 to 10, showing a cluster and its summary path; edge weights are 1. Legend: Activity, Active Node, Inactive Node, Summary Path, Cluster.]

Validation
• Analytical
  • Cost analysis explaining computational savings
• Experimental
  • Comparative analysis of KMR with various design decisions
  • Performed on real and synthetic data
  • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
  • Savings increase with number of nodes, routes, activities and active node ratio
• Case studies
  • Qualitatively show the usefulness of network based summarization on crime data

Analytical Evaluation: Computational Analysis
• KMR Execution Time = Number of Iterations × (Activity Assignment Cost + Summary Path Recomputation Cost)
• T_{KMR} = I × ([K × |A| × cost(a_i, c_i)] + [K × d_c × |N|^2])
• T_{KMR_I} = I × ([K × |A| × cost(a_i, c_i)] + [K × d_c × (|N| × r)^2])
• T_{KMR_IAS} = I × ([|E| + |N| × log |N|] + [K × d_c × (|N|/K × r)^2])
• I = number of iterations
• K = number of clusters
• A = set of activities
• cost(a_i, c_i) = cost of calculating the distance between activity a_i and cluster c_i
• d_c = cost of looking up a path
• N = set of nodes
• E = set of edges
• r = active node ratio, 0 ≤ r ≤ 1

Experimental Evaluation
• Goal: comparative analysis
• Candidates: KMR with various design decisions
  • KMR_I: KMR with inactive node pruning
  • KMR_IV: KMR with inactive node pruning and Network Voronoi activity assignment
  • KMR_ID: KMR with Divide and Conquer summary path recomputation
  • KMR_IVD: KMR with all three design decisions
• Measure: CPU time (Unix time command)
• Platform: Mac Pro, 2 x Xeon Quad Core 2.26 GHz, 16 GB RAM
• Variables: #Nodes, #Routes, #Activities, Active Node Ratio
• Fixed parameters: unit edge length
• Datasets: Synthetic and Real (Haiti Earthquake)
[Figure: experiment design diagram with boxes for Variables (#Nodes, #Routes, #Activities, Active Node Ratio), Synthetic Dataset, Real Dataset, Java-based Simulator, Candidates (KMR_I, KMR_IV, KMR_ID, KMR_IVD), Measures, and Analysis.]

Data Description and Characteristics
• Synthetic data
  • 2010 Census TIGER/Line® Shapefiles used for road network
  • Activities randomly assigned to each edge
• Real-world data: Haiti data set
  • Geospatial and temporal dataset describing recent events post-disaster
  • Dataset collected from Jan 12, 2010 to March 23, 2010
  • 1,677 records
• Characteristics: attributes
  • Incident Title (e.g., “Food, Water, Tents needed…”)
  • Incident Date and Time
  • Location (City, port name)
  • Category (numeric category)
  • Latitude/Longitude
• Sources
  • Crisis Map of Haiti - http://haiti.ushahidi.com/
  • OpenStreetMap - http://www.openstreetmap.org/

Effect of Number of Nodes
• Synthetic data set: Number of Activities = 1200, Active Node Ratio = 0.2, K = 2
• Real data set: Number of Activities = 1206, Active Node Ratio = 0.1998, K = 2
• Trends
  • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
  • Savings increase with number of nodes

Effect of Number of Routes, K
• Synthetic data set: Number of Nodes = 1000, Number of Activities = 1200, Active Node Ratio = 0.2
• Real data set: Number of Nodes = 1000, Number of Activities = 202, Active Node Ratio = 0.219
• Trends
  • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
  • Savings increase with number of routes

Effect of Number of Activities
• Synthetic data set: Number of Nodes = 1000, Active Node Ratio = 0.2, K = 2
• Trends
  • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
  • Savings increase with number of activities

Effect of Active Node Ratio
• Synthetic data set: Number of Nodes = 1000, Number of Activities = 1200, K = 2
• Trends
  • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
  • Savings increase with active node ratio

Case Study: Crime Analysis
• Input (a set of crime incidents, k = 5)
• Crimestat K-Means (Euclidean distance)
• KMR Output
• Crimestat K-Means (Network distance)

Summary
• Spatial network activity summarization was shown to be NP-complete.
• K-Main Routes (KMR) algorithm and its design decisions described
  • Inactive node pruning
  • Network Voronoi Activity assignment
  • Divide and Conquer Summary path recomputation
• Analytically demonstrated correctness of design decisions and showed cost analysis
• Experimental evaluation
  • Performance evaluated using synthetic and real world datasets
• Case study comparing KMR with geometry based summarization

Acknowledgements
• Members of the Spatial Database and Spatial Data Mining Research Group, University of Minnesota, Twin-Cities.
• This work was supported by grants from USARMY and USDOD.

Thank you for your time! Any questions or comments?
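
Appendix sketch: the K-Main Routes loop

The K-Main Routes loop from the slides (assign each activity to its closest summary path, recompute each cluster's summary path, repeat until nothing changes) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes unit edge weights, a connected network, and activities located on nodes only (the slides also allow activities on edges). Assignment is done with a multi-source breadth-first search started from all summary-path nodes at once, which stands in for the Network Voronoi procedure. The function names, the `max_iter` guard, the hand-picked initial paths and the eight-node street are all hypothetical.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency lists for a unit-weight network."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def shortest_path(adj, src, dst):
    """One shortest path from src to dst, found by BFS (unit edge weights)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    path, node = [], dst
    while node is not None:  # walk back from dst to src (assumes dst is reachable)
        path.append(node)
        node = prev[node]
    return tuple(reversed(path))


def assign_nodes(adj, paths):
    """Multi-source BFS: label every node with the index of its nearest path."""
    owner, queue = {}, deque()
    for idx, path in enumerate(paths):
        for node in path:
            if node not in owner:
                owner[node] = idx
                queue.append(node)
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in owner:
                owner[nxt] = owner[node]
                queue.append(nxt)
    return owner


def coverage(path, activities, allowed=None):
    """Activities on the path's nodes (optionally only nodes in `allowed`)."""
    return sum(activities.get(n, 0) for n in path
               if allowed is None or n in allowed)


def recompute(adj, cluster, activities, current):
    """Best-covering shortest path between two active nodes of one cluster."""
    active = sorted(n for n in cluster if activities.get(n, 0) > 0)
    best, best_cov = current, coverage(current, activities, cluster)
    for a, b in combinations(active, 2):
        cand = shortest_path(adj, a, b)
        cov = coverage(cand, activities, cluster)
        if cov > best_cov:
            best, best_cov = cand, cov
    return best


def k_main_routes(adj, activities, initial_paths, max_iter=20):
    """Alternate assignment and recomputation until the paths stop changing."""
    paths = list(initial_paths)
    for _ in range(max_iter):  # max_iter is a safety guard, an assumption
        owner = assign_nodes(adj, paths)
        new_paths = []
        for idx, path in enumerate(paths):
            cluster = {n for n, o in owner.items() if o == idx}
            new_paths.append(recompute(adj, cluster, activities, path))
        if new_paths == paths:
            break
        paths = new_paths
    return paths


# Hypothetical street A-B-C-D-E-F-G-H with activity counts per node.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
         ("E", "F"), ("F", "G"), ("G", "H")]
activities = {"A": 2, "B": 1, "C": 3, "F": 1, "G": 2, "H": 2}
adj = build_adjacency(edges)
routes = k_main_routes(adj, activities, [("A", "B"), ("G", "H")])
print(routes, sum(coverage(p, activities) for p in routes))
```

On this toy input the loop grows the two initial paths to A-B-C and F-G-H, which together cover all 11 activities, and stops at the second iteration when neither path changes. Restricting candidate endpoints to active nodes mirrors inactive node pruning: D and E never start or end a candidate path.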
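
Appendix sketch: evaluating the cost expressions

To see how the three cost expressions on the Computational Analysis slide relate, the sketch below simply evaluates them. The node count, activity count, active node ratio and K are the synthetic-data settings from the experiment slides. Everything else is an assumption made up for illustration: the number of iterations, the number of edges, unit values for cost(a_i, c_i) and d_c, a base-2 logarithm, and reading (|N|/K × r)^2 as ((|N|/K) × r)^2. The results are unitless operation counts from the model, not measured running times.

```python
import math


def t_kmr(iters, k, n_act, assign_cost, lookup_cost, n_nodes):
    """T_KMR: pairwise assignment plus recomputation over all node pairs."""
    return iters * (k * n_act * assign_cost + k * lookup_cost * n_nodes ** 2)


def t_kmr_i(iters, k, n_act, assign_cost, lookup_cost, n_nodes, r):
    """T_KMR_I: recomputation restricted to active nodes (|N| * r)."""
    return iters * (k * n_act * assign_cost
                    + k * lookup_cost * (n_nodes * r) ** 2)


def t_kmr_ias(iters, k, lookup_cost, n_nodes, n_edges, r):
    """T_KMR_IAS: one graph search for assignment, per-cluster recomputation."""
    search = n_edges + n_nodes * math.log2(n_nodes)  # log base is an assumption
    return iters * (search + k * lookup_cost * (n_nodes / k * r) ** 2)


# From the experiment slides: |N| = 1000, |A| = 1200, r = 0.2, K = 2.
# Hypothetical: I = 5 iterations, |E| = 2500 edges, unit costs.
N, A, R, K = 1000, 1200, 0.2, 2
I, E, COST, DC = 5, 2500, 1.0, 1.0

base = t_kmr(I, K, A, COST, DC, N)
pruned = t_kmr_i(I, K, A, COST, DC, N, R)
full = t_kmr_ias(I, K, DC, N, E, R)
print(f"T_KMR={base:.0f}  T_KMR_I={pruned:.0f}  T_KMR_IAS={full:.0f}")
```

Under these made-up constants the three values come out in decreasing order, in line with the slides' claim that each design decision reduces cost; the sizes of the gaps depend entirely on the assumed constants.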