Spatio-temporal frequent pattern mining for public safety

Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Pradeep Mohan*
Department of Computer Science, University of Minnesota, Twin-Cities
Advisor: Prof. Shashi Shekhar
Thesis Committee: Prof. F. Harvey, Prof. G. Karypis, Prof. J. Srivastava
*Contact: mohan@cs.umn.edu

Biography

Education
- Ph.D. student, Department of Computer Science and Engineering, University of Minnesota, MN, 2007 – Present.
- B.E., Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India, 2003 – 2007.

Major Projects during PhD
- US DoJ/NIJ: Mapping and Analysis for Public Safety
  - CrimeStat .NET Libraries 1.0: Modularization of CrimeStat, a tool for the analysis of crime incidents.
  - Performance tuning of spatial analysis routines in CrimeStat.
  - CrimeStat 3.2a - 3.3: Addition of new modules for spatial analysis.
- US DOD/ERDC/TEC: Cascade models for multi-scale pattern discovery
  - Designed new interest measures and formulated pattern mining algorithms for identifying patterns from large crime report datasets.

Thesis Related Publications

Cascading spatio-temporal pattern discovery (Chapter 2)
- P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers. Cascading spatio-temporal pattern discovery: A summary of results. In Proc. of 10th SIAM International Conference on Data Mining 2010 (SDM 2010, full paper acceptance rate 20%).
- P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers. Cascading spatio-temporal pattern discovery. IEEE Transactions on Knowledge and Data Engineering (TKDE). (Accepted regular paper, in press, ~20% acceptance rate.)

Regional co-location pattern discovery (Chapter 3)
- P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers, Z. Jiang, N. Wayant. A spatial neighborhood graph based approach to Regional co-location pattern discovery: summary of results. In Proc. of 19th ACM SIGSPATIAL International Conference on Advances in GIS 2011 (ACM SIGSPATIAL 2011, full paper acceptance rate 23%).

Crime Pattern Analysis Application (Chapter 4)
- S. Shekhar, P. Mohan, D. Oliver, Z. Jiang, X. Zhou. Crime pattern analysis: A spatial frequent pattern mining approach. M. Leitner (Ed.), Crime modeling and mapping using Geospatial Technologies, Springer (accepted with revisions).

Outline
- Introduction
  - Motivation
- Problem Statement
- Our Approach
- Future Work

Motivation: Public Safety
- Crime generators and attractors
  - Identifying events (e.g., bar closing, football games) that lead to increased crime.
  - Question: What / where are the frequent crime generators?
- Identifying frequent crime hotspots
  - Law enforcement planning.
  - Question: Where are the crime hotspots?
- Predicting crime events
  - Predictive policing (e.g., predict next location of offense, forecast crime levels around conventions).
  - Question: What are the crime levels 1 hour after a football game within a radius of 1 mile?
- Other applications: Epidemiology
[Figure caption: Predicting the next location of burglary.]
[Figures courtesy: www.startribune.com; https://www.llnl.gov/str/September02/Hall.html]

Scientific Domain: Environmental Criminology
- Crime pattern theory
- Routine activity theory and Crime Triangle
- Crime Event: Motivated offender, vulnerable victim (available at an appropriate location and time), absence of a capable guardian.
- Crime Generators: Places where offenders and targets come together in time and place, such as large gatherings (e.g., bars, football games).
- Crime Attractors: Places offering many criminal opportunities, to which offenders may relocate (e.g., drug areas).
[Figures courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepnum=8; http://www.popcenter.org/learning/60steps/index.cfm?stepNum=16]

Outline
- Introduction
- Problem Statement
  - Spatio-temporal frequent pattern mining problem
  - Challenges
- Our Approach
- Future Work

Spatio-temporal frequent pattern mining problem
Given:
- Spatial / spatio-temporal framework.
- Crime reports with type, location and / or time.
- Spatial features of interest (e.g., bars).
- Interest measure threshold (P_θ).
- Spatial / spatio-temporal neighbor relation.
Find:
- Frequent patterns with interestingness >= P_θ.
Objective:
- Minimize computation costs.
Constraints:
- Correctness and completeness.
- Statistical interpretation (i.e., account for autocorrelation or heterogeneity).

Illustration: Output
- Cascading ST patterns (Inputs: spatial and temporal neighborhood = 0.5 miles, 20 mins; threshold = 0.5)
- Regional co-location patterns (Inputs: spatial neighborhood = 1 mile; threshold = 0.25)
[Figure: events at Time T1, Time T2 > T1, Time T3 > T2 and Aggregate(T1, T2, T3); event types Bar Closing (B), Assault (A), Drunk Driving (C); CSTP P1 over B, A and C.]

Challenges
- Spatio-temporal semantics
  - Continuity of space / time
  - Partial order
- Conflicting requirements
  - Statistical interpretation
  - Computational scalability
- Computational cost
  - Exponential set of candidate patterns: # patterns = exponential(# event types)
[Figure: event instances A.1-A.5, B.1-B.2, C.1-C.4 shown at Time T1, Time T2 > T1, Time T3 > T2 and Aggregate(T1, T2, T3), with a candidate lattice rooted at {Null}. Annotations: "Time partitioning misses relationships"; "Space partitioning misses relationships".]

Our Contributions
- New spatio-temporal frequent pattern families.
  - Ex: Cascading ST patterns and Regional co-location patterns.
- Novel interest measures that guarantee statistical interpretation and are computable in polynomial time.
- Scalable algorithms based on properties of spatio-temporal data and interest measures.
- Experimental evaluation using synthetic and real crime datasets.
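The spatio-temporal neighbor relation listed among the problem inputs can be sketched as a simple predicate. This is only an illustration under stated assumptions: events are given in planar coordinates measured in miles with timestamps in minutes, distance is Euclidean, and the names `Event` and `st_neighbors` are hypothetical. The default thresholds (0.5 miles, 20 minutes) are the ones quoted for the cascading ST pattern illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Event:
    """A crime report: event type, planar location (miles), time (minutes)."""
    etype: str
    x: float
    y: float
    t: float


def st_neighbors(a: Event, b: Event, max_dist: float = 0.5, max_gap: float = 20.0) -> bool:
    """Return True if a and b are within max_dist miles and max_gap minutes."""
    close_in_space = hypot(a.x - b.x, a.y - b.y) <= max_dist
    close_in_time = abs(a.t - b.t) <= max_gap
    return close_in_space and close_in_time


# Hypothetical events, not taken from the Lincoln, NE data.
bar_closing = Event("B", 0.0, 0.0, 0.0)
assault = Event("A", 0.3, 0.3, 15.0)        # about 0.42 miles away, 15 minutes later
drunk_driving = Event("C", 2.0, 0.0, 10.0)  # 2 miles away: outside the neighborhood

print(st_neighbors(bar_closing, assault))        # True
print(st_neighbors(bar_closing, drunk_driving))  # False
```

A purely spatial neighbor relation, as used for regional co-location patterns, would drop the time condition and keep only the distance test.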
Outline
- Introduction
- Problem Statement
- Our Approach
  - Big Picture
  - Cascading Spatio-temporal pattern discovery
  - Other Frequent Pattern Families
- Future Work

Cascading ST pattern (CSTP)
- Input: Crime reports with location and time.
- Output: CSTP
  - Partially ordered subsets of ST event types.
  - Located together in space.
  - Occur in stages over time.
[Figure: events at Time T1, Time T2 > T1, Time T3 > T2 and Aggregate(T1, T2, T3); event types Bar Closing (B), Assault (A), Drunk Driving (C); CSTP P1 over B, A and C.]

Related Pattern Semantics: ST Data mining
[Figure: taxonomy of spatio-temporal frequent patterns: Unordered (ST Co-occurrence); Partially Ordered (our work, Cascading ST patterns); Totally Ordered (ST Sequences); Others.]
- ST Co-occurrence [Celik et al. 2008, Cao et al. 2006]
  - Designed for moving object datasets by treating trajectories as location time series.
  - Performs partitioning over space and time.
- ST Sequence [Huang et al. 2008, Cao et al. 2005]
  - Totally ordered patterns modeled as a chain.
  - Does not account for multiply connected patterns (e.g., nonlinear).
- Misses non-linear semantics.
- No ST statistical interpretation.

Interpretation Model: Directed Neighbor Graph (DNG)
- Nodes: Individual events.
- Directed edge (N1 → N2) iff Neighbor(N1, N2) and After(N2, N1).
[Figure: DNG over event instances A.1-A.5, B.1-B.2, C.1-C.4 at Time T1, Time T2 and Time T3, with CSTP P1 over Bar Closing (B), Assault (A) and Drunk Driving (C).]

Statistical Foundation: Interest Measures
- Instances of CSTP P1 (B → A, B → C, A → C) are:
  - (B1 → A1, B1 → C1, A1 → C1)
  - (B1 → A3, B1 → C2, A3 → C2)
  - ?? (B1 → A1; A1 → C2; B1 → C2)
- Cascade Participation Ratio, CPR(CSTP, M): conditional probability of an instance of the CSTP in the neighborhood, given an instance of event type M.
\[ \mathrm{CPR}(\mathrm{CSTP}, M) = P(\mathrm{CSTP} \mid M) = \frac{\#\ \text{instances of event type } M \in \mathrm{CSTP}}{\text{total } \#\ \text{instances of event type } M} \]
- Examples:
  - CPR(CSTP, B) = 1/2 = 0.5
  - CPR(CSTP, A) = 2/5 = 0.4
  - CPR(CSTP, C) = 2/4 = 0.5
- Cascade Participation Index, CPI(CSTP): min(CPR(CSTP, M)) over all M in CSTP.
  - Example: CPI = min{CPR(CSTP, C), CPR(CSTP, A), CPR(CSTP, B)} = 0.4
[Figure: DNG over event instances A.1-A.5, B.1-B.2, C.1-C.4.]

Analytical Evaluation: Statistical Interpretation
- Spatial statistics: ST K-function (Diggle et al. 1995)
\[ \hat{K}_{AB}(h, t) = \frac{1}{S \cdot T} \cdot \frac{1}{\lambda_A \lambda_B} \sum_i \sum_j I_{ht}\big(d(A_i, B_j),\ t_d(A_i, B_j)\big) \]
- The Cascade Participation Index (CPI) is an upper bound to the ST K-function per unit volume.
- Example (three configurations of B.1, B.2, A.1, A.2, A.3):

| Configuration | ST-K (B → A) | CPI (B → A) |
|---------------|--------------|-------------|
| 1             | 2/6 = 0.33   | 2/3 = 0.67  |
| 2             | 3/6 = 0.5    | 1           |
| 3             | 6/6 = 1      | 1           |

Comparison with Related Interest Measures

| Measure | Key property |
|---------|--------------|
| Frequency | Double counting of pattern instances |
| Maximum Independent Set (MIS) size [Kuramochi and Karypis, 2004] | NP-complete |
| Scoring criterion for Bayesian networks [Neapolitan, 2003; Chickering, 1996] | NP-complete; learning requires prior specification |
| Lower bound on vertex label frequency | Frequency-based interpretation |

Values for CSTP P1 on the example DNG:

| Measure | Value |
|---------|-------|
| Frequency | 3 / (What is the # of transactions?) |
| MIS | 2 |
| Lower bound on frequency | min{1, 2, 2} = 1 |

Computational Structure: CSTP Miner Algorithm
- Basic idea:
    Initialization
    for k in (1, 2, 3, ..., K-1) and prevalent CSTP found do
        Generate size k candidates.
        Compute CSTP instances / materialize part of DNG.
        Calculate interest measure and select prevalent CSTPs.
    end
- Item sets in association rule mining.
- Chemical compounds / subgraphs in graph mining.
- Directed acyclic graph in CSTP mining: not part of a conventional apriori setting.

CSTP Miner Algorithm: Illustration
CPI Threshold = 0.33
[Figure: candidate lattice rooted at {Null} over event types A, B and C, annotated with CPI values (0, 0.2, 0.4, 0.75, 0.8); pattern instances are computed on the example DNG by a spatio-temporal join.]

Computational Structure: CSTP Miner Algorithm
- Key bottlenecks
  - Interest measure evaluation
  - Exponential pattern space
- Computational strategies
  - Reduce irrelevant interest measure evaluation
    - Filtering strategies
  - Compute interest measure efficiently
    - Time Ordered Nested Loop Strategy
    - Space-Time Partition Join Strategy
Fixed parameters: spatial neighborhood = 0.62 miles, temporal neighborhood = 1 hr, CPI threshold = 0.0055

CSTP Miner Algorithm: Interest Measure Evaluation
- ST join strategies: perform each interest measure computation efficiently
  - Time Ordered Nested Loop (TONL) Strategy
  - Space-Time Partitioning (STP) Strategy
[Figure: event instances plotted in Space vs. Time; marked region = volume of ST neighborhood; ST join by plane sweep; # edges = 13.]

CSTP Miner Algorithm: Filtering Strategies
- Multi-resolution ST filter: summarizing on a coarser neighborhood yields compression in most cases.
CPI Threshold = 0.33
[Figure: Space vs. Time grid comparing the actual relation and the coarse relation for B → A, B → C, A → C and C → A, with grid-cell coordinates and CPI values (0.8, 0.75, 0.4, 0.2).]

Experimental Evaluation: Experiment Setup
Goals
1. Compare different design decisions of the CSTPM algorithm
   - Performance: run-time
2. Test effect of parameters on performance
   - Number of event types, dataset size, clumpiness degree
Experiment platform: CPU: 3.2 GHz, RAM: 32 GB, OS: Linux, Matlab 7.9

Experimental Evaluation: Datasets
Real data (Lincoln, NE dataset)
- Data size: 5 datasets
  - Drawn by increments of 2 months
  - 5000 - 33000 instances
- Event types:
  - Drawn by increments of 5 event types
  - 5 - 25 event types
Synthetic data
- Data size: 5 datasets
  - 5000 - 26000 instances
- Event types:
  - 5 - 25 event types
- Clumpiness degree:
  - 5 - 25 instances per event type per cell

Experimental Evaluation: Join strategy performance
Question: What is the effect of dataset size on performance of join strategies?
Fixed parameters: Real data (CPI = 0.15, 0.31 miles, 10 days); Synthetic data (0.5, 25, 25)
Trends: ST Partitioning improves performance by a factor of 5-10 on synthetic data and by a factor of 3 on real data.

Lincoln, NE crime dataset: Case study
- Is bar closing a generator for crime related CSTP?
[Figure: Bar locations in Lincoln, NE]
Questions
- Is bar closing a crime generator?
- Are there other generators (e.g., Saturday nights)?
Observation: Crime peaks around bar closing!
- Bar closing → Increase(larceny, vandalism, assaults)
- Saturday night → Increase(larceny, vandalism, assaults)
K-S test: Saturday night significantly different than normal day bar closing (P-value = 1.249 x 10^-7, K = 0.41)

Lincoln, NE crime dataset: Case study
[Figure-only slide]

Outline
- Introduction
- Problem Statement
- Our Approach
  - Big Picture
  - Cascading Spatio-temporal pattern discovery
  - Other Frequent Pattern Families
- Future Work

Regional co-location patterns (RCP)
- Input: Spatial features, crime reports.
- Output: RCP (e.g., <(Bar, Assaults), Downtown>)
  - Subsets of spatial features.
  - Frequently located in certain regions of a study area.

Statistical Foundation: Accounting for Heterogeneity
- Conditional probability of observing a pattern instance within a locality given an instance of a feature within that locality.
Regional Participation Ratio
\[ \mathrm{RPR}(\mathrm{RCP}, M) = \frac{\#\ \text{instances of event type } M \text{ participating in } PR(\mathrm{RCP})}{\#\ \text{instances of } M \text{ in dataset}} \]
- Example for <{ABC}, PL2>: the ratios for A and B are 2/6 and 2/4 (the slide's pairing of value to feature did not survive extraction), and RPR(<{ABC}, PL2>, C) = 1/4.

Regional Participation Index
\[ \mathrm{RPI}(\mathrm{RCP}) = \min_{M} \{ \mathrm{RPR}(\mathrm{RCP}, M) \} \]
- Example: RPI(<{ABC}, PL2>) = min{2/4, 2/6, 1/4} = 1/4
- Quantifies the local fraction participating in a relationship.

Conclusions
- Proposed SFPM techniques (e.g., Cascading ST patterns and Regional co-location patterns) honor ST semantics (e.g., partial order, continuity).
- Interest measures achieve a balance between statistical interpretation and computational scalability.
- Algorithmic strategies exploiting properties of ST data (e.g., multiresolution filter) and properties of interest measures enhance computational savings.

Future Work – Short and Medium Term
(X: unexplored)

| Dimension | Option | Input data: Spatial | Input data: Spatio-temporal (ST) |
|-----------|--------|---------------------|----------------------------------|
| Pattern semantics | Unordered | ✔ | ✔ |
| Pattern semantics | Totally ordered | X | ✔ |
| Pattern semantics | Partially ordered | X | CSTP discovery |
| Statistical foundation | Autocorrelation | ✔ | CSTP discovery |
| Statistical foundation | Heterogeneity | RCP discovery | X |
| Underlying framework | Euclidean | RCP discovery | CSTP discovery |
| Underlying framework | Non-Euclidean (networks) | X | X |
| Neighbor relation | User specified | RCP discovery | CSTP discovery |
| Neighbor relation | Algorithm determined | X | X |
| Interestingness criterion | Interest measure threshold | RCP discovery | CSTP discovery |
| Interestingness criterion | Threshold free | X | X |
| Type of data | Boolean / categorical | RCP discovery | CSTP discovery |
| Type of data | Quantitative data (e.g., climate) | X | X |

Future Work – Long Term
- Exploring interpretation of discovered patterns by law enforcement.
- ST predictive analytics, predictive models based on SFPM, and predictive policing.
- Towards geo-social analytics for policing (e.g., criminal flash mobs, gangs, groups of offenders committing crimes).
- New ST frequent pattern mining algorithms based on depth-first graph enumeration.
- ST frequent pattern mining techniques that account for patron demographic levels.
- Explore evaluation of choropleth maps via ST frequent pattern mining.

Acknowledgment
- Members of the Spatial Database and Data Mining Research Group, University of Minnesota, Twin-Cities.
- This work was supported by grants from the U.S. Army, NGA and U.S. DOJ.
- Advisor: Prof. Shashi Shekhar, Computer Science, University of Minnesota.
- Thesis committee.
- U.S. DOJ – National Institute of Justice: Mr. Ronald E. Wilson (Program Manager, Mapping and Analysis for Public Safety), Dr. Ned Levine (Ned Levine and Associates, CrimeStat Program).
- U.S. Army – Topographic Engineering Center: Dr. J.A. Shine (Mathematician and Statistician, Geospatial Research and Engineering Division) and Dr. J.P. Rogers (Additional Director, Topographic Engineering Center).
- Mr. Tom Casady, Public Safety Director (formerly Lincoln Police Chief), Lincoln, NE, USA.

Thank You for your Questions, Comments and Attention!

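Backup sketch: the Cascade Participation Ratio and Cascade Participation Index from the "Statistical Foundation: Interest Measures" slide can be checked numerically with a few lines of Python. This is a minimal sketch, not the CSTP Miner implementation: it assumes the two pattern instances listed on that slide are the only instances of CSTP P1, takes the instance totals from the example graph (A.1-A.5, B.1-B.2, C.1-C.4), and uses hypothetical function names.

```python
from collections import defaultdict

# Total instances per event type in the slide example (assumed from the
# labels A.1-A.5, B.1-B.2, C.1-C.4).
totals = {"A": 5, "B": 2, "C": 4}

# The two listed instances of CSTP P1 (B->A, B->C, A->C), each written as
# the event instance that fills every event type of the pattern.
instances = [
    {"B": "B.1", "A": "A.1", "C": "C.1"},
    {"B": "B.1", "A": "A.3", "C": "C.2"},
]


def participation_ratios(instances, totals):
    """CPR per event type: distinct participating instances / total instances."""
    participating = defaultdict(set)
    for inst in instances:
        for etype, event_id in inst.items():
            participating[etype].add(event_id)  # a set avoids double counting
    return {etype: len(participating[etype]) / totals[etype] for etype in totals}


def participation_index(ratios):
    """CPI: the minimum participation ratio over all event types in the pattern."""
    return min(ratios.values())


cpr = participation_ratios(instances, totals)
cpi = participation_index(cpr)
print(cpr)  # {'A': 0.4, 'B': 0.5, 'C': 0.5}
print(cpi)  # 0.4
```

Counting distinct participating events rather than pattern instances is what keeps B.1, which appears in both instances, from being counted twice. The Regional Participation Index has the same minimum-of-ratios form, restricted to a locality.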