A new metric of crime hotspots for Operational Policing Monsuru Adepeju*1, Tao Cheng†1, John Shawe-Taylor‡2, Kate BowersβΈ3 1 SpaceTimeLab for Big Data Analytics, Department of Civil, Environmental and Geomatic Engineering, University College London 2 Department of Security and Crime Science, University College London 3 Department of Computer Science, University College London November 07, 2014 Summary This study examines the existing metrics used in evaluating the effectiveness of area-based crime hotspots for operational policing. We identified some of the limitations of the metric (i.e. Area-toPerimeter (AP) ratio) used for measuring compactness of hotspots and then proposed a new improved metric called “Clumpiness Index (CI)”. The case study of London Metropolitan police crime dataset features the prediction of 3 different crime types using two different crime predictive methods. The effectiveness of the hotspots was then measured using both AP ratio and CI. The comparison of the results clearly shows that CI is a better metric for measuring the effectiveness of crime hotspots for operational policing. KEYWORDS: Effective hotspots, Area-to-perimeter (AP) ratio, Hit rate, Clumpiness Index (CI), Operational Policing. 1. Introduction As police resources are becoming increasingly limited due partly to budget constraints, there is a growing interest in strategies that can enhance optimisation of their resources towards achieving the desired crime prevention goals. An effective policing strategy is one that offers the police high crime prevention potential with a small amount of police deployment (Weisburd, 2008). Over the last decade, attempts to increase police effectiveness have resulted in operational policing being informed by predictive analysis of crime. The predictive methods are used to identify locations of high future crime risk. These locations are referred to as crime hotspots. The types of hotspots include pointbased, network-based and area-based hotspots. Several studies have suggested that police can be more effective in intervening in crime by focussing on small geographical units with high crime rates (hotpots) rather than actual people (offenders) committing the crimes (Telep & Weisburd, 2014). As a result, most predictive methods of crime have been aimed at identifying area-based hotspots. To estimate how effective the detected hotspots are for operational policing, two metrics have been used, proposed by Bowers et al. (2004). These metrics are Hit rate (HR) and Area-to-Perimeter (AP) ratio. The HR measures the proportion of future crime accurately captured by the purported hotspots while AP ratio measures compactness (easiness of covering) the hotspots. Bowers et al. (2004) recommended that the two measures should be used together for meaningful evaluation. This is because hotspots with high HR may not necessarily be easily coverable based on their geometric shape. Thus, hotspots with moderate HR and a high AP ratio are preferred to ensure effective policing. However, certain limitations can be identified with the AP ratio which renders it less appealing for evaluating hotspots for effective policing. The AP ratio is used to measure the geometric complexities (compactness) of hotpots. The * monsuru.adepeju.11@ucl.ac.uk tao.cheng@ucl.ac.uk ‡ j.shawe-taylor@ucl.ac.uk βΈ kate.bowers@ucl.ac.uk † assumption is that regular-shaped hotspots (e.g. squares) can be covered quicker and more easily than irregularly-shaped hotspots, if we ignore the underlying network structure. The AP ratio has limitations. They are: a) Holding the shape of a hotspot constant, the AP ratio varies with the spatial scale, (ranging from zero to infinity). This makes it difficult to compare similar hotspots across different study areas. b) The level of disaggregation or dispersion of the hotspots (grid units) cannot be inferred from the value of the AP ratio. Therefore, the AP ratio cannot give us an idea of randomly distributed hotspots as a baseline for comparison. c) The AP ratio is relatively insensitive to differences in the structure of hotspots. Thus, although hotspots may possess very different shapes, they may have identical area and perimeters. Therefore, the goal of this paper is to propose a new metric for measuring effectiveness of crime hotspots for operational policing. Specifically, we are proposing a new metric called Clumpiness Index as an alternative to AP ratio given the limitations of AP ratio listed above. 2. Existing Metrics – Hit Rate and AP Ratio 2.1 Hit Rate: the proportion of new crimes captured by the defined hotspot. Evaluated at a certain area coverage (e.g. 20% area coverage) ∑π (ππ’ππππ ππ ππππππ ) π»ππ‘ π ππ‘π = ( ∑π=1 π (ππ’ππππ ππ ππππππ ) ) × 100 (1) π=1 Where i = number of ranked grids; k = number of percentile of ranked grids e.g. 20th; n = total number of grids. 2.2 Area-to-perimeter (AP) ratio: a measure of how compact an identified cluster (hotspot) is. The more compact a hotspot is, the easier and quicker it will be to cover operationally. Higher AP ratio corresponds to better compactness (Figure 1). AP ratio = 6/14 = 0.43 AP ratio = 6/10 = 0.6 Figure 1 Area-to-Perimeter (AP) Ratio. The hotspot on the right pane is more compact and therefore has higher AP ratio and may be seen as more efficient in operational policing terms. 3. A new metric - Clumpiness Index (CI) Provided a modest hit rate, the actual effectiveness of a predictive solution is measured in terms the geometric complexity (compactness) and distribution of the hotspots across a geographical area. We propose a new metric called “Clumpiness Index (CI)” which measures the compactness and distribution of hotspots and is robust to scaling issues of AP ratio. The Clumpiness Index (CI) was originally proposed by Turner (1989) as Contagion Index for measuring the overall clumpiness of categorical patches on a landscape. CI is able to measure effectively both patch type interspersion (i.e. the intermixing of units of different patch types) as well as patch dispersion (i.e. the spatial distribution of a patch type) at the landscape level. CI is computed by first summarising the adjacency of all cells in an adjacency matrix, which shows the frequency with which different pairs of patch types (including adjacencies between the same patch type) appear side-by-side on the map. CI is defined as follows: πΊπ −ππ πππ πΊπ = (∑π π=1 πππ 1−ππ πΊπ −ππ ) ; πΆπΌ = { 1−ππ ππ −πΊπ −ππ for πΊπ ≥ ππ (2) for πΊπ < ππ ; ππ ≥ 0.5 for πΊπ < ππ ; ππ < 0.5 gii = number of like adjacencies (joins) between pixels of patch type (class) i gik = number of adjacencies (joins) between pixels of patch types (classes) i and k Pi = proportion of the landscape occupied by patch type (class) i. The goal of CI is to determine the maximum value of gii for any Pi The CI takes values between -1 (when the class is maximally disaggregated) to 1 (when the class is maximally aggregated corresponding to a checkerboard arrangement). 4. Case Study 4.1 Camden Borough of London Camden Borough is one of the 12 inner boroughs of London City with an estimated 224,962 inhabitants as of 2011. The population density is estimated as approximately 10,000 people per square kilometre (2011 Census, Office of National Statistics). The Borough contains a mixture of commercial and residential areas with the busiest parts being the Camden Market and Covent Garden and Holborn. The borough recorded a crime rate of 145 crimes per 1,000 people in 2010/11, the national average being 75 crimes per 1,000 people (Source: Metropolitan Police Service website, 2014). 4.2 Effectiveness of hotspot for operational policing Three different crime types are used in this analysis. They are shoplifting, violence-against-persons and burglary (in-dwellings) crimes. The data points are aggregated to a grid system of 250m by 250m and have temporal resolution of 1 day. The time period predicted is between 28/09/2011 and 6/01/2012, predicting 2 days ahead using the crime risk surface produced on each day. To check the predictions against the real dataset, we overlay future crime on the predictive surface generated. For example, validating the prediction on day tn means overlaying crime data from day tn+1 to day tn+2 on the predictive surface generated on day tn. By so doing, the proportion of crimes that are captured by the ranked top 20% of the grids squares (hotspots) is evaluated (assuming that police only has resources to cover just 20% of the Camden). Two hotspots predictive methods are used, namely (i) Prospective Hotspot (Bowers et al. 2004) and (ii) Kernel Density Estimation (KDE) method. Figure 2 represents the average of percentage hit rate over the prediction period. Also included in Figure 2 is baseline prediction which is generated by way of picking grid squares with equal probability (random) until 20% coverage is attained. This is represented with the green line. The general performance of these predictive methods in terms of hit rates follows the spatial concentration of different crime types with highly spatially concentrated crime type showing highest hit rates. For example, shoplifting crime is highly concentrated in a few regions near commercial areas and therefore shows the highest hit rates whereas prediction of burglary crimes is lowest as residential properties are dispersed across the entire borough. Violence-Against-Persons Burglary-in-dwellings % Area Flagged % Area Flagged % Area Flagged % Crime Predicted Shoplifting Figure 2 Average % hit rates over the prediction period In measuring hotspot compactness, we adapted CI to crime hotspots by classifying grid squares constituting the predictive surface into two types, namely (1) Hot Spot – the top 20% ranked grids and (2) Cold Spot – the remaining 80%. Figure 3 shows examples of predictive surfaces generated by Prospective Hotspot method and random grid selection to illustrate how AP ratio and CI vary with different hotspot configurations. In each example, we consider two spatial scalings: the original data (unchanged), and data scaled such that each grid square has unit length. The AP ratio is observed to change at different scales of measurement of the same surface, making it difficult to compare. However, CI remains the same at any given scale. This is because CI is based on the adjacencies as well as the proportion of the hotspot grids across total surface. Therefore, CI is able to provide a sense of dispersion from a complete disaggregation (CI = -1) of the hotspot units. ´´´ ´ Predictive Surface 1 1 Predictive Surface (Random gridding) (Random gridding) Predictive Surface 1 (Random gridding) At aAt unit grid size: Predictive Surface 1grid: aa unit grid size: At unit AP ratio = 0.19 (Random gridding) AP ratio = 0.19 At a unit size: AP ratio =grid 0.19 CI =CI 0=0 AP ratio = 0.19 CI = -0.45 At a 0unit grid size: CI actual = At actual gridgrid measurement: At measurement: AP ratio = 0.19 At ratio actual grid measurement: AP ratio = 68.02 AP = 68.02 At = actual grid measurement: CI 0 CI =CI 0=0 AP ratio==68.02 68.02 AP ratio At CI =actual =0-0.45grid measurement: CI AP ratio = 68.02 CI = 0 Predictivesurface surface Predictive 22 (Prospective Hotspot) (Prospective Hotspot) Predictive surface 2 (Prospective Hotspot) Ata a unit grid size: At unit grid PredictiveAt surface 2 size: aratio unit grid: AP ratio = 0.37 AP 0.37 At ratio a unit==grid size: (Prospective Hotspot) AP CI = 0.420.37 CI 0.42 AP= ratio = 0.37 CICIAt ==0.32 a unit grid size: 0.42 actual grid measurement: AtAtactual grid measurement: AP ratio = 0.37 AP ratio =95.31 95.31 AP ratio =grid At actual AtCI actual gridmeasurement: measurement: =0.42 0.42 CI = CI 0.42 AP == 95.31 AP=ratio ratio 95.31 At actual grid measurement: 0.42 CICIAP ==0.32 ratio = 95.31 CI = 0.42 00 800 800 0 800 0 800 1,600 1,600 1,600 1,600 2,400 3,2003,200 2,400 Meters Meters 2,400 3,200 Meters 2,400 3,200 Meters Predictive surface 3 3 Predictive surface (Prospective Hotspot) (Prospective Hotspot) Predictive surface 3 Predictive surface 3 (Prospective Hotspot) At aAt unit grid size: (Prospective Hotspot) a unit grid size: AP ratio = 0.66 AP ratio = 0.66 At aa unit grid size: At unit At a unit grid size: CI =CI 0.88 = 0.88 grid: AP ratio = 0.66 AP AP ratio = 0.66 At actual gridgrid measurement: CI actual CI = 0.88 At measurement: CI == 164.58 0.54 AP ratio AP ratio = 164.58 At actual grid measurement: measurement: At CI =CI 0.88 = 0.88 At actual grid measurement: AP ratio = AP ratio = 164.58 164.58 CI = 0.88 AP ratio = 164.58 CI = 0.88 Legend Legend Legend Legend Legend Legend Legend Hot Spot 3 3Hot Legend Spot 3 3 Hot Spot Hot Spot Cold Spot Cold Spot Cold Spot Cold Spot AP = Area-to-Perimeter AP = Area-to-Perimeter CI= Clumpiness = Clumpiness Index CI Index AP AP= =Area-to-Perimeter Area-to-Perimeter CICI= =Clumpiness Index Clumpiness Index CI = 0.54 Figure 3 Evaluating hotspot compactness with AP ratio and CI 5. Discussion and Conclusion This study examines the use of existing metrics for measuring the effectiveness of predictive hotspots for operational policing. We highlighted some of the limitations of AP ratio, a metric that is specifically designed to evaluate effectiveness of predictive hotspots. We then proposed a new metric called Clumpiness Index CI which is able to eliminate the limitations of AP ratio. This study first established that the two predictive methods used (i.e. Prospective Hotspot and KDE) are able to predict crimes well above the baseline predictions (random), and their performances are observed to vary according to the spatial concentration of different crime types. The CI was then used to provide more interpretable assessment of hotspot compactness which is found to be very robust to change in scales of spatial units of analysis. In addition, the CI calculation provided a baseline of comparison i.e. either of maximally aggregated (CI = -1) or maximally disaggregated (CI = 1) hotspot configuration, giving a sense how easy the hotspots can be covered by the police. As with AP ratio however, the CI being a single-valued metric requires visualisation of the purported hotspot to make perfect meaning out of it. 6. Acknowledgements This research is part of the CPC (Crime, Policing and Citizenship) project supported by UK EPSRC (EP/J004197/1), in collaboration with the London Metropolitan Police. 7. Biography Monsuru Adepeju is currently 2nd year PhD student at the SpaceTimeLab for Big Data Analytics, at University College London. His PhD focuses on predictive modelling of space-time hotspot of crime and his research interests include validation of crime predictive models for predictive policing and development of usable predictive tools for operational environment. Tao Cheng is a Professor in GeoInformatics, and Director of SpaceTimeLab for Big Data Analytics (http://www.ucl.ac.uk/spacetimelab), at University College London. Her research interests span network complexity, Geocomputation, integrated spatio-temporal analytics and big data mining (modelling, prediction, clustering, visualisation and simulation), with applications in transport, crime, health, social media, and environmental monitoring. John S Shawe-Taylor is a professor at University College London (UK) where he is co-Director of the Centre for Computational Statistics and Machine Learning (CSML). His main research area is Statistical Learning Theory, but his contributions range from Neural Networks, to Machine Learning, to Graph Theory. He has coordinated a number of European wide projects investigating the theory and practice of Machine Learning, including the NeuroCOLT projects. Kate Bowers is a Professor in Crime Science at the UCL Department of Security and Crime Science. Kate has worked in the field of crime science for almost 20 years, with research interests focusing on the use of quantitative methods in crime analysis and crime prevention. References Bowers K J, Johnson S D and Pease K (2004). Prospective hot-spotting the future of crime mapping? British Journal of Criminology, 44, 641-658. Telep C W and Weisburd D (2014). Hot Spots and Place-Based Policing. In Encyclopedia of Criminology and Criminal Justice, Springer New York, pp. 2352-2363. Turner M G (1989). Landscape ecology: the effect of pattern on process. Annual Review of Ecology and Systems, 20: 171–197 Weisburd D (2008). Place-based policing, Ideas in American policing. Police Foundation, Washington, DC. http://www.policefoundation.org/content/place-based-policing. Accessed on 7/11/2014