Group Behaviour Analysis of London Foot Patrol Police Jianan Shen, Tao Cheng SpaceTimeLab for Big Data Analytics, Department of Civil, Environmental and Geomatic Engineering, University College London, Gower Street, London WC1E 6 BT, UK Nov 03, 2014 Summary The main objective of this research is to propose a method for group movement pattern generalisation and classification. To this end, DBSCAN is used on stay points for POI identification. Then, movement features are extracted and selected for the behaviour classification. A kernel-based Support Vector Machine method is developed to infer the working types of the officers based on the selected features depicting their movement histories. By analysing the geo-tagged police data, we demonstrate how this method can be used to reveal user information, especially interest information based on their POIs and spatial-temporal movement patterns. KEYWORDS: Policing, Travel Pattern, Machine Learning, SVM Classification, POI 1. Introduction The more and more ubiquitously used GPS-integrated devices have generated increasingly large movement history data than ever before. Such movement history data have enabled researchers to analyse and visualise movement trajectories, home range and POI (Points of Interests) distribution, and mobility patterns. Police foot patrol activity, as a special kind of human movement phenomenon, can also be analysed in similar methods with certain modifications. In this research, we intend to present a framework of patrol pattern analysis and classification of police working type, which is capable of 1) extracting police POIs and traveling sequences from GPS trajectory data; 2) selecting and summarising individual and group movement features for description and comparison of officers undertaking different works; 3) SVM classification of different types of officers based on the selected moving history features. 2. Case Study and Data The case study takes place in the Camden Borough of London(Figure 1). Five major police stations are located (Places No. 1, 7, 8, 9 and 11 in Figure 1) in this region, namely, West Hampstead, Hampstead, Kentish Town, Albany Street and Holborn. Automatic Personnel Location Systems (APLS) data set provided by the Metropolitan Police recorded officers’ location stamps collected by GPS-integrated radio sets portable on every officer in patrol. The length of the data is 84 days. The 241525 records in the dataset recorded information such as call signs, device IDs, as well as the locations and times of all 745 officers active in that period. Usually, the sampling rate of the system is one update every 10 minutes. Figure 1 The POIs identified in Camden, including 10 non-police-station POIs (1 of them locates outside Camden) and 5 police stations. 3. Methodology and Results 3.1. POIs and Traveling Sequences Extraction In discovering POIs, the location and time of officers' stopping behaviour is of more interests than the moving factors (Palma, 2008). To this end, DBSCAN is used for the clustering of stay points where officers stop or move slowly for more than 20 minutes. The massive activities and stops observed nearby police stations are common reflection of police daily routine and office works, and hence disturbed the searching for the POIs during the patrol activities. Therefore, outliners and records generated within 400m radius from the police stations are removed before the density based clustering. Only trajectory moving out of the radius will be considered as a start of a journey and when the time interval between two records exceed 2 hours, the series will be separated as two trips. By adjusting the parameters according to the quality of the data set and the method suggested by Zhou (2012), 10 non-police-station POIs are discovered. For instance, POI No. 2 represents a tube station, POI No.3 is considered to be a major road intersection. Figure 2 Individual movement history expressed by sequences of visited POIs By identifying building information on Ordinance Survey Maps and communicating with the police. The main topics of these POIs are identified as schools, places of worships, tube stations and major road intersections. The movement histories of the officers can then be simplified and expressed by a sequence of common POIs the officers visited and the time they arrive and leave each POI attached in the sequence (Figure 2). Similarly, it can also be expressed as a series of topics an officer went through during his/her patrol journey. For example, the journey of Officer 2 in Figure 2 can be summarised as "Commercial Area> School > Fire Station" and the attached time information about how long s/he stayed in each POI. The information of time series, POI topics and trajectories will be further processed to support the classification works. 3.2 Individual and Group Movement Features After acquiring the semantic meaning of discovered POIs in previous step, individual and group movement can be depicted and summarised with 14 features that are selected or extracted from the raw GPS log as well as the summarised POI visiting history. The features include: mean and standard deviation of speed, mean and standard deviation of patrol covering area, proportion of stopping time, proportion of time spent in police stations and POIs of different topic types, proportion of time spent in weekends and nights and the activeness expressed by the number of records generated by the officer in unit time and so on. Officer Group A Officer Group C Figure 3 The comparison of POI visiting frequency between two groups consisting of two different types of officers in one month Figure 4 shows 10 example features of the 3 groups. The values of features are z-score normalised for further classification works. It can seen in the chart that the selected features of the 3 groups vary greatly. For example Group B has strong daytime activeness proportion and seldom work at night while the standard deviation of Group A is much larger than other groups. Activeness 1 Time in commercial… 0.5 Time on major roads Area covered SD of speed 0 -0.5 -1 Average trip distance Daytime ratio In-station ratio Weekday ratio Stop-time ratio Group A Group B Group C Figure 4 Radar chart comparison of 10 example features depicting the patrol behaviour of different groups 3.3. SVM Classification of Officer works using RBF Kernel Many researches proposed SVM and other machine learning methods analysing online user behaviors. Some researchers developed similar methods on the analysis of travelling behaviour in real life, such as Baraglia (2013). By using the parameter selection method developed by Schölkopf (2002), RBF kernel is used in the SVM classification of officer working types based on the individual movement features extracted in previous section. The model is trained with officer movement data of which the working types have already been labeled and is used to classify the data generated in 10 days of the next month. The accuracies of this attempt are logged in Table 1. Table 1 Classification accuracy by training sets of different sizes Size of training sets 10 days 20 days 30 days Training Error 13.52% 15.00% 14.83% Accuracy of classification 68.89% 72.34% 73.42% This classification is based on manually-selected and unoptimised feature sets. This method will be further improved with optimum feature selection and using more advanced kernels. 4. Summary and future work In this research, a method for identifying user (officer) type based on travel (patrol) history is proposed. The framework include density-based POI clustering, feature selection and validation, as well as machine learning classification. The Camden APLS data enabled the study that others cannot proceed with due to the lack of modern GPS-enabled policing equipments. The method revealed the movement features of different officers in space and in time. It can also be used in other time series geo-tagged data for automatic movement pattern generalisation, traveler interest and routine mining, similarity analysis and so on. Further works may include improving the performance of SVM in unbalanced data sets (the officer numbers in different types vary greatly) and comparing SVM with other methods such as KNN and Artificial Neural Networks. 5. Acknowledgements This work is part of the project - Crime, Policing and Citizenship (CPC): Space-Time Interactions of Dynamic Networks (www.ucl.ac.uk/cpc), supported by the UK Engineering and Physical Sciences Research Council (EP/J004197/1). The data provided by Metropolitan Police Service (London) is highly appreciated. 6. Biography Jianan Shen is a PhD student in UCL SpaceTimeLab. He studied Information Engineering in NUDT in China (BEng, 2013). He is now working on the Crime, Policing and Citizenship Project sponsored by the EPSRC, UK. His research interests include movement pattern and trajectory analysis with Machine Learning approaches. Tao Cheng is a Professor in GeoInformatics, and Director of SpceTimeLab for Big Data Analytics (http://www.ucl.ac.uk/spacetimelab), at University College London. Her research interests span network complexity, Geocomputation, integrated spatio-temporal analytics and big data mining (modelling, prediction, clustering, visualisation and simulation), with applications in transport, crime, health, social media, and environmental monitoring. References Baraglia R, Muntean C I, Nardini F M and Silvestri F (2013). LearNext: learning to predict tourists movements. In Proceedings of CIKM, 751-756. Palma A (2008). A Clustering based Approach for Discovering Interesting Places in Trajectories. Symposium on Applied Computing, March 16-20, Fortaleza, Brazil. 863-868. Schölkopf B and Smola A J (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press. Zhou C (2004). Discovering Personal Gazetteers: An Interactive Clustering Approach. Geographic Information Science, 266-273.