BANOS Arnaud, BANOS Florence, 2003 : « Towards integrated spatial surveillance tools of urban risks : space-time analysis of road accidents in Lille », in Géographie des risques des transports, BANOS Arnaud, BANOS Florence, BROSSARD Thierry, LASSARRE Sylvain, Editeurs, Paradigme, Orléans, pp. 9-22

Towards integrated spatial surveillance tools of urban risks : space-time analysis of road accidents in Lille (France) [1]

Arnaud Banos, INRETS/DERA ; Florence Banos, Université Paris IV Sorbonne
arnaud.banos@inrets.fr, florence.banos@tiscali.fr

Summary

A road accident can be interpreted as a discrete point event, located in time and in space. Rare in terms of its probability of occurrence, it also involves a large element of chance. Identifying significant space-time configurations within such noisy information is therefore a real challenge. The view defended in this article is that a strategy based on human-machine cooperation, exploiting the capacities of each party, can bring worthwhile progress in this search for structures. An example of implementation is proposed, based on the point pattern of accidents that occurred in Lille between 1993 and 1997. Map animation, interactive space-time exploration and automated search for structures are combined within an interactive spatial data-mining strategy.

Keywords : Road accidents, Exploratory spatial analysis, Spatial data-mining, Lille, Point pattern

Abstract

Road accidents may be seen as discrete point events, located in both space and time. Called “rare” because of their low probability of occurrence, they are also largely subject to random fluctuations. Identifying significant space-time configurations within such noisy information is therefore a real challenge.
In this paper, we defend the idea that a strategy based on close human/computer cooperation, exploiting the particular capacities of each side, may be very useful. The point pattern of accidents that occurred in Lille from 1993 to 1997 is used to illustrate this point of view. Map animation, interactive spatio-temporal exploration and automated search for patterns are combined within an interactive spatial data-mining strategy.

Key Words : Exploratory spatial analysis, Lille (France), Point pattern analysis, Road accidents, Spatial data-mining

[1] Most of the developments this article is based on were carried out in collaboration with Florence Banos (Huguenin-Richard), while the author was finishing his PhD thesis at ThéMA, UMR 6049 CNRS, Besançon. More information is available at : http://thema.univ-fcomte.fr/banos/banos-page.htm

Introduction

Spatial surveillance tools, aimed at revealing and detecting significant space-time patterns, may be of great value in urban planning and risk analysis. By allowing detailed, accurate and possibly continuous monitoring of the pulse of the city, these tools would provide a strong and reliable basis for urban management. In our opinion, such tools may be based on a strategy combining interactivity, animation and automation within a dynamic framework. This point of view, which we set out in previous work (Banos, 1999, 2001a,b,c,d ; Banos and Huguenin-Richard, 2000), draws on three different traditions : exploratory spatial data analysis, geovisualisation and geocomputation. While the first one is firmly anchored in the statistical tradition with the original work of John W.
Tukey (1977), the other two rely more fundamentally on developments in computer science (Openshaw and Openshaw, 1997 ; Openshaw and Abrahart, 2000 ; Kraak, 1998 ; Kwan, 2000). The key idea of this paper is that we should take full advantage of the combination of these three fields, even if the modalities and tools remain to be assessed. Examples of such developments are presented, focused on road risk analysis and developed mainly within the free statistical programming environment Xlisp-Stat (Tierney, 1990).

1. Outline of the spatial data base

The data set used for this synthesis comes from the territorial administration of Lille (France), which kindly provided a very rich spatial data base :
- each accident is a localised event, known by its geographic co-ordinates ;
- each accident is described by qualitative and quantitative information (day, month and hour of occurrence, as well as seriousness of the event, users involved...).

The Geographic Information System ArcView (ESRI) is used to store and manage the data base.

On the pinpoint map below, each point locates one or more traffic accidents, which may overlap one another.

Figure 1 : Location of accidents in Lille, 1993-1997

From that point, different alternatives can be pursued : visual exploration through interactive graphs, space-time animation through a smoothing algorithm, and automated search for clusters.

2. Space-time visual exploration…with a mouse

The idea of interactivity is not new (Cleveland, 1993 ; Haslett et al., 1991 ; Monmonier, 1989) but has received increasing attention with the development and diffusion of interactive environments [2].
Dynamic linked graphics are thus tools as simple as they are powerful for investigating a data set, particularly in a space-time perspective, by simply linking the spatial and temporal dimensions through appropriate graphs. We exploit here the dynamic graphic capacities of Xlisp-Stat, a statistical programming environment provided “for free” by Luke Tierney (Tierney, 1990).

The spatial distribution of road accidents in Lille can be displayed in a two-dimensional point plot, while the temporal distribution, for a given time lag, can be displayed in a so-called “stacked plot”, enhanced with a kernel density function revealing the global trend.

[2] See for example the study by Carolina Tobon, http://www.casa.ucl.ac.uk/paper31.pdf

Figure 2 : Interactive space-time exploration of accidents data within a dynamic environment

The existence of a «hot link» between the two windows allows interactive space-time analysis to be carried out, as every individual or group of individuals selected on one of the graphs (here the time plot) is instantaneously highlighted on the other graph. The user is then invited to investigate one dimension conditionally on the other, focusing on local rather than global trends, the spatial or temporal scale of investigation being highly customisable.

While this very simple strategy has proved successful in revealing meaningful local space-time patterns, it cannot be considered a stand-alone solution, as what makes its strength also makes its weakness :
- While focusing on local trends, the user may miss global ones.
Moreover, fully disaggregated information, though desirable in many respects, may not be sufficient and should lead to user-driven aggregations ;
- Being an active part of the investigative process through mouse manipulation, the user may miss patterns that are not expected at that moment ;
- The rough graphical display, which may be improved through real-time display of numeric indicators (Banos, 2001a), may not be sufficient to distinguish between different patterns ;
- The combination of mouse manipulations that led to a given pattern may be hard for the user to reproduce.

For this reason, additional strategies may be imagined, based on both animation and automation processes.

3. Animating maps to reveal complexity

It is often worth aggregating information in order to gain additional insight. For example, a pinpoint map, even an interactive one, conveys little about the varying intensity of the underlying phenomenon, not only because points can overlap each other but also because of the limited capacity of human vision.

Point information, such as the location of accidents, can usefully be aggregated with a kernel density algorithm (Bailey and Gatrell, 1995 ; Brunsdon, 1991 ; Silverman, 1986). A moving three-dimensional window of a chosen radius “r” scans the study area, counting the events “Xi” falling within its circular area and weighting them according to their distance to the centre “X” of the window :

$$\hat{\lambda}(X) = \sum_{i=1}^{n} \frac{1}{r^{2}} \, k\!\left(\frac{\lVert X - X_i \rVert}{r}\right)$$

The function k(d), which is the kernel, may be defined in several ways.
Here, a bi-square function is used :

$$k(d) = \begin{cases} \dfrac{3}{\pi}\left(1 - d^{2}\right)^{2} & \text{if } d \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad \text{with } d = \frac{\lVert X - X_i \rVert}{r}$$

The densities estimated this way are finally interpolated [3], producing a smoothed surface in two or three dimensions. This computer-intensive procedure allows map animations to be produced, as figure 3 shows, providing valuable communication tools. The main message conveyed by this kind of visual display is that we had better learn to manage complexity, as we cannot always reduce it.

[3] Most of the available algorithms provide good results, as the underlying grid is regular and complete.

Figure 3 : 3D map animation of observed insecurity in Lille (surface deformation is proportional to the density of accidents) [4]

However, while very useful for revealing space-time patterns, this smoothing method suffers from several shortcomings. The first one relates to the large volume of computation involved, which may dramatically reduce the level of interactivity of the process. Ideally, the user should be able to adjust dynamically the spatial and temporal scales on which the animation is based, so as to evaluate the influence of these major parameters on the visual rendering. Unfortunately, the amount of time needed to produce such an animation makes this goal unreachable, at least for the moment.

A major shortcoming, especially when dealing with road risk analysis, can also be pointed out. Traffic accidents happen mainly on the road network (apart from those occurring in car parks), whereas the kernel density algorithm operates on a surface basis (a circular window) and, what is more, in an isotropic way for a given distance to the centroid of the window.
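As an illustration of the kernel smoothing described in this section, the following minimal Python sketch evaluates a density surface on a regular grid. It is not the authors' Xlisp-Stat or GIS implementation. It assumes the estimate takes the standard fixed-radius form, a sum over events of k(d)/r² with d the distance to the window centre divided by r, and uses the bi-square kernel. The function names (`bisquare_kernel`, `density_at`, `density_grid`) and the grid parameters are hypothetical, chosen only for this example.

```python
import math


def bisquare_kernel(d):
    """Bi-square kernel: (3/pi) * (1 - d^2)^2 if d <= 1, else 0."""
    if d > 1.0:
        return 0.0
    return (3.0 / math.pi) * (1.0 - d * d) ** 2


def density_at(x, y, events, r):
    """Kernel density estimate at (x, y) for a window of radius r.

    events is a list of (x, y) accident locations (assumed planar co-ordinates).
    """
    total = 0.0
    for ex, ey in events:
        d = math.hypot(x - ex, y - ey) / r  # scaled distance to the window centre
        total += bisquare_kernel(d)
    return total / (r * r)


def density_grid(events, r, xmin, ymin, nx, ny, step):
    """Evaluate the estimate on a regular nx-by-ny grid (one row per y value)."""
    return [[density_at(xmin + i * step, ymin + j * step, events, r)
             for i in range(nx)]
            for j in range(ny)]
```

Under these assumptions, one frame of an animation would correspond to one call to `density_grid` on the events of a given time slice. The double loop over grid nodes and events also makes the computational cost mentioned above easy to see: it grows with the number of grid nodes times the number of events, for every frame.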
Anisotropic smoothing algorithms based on the network could be investigated, but we believe that surface rendering is essential for visual efficiency. More importantly, the densities estimated this way should be seen as indicators of unsafety rather than of risk, as they are not related to any exposure indicator. A relevant improvement would then be to weight each localised density estimate by the local length of the network included in the window (figure 4).

Figure 4 : Integration of the road network to enhance kernel density algorithm

As a feasibility test, this GIS-based procedure was applied to a small area only, since it involves very heavy computation.

Figure 5 : Application of the modified algorithm to a small study area

The map obtained shows estimates of the density of accidents weighted by the density of the road network for each moving window. This process provides rough but spatially differentiated estimates of the risk of accidents in a given area. It shows the benefits we could obtain by integrating spatial analysis methods within a common framework : GIS should moreover play this integrating role, just as they usually do for data management.

[4] Visual rendering was obtained with MapInfo. Full animation can be viewed from http://thema.univ-fcomte.fr/banos/banos-page.htm

This «user oriented» approach may not be an end in itself.
The interactive space-time analysis sketched out previously can indeed be enriched with relevant local indicators, obtained through more automated processes.

4. Automated identification of accident clusters, in space and time

While interactivity and visualisation are of crucial importance, spatial surveillance cannot be reduced to these tasks. The burden of discovery can hardly be left entirely to what might be called a perfect user, never missing meaningful patterns nor being misled by incorrect knowledge. Some processes may then be automated, in the spirit of spatial data-mining (Zeitouni and Yeh, 1999). Two strategies are introduced hereafter, allowing clusters to be identified in a very precise way.

Detection of spatial clusters in time

As Stan Openshaw recalls, “the original idea of a geographical analysis machine was that of a something that could be left running permanently processing important databases in a never ending quest for patterns or relationships that matter” (Openshaw, 1995, p. 10). The method used below is a recent adaptation of the Geographical Analysis Machine (GAM) proposed by Fotheringham and Zhan (1996). It compares locally the spatial pattern of a “case” population with that of its “at-risk” population (figure 6).

Figure 6 : Defining significant clusters of child pedestrian accidents (right map) among the whole data set (left map)

Here, the spatial pattern of all accidents, minus the “case” ones, is supposed to form the “at-risk” population : “...the knowledge of the spatial distribution of the at-risk population allows more interesting clusters to be distinguished from those that arise purely from spatial variations in the density of the at-risk population” (Fotheringham and Zhan, 1996).

The study area is scanned with many randomly located circular windows, and a statistical test is computed each time. In each window, the observed number of “case” events is compared with its expected number, which is the product of the number of “at-risk” events in the window and the mean proportion of “case” events in the whole area. The Poisson probability distribution is then used to assess the significance of the difference observed. The windows recording significantly high proportions of child pedestrian accidents, for a specified alpha-level, are mapped. Clusters, or relative concentrations of “case” accidents, are then revealed by the overlapping of multiple circles.

The modified version of the GAM thus makes it possible to detect significantly high proportions of child pedestrian accidents, for a given period and for a specified alpha-level (figure 7).

Figure 7 : Clusters of child pedestrian accidents identified in Lille (1993-1997) by the modified GAM

This clustering strategy has proved successful on many occasions (Banos and Huguenin-Richard, 2000 ; Fotheringham and Zhan, 1996 ; Huguenin-Richard, 2000 ; Openshaw et al., 1987) but remains fundamentally sequential in spirit, being based on a loose interaction between the spatial and temporal dimensions.
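The window test described above can be sketched as follows. This is a minimal Python illustration, not the authors' implementation nor the exact procedure of Fotheringham and Zhan (1996). It assumes that the expected number of cases in a window is the number of at-risk events in that window times the overall ratio of case events to at-risk events, and that significance is measured by the upper-tail Poisson probability. The function names (`poisson_upper_tail`, `count_within`, `scan_windows`), the fixed radius and the uniform sampling of window centres in the bounding box of the data are hypothetical choices made for this example.

```python
import math
import random


def poisson_upper_tail(observed, expected):
    """Return P(N >= observed) for N following a Poisson law of mean expected."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)       # P(N = 0)
    cdf = term
    for k in range(1, observed):     # accumulate P(N <= observed - 1)
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def count_within(points, cx, cy, radius):
    """Count the points lying inside the circle of given centre and radius."""
    return sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= radius)


def scan_windows(cases, at_risk, radius, n_windows, alpha=0.05, seed=0):
    """Scan random circular windows and keep those with a significant excess of cases.

    Returns a list of (cx, cy, n_cases, expected, p_value) tuples.
    """
    rng = random.Random(seed)
    rate = len(cases) / len(at_risk)  # assumed definition of the mean proportion of cases
    xs = [x for x, _ in cases + at_risk]
    ys = [y for _, y in cases + at_risk]
    flagged = []
    for _ in range(n_windows):
        cx = rng.uniform(min(xs), max(xs))
        cy = rng.uniform(min(ys), max(ys))
        n_cases = count_within(cases, cx, cy, radius)
        n_risk = count_within(at_risk, cx, cy, radius)
        if n_cases == 0 or n_risk == 0:
            continue                  # nothing to test in this window
        expected = n_risk * rate
        p_value = poisson_upper_tail(n_cases, expected)
        if p_value <= alpha:
            flagged.append((cx, cy, n_cases, expected, p_value))
    return flagged
```

Mapping the circles returned by `scan_windows` would then show clusters as areas where many flagged windows overlap. Because a large number of windows are tested, the alpha-level here applies to each window separately and not to the scan as a whole.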
For that reason, we found it useful to propose an alternative strategy, dealing more explicitly with space-time interactions.

Detection of space-time clusters

The approach we propose here is based on the now classical literature on space-time modelling (Bailey and Gatrell, 1995 ; Wakefield et al., 2000). The main idea is to extend the global indicator of Nathan Mantel (1967), so as to provide a local indicator of space-time clustering. Mantel’s indicator is expressed as the sum of the products of the proximities between events in both space and time :

$$I_{Mantel} = \sum_{i} \sum_{j} x_{ij} \, y_{ij} \qquad \text{with } x_{ij} = \frac{1}{k_d + d_{ij}} \text{ and } y_{ij} = \frac{1}{k_t + t_{ij}}$$

where $d_{ij}$ is the Euclidean distance and $t_{ij}$ the time interval between events i and j. Using inverse functions makes it possible to switch from distance to proximity, allowing more intuitive interpretations of the values obtained. Finally, $k_d$ and $k_t$ are constants, allowing events occurring at the same place and/or moment to be taken into account (mainly when k 1). The bias introduced by the combination of different metrics can be reduced by transforming each metric to a common interval [0, 100], using the following function :

$$\frac{distance_{ij} - \min(distance_{ij})}{\max(distance_{ij}) - \min(distance_{ij})} \times 100$$

From that point, two last issues have to be dealt with. First, it is worth questioning whether events occurring in distant places should be compared at all. Second, the question of the practical use of this indicator has to be addressed. Let us recall that we are looking for fairly detailed space-time clusters, rather than for global warnings. In this spirit, we proposed (Banos, 1999, 2001a) to estimate this indicator on a local basis, using a buffering process (Figure 8).

Figure 8 : A local adaptation of Mantel’s indicator of space-time clustering

A local Monte Carlo permutation test is then used to assess whether the values obtained may be seen as significant or could have occurred “by chance”. Significant values, for a specified alpha-level, are finally mapped, allowing space-time clusters to be easily detected (Figure 9).

Figure 9 : Space-time clusters of road accidents in Lille (1993-1997) identified using buffers of size 50m (left) and 100m (right) and a 0.05 alpha-level

Conclusion

While still “in progress”, this work shows how spatial surveillance tools can be implemented within specific environments dedicated to the management of urban risks. Space-time strategies combining methods from exploratory spatial data analysis, geocomputation and geovisualisation can indeed be imagined, leading to promising results.

However, a major shift in GIS, from data management to data analysis, has to be achieved if we want to move forward : as Stan Openshaw recalled, “analysis is no longer an option” (Openshaw, 1995). In that spirit, locally based methods and models are still to be defined, leading to more realistic and efficient identification of underlying patterns. While it concerns the entire field of spatial analysis, this challenge may sound particularly appealing to geographers, given their capacity to assume a major role in the process.

References

BAILEY T. and GATRELL A., 1995 : Interactive spatial data analysis, Longman Scientific & Technical, Harlow, 413 p.
BANOS A.
and HUGUENIN-RICHARD F., 2000 : “Spatial distribution of road accidents in the vicinity of point sources : application to child pedestrian accidents”, Geography and Medicine, Elsevier, pp. 54-64.
BANOS A., 1999 : “Quelle implication de l’utilisateur dans une stratégie de data mining spatial ? Illustration à partir de l’appréhension spatio-temporelle des accidents de la route en milieu urbain”, Revue Internationale de Géomatique, Vol. 9, n° 4, pp. 441-456.
BANOS A., 2001a : Le lieu, le moment, le mouvement : pour une exploration spatio-temporelle désagrégée de la demande de transport en commun en milieu urbain, Thèse de géographie, Besançon, 356 p.
BANOS A., 2001b : “A propos de l’analyse exploratoire des données”, Cybergéo http://www.cybergeo.presse.fr, n° 197, 15 p.
BANOS A., 2001c : “Enhancing mobility behaviour analysis using spatial interactive tools and computer intensive methods”, Geographic Information Sciences, Vol. 7, n° 1, pp. 35-41.
BANOS A., 2001d : “Localizing people during surveys : a versatile strategy”, Actes de Colloque, Geocomputation, 6th International Conference on Geocomputation, Brisbane, 8 p.
BRUNSDON C., 1991 : “Estimating probability surfaces in GIS : an adaptive technique”, EGIS’91 Proceedings, Second European Conference in GIS, Brussels, Vol. 1, pp. 155-164.
BRUNSDON C., 1998 : “Exploratory spatial data analysis and local indicators of spatial association with Xlisp-Stat”, The Statistician, Vol. 47, n° 3, pp. 471-484.
GAHEGAN M., 2000 : “Visualization as a tool for geocomputation”, Geocomputation, Taylor & Francis, London, pp. 253-274.
CLEVELAND W., 1993 : Visualizing Data, Murray Hill, AT&T Bell Laboratories/Hobart Press.
FOTHERINGHAM S. and ZHAN B., 1996 : “A comparison of three exploratory methods for cluster detection in spatial point patterns”, Geographical Analysis, Vol. 28, n° 3, pp. 200-218.
HASLETT J., BRADLEY R., CRAIG P., UNWIN A., WILLS G., 1991 : “Dynamic graphics for exploring spatial data with application to locating global and local anomalies”, The American Statistician, Vol. 45, pp. 234-242.
HUGUENIN-RICHARD F., 1999 : “Identifier les sites routiers dangereux : application de méthodes d’analyse utilisant la localisation géographique des accidents”, Revue Internationale de Géomatique, Vol. 9, n° 4, pp. 471-487.
HUGUENIN-RICHARD F., 2000 : Approche géographique des accidents de la circulation : proposition de modes opératoires de diagnostic. Application au territoire de la métropole lilloise, Thèse de géographie, Besançon, 322 p.
KRAAK M.J., 1998 : “The cartographic visualisation process : from presentation to exploration”, The Cartographic Journal, Vol. 35, n° 1, pp. 11-15.
KWAN M., 2000 : “Interactive geovisualization of activity-travel patterns using three-dimensional geographical information systems : a methodological exploration with a large data set”, Transportation Research C, Vol. 8, pp. 185-203.
MANTEL N., 1967 : “The detection of disease clustering and a generalized regression approach”, Cancer Research, Vol. 27, pp. 209-220.
MONMONIER M., 1989 : “Geographic brushing, enhancing exploratory analysis of the scatterplot matrix”, Geographical Analysis, Vol. 21, pp. 81-84.
OPENSHAW S., 1995 : “Developing automated and smart spatial pattern exploration tools for geographical information systems applications”, The Statistician, Vol. 44, n° 1, pp. 3-16.
OPENSHAW S. and ABRAHART R., 2000 : Geocomputation, Taylor & Francis, London, 413 p.
OPENSHAW S., CHARLTON M., WYMER C. and CRAFT A., 1987 : “A mark I geographical analysis machine for the automated analysis of point data sets”, International Journal of Geographical Information Systems, Vol. 1, n° 4, pp. 335-358.
OPENSHAW S. and OPENSHAW C., 1997 : Artificial intelligence in geography, John Wiley & Sons, Chichester, 329 p.
SILVERMAN B., 1986 : Density estimation for statistics and data analysis, Chapman and Hall, London.
TIERNEY L., 1990 : Lisp-Stat : an object-oriented environment for statistical computing and dynamic graphics, John Wiley & Sons, New York, http://www.stat.umn.edu/
TUKEY J.-W., 1977 : Exploratory data analysis, Addison-Wesley, Reading, Massachusetts, 688 p.
WAKEFIELD J., KELSALL J. and MORRIS S., 2000 : “Clustering, cluster detection and spatial variation in risk”, Spatial epidemiology, methods and applications, pp. 128-152.
ZEITOUNI K. and YEH L., 1999 : “Le data mining spatial et les bases de données spatiales”, Revue Internationale de Géomatique, Vol. 9, n° 4, pp. 389-423.