Network Complexity and Spatio-Temporal Data Mining (STDM)
Dr Tao Cheng + STANDARD team {tao.cheng@ucl.ac.uk}
Senior Lecturer in GeoInformatics
Department of Civil, Environmental and Geomatic Engineering (CEGE)
University College London

Outline
• Nature of network complexity
• Its challenges for STDM
• Case studies from the STANDARD project
• Future directions for NC and STDM

Challenges - Network Complexity
1) Heterogeneity (structure & performance)
- nonlinearity
- nonstationarity (MAUP problem in GIS)
Great progress has been made in describing the structure of 'what is' (e.g. power laws), but how can nonlinear and nonstationary performance be modelled and predicted?

Challenges - Network Complexity
2) Dynamics
- changes in physical structure (nodes & links), with implications for supply/capacity
- changes in movement patterns on the network (density/flow/speed; behaviour), which lead to changes in demand
Much progress has been made in modelling supply-demand interactions at the macroscopic level, but there is
- a lack of clarity about the implications for individual behaviours and their collective effects;
- no readily available tool to demonstrate or capture the transition from free flow to congestion.

Challenges - Network Complexity
3) Interactions & Associations
- spatial (upstream/downstream)
- temporal (past/present/future)
- spatio-temporal
- multiple factors (incidents, weather, big events, ...)
- multiple networks
We accommodate spatial or temporal associations (autocorrelations), but we
- fail to treat spatio-temporal autocorrelation in an integrated, simultaneous way
- fail to consider multiple networks

Research Frontiers in Network Complexity
1) Forecasting and prediction
- nonlinearity & nonstationarity
2) Tools to capture/illustrate the processes
- emergence and tipping points
- simulating behaviour (macroscopic properties change because of accumulated microscopic changes)
3) Spatio-temporal dependence and interactions
- impact of activities on the network
- interactions between networks
Big Data – empirical theory and testing

STANDARD – Spatio-Temporal Analysis of Network Data and Route Dynamics
Understanding traffic congestion in space-time
• Short-term and long-term journey time prediction – STARIMA; ANN; kernel-based approach
• Early detection of traffic congestion – clustering: STC; STSS
• Interactive visualization of journey time reliability and traffic congestion – 2D (hotspots); 3D (wall maps; isosurfaces)
• Simulation of non-recurrent congestion – agent-based simulation
• Intervention analysis (weather, tube strike, road works) – regression

Space-time prediction & forecasting
The challenge lies in the nonstationarity (heterogeneity) and nonlinearity of space-time data.
Statistical Approaches
• STARIMA models
• space-time geostatistical models
• spatial panel data models
• space-time GWR
Calibrating the spatio-temporal autocorrelations is the bottleneck.
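To make the calibration problem concrete, here is a minimal sketch (NumPy assumed) of the simplest member of the STARIMA family: a STAR(1_1) model with one temporal lag and first-order spatial neighbours, z_t = φ10 z_{t-1} + φ11 W z_{t-1} + ε_t, fitted by ordinary least squares. The four-link corridor, its row-standardised weight matrix W, the simulated data and all function names are illustrative assumptions, not the STANDARD project's models or London data.

```python
import numpy as np


def row_standardise(adj):
    """Scale each row of a 0/1 adjacency matrix to sum to 1 (rows without neighbours stay zero)."""
    adj = np.asarray(adj, dtype=float)
    sums = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, sums, out=np.zeros_like(adj), where=sums > 0)


def simulate_star11(phi10, phi11, w, steps, rng):
    """Generate synthetic data from z_t = phi10*z_{t-1} + phi11*W z_{t-1} + e_t, e_t ~ N(0, I)."""
    n = w.shape[0]
    z = np.zeros((steps, n))
    for t in range(1, steps):
        z[t] = phi10 * z[t - 1] + phi11 * (w @ z[t - 1]) + rng.standard_normal(n)
    return z


def fit_star11(z, w):
    """Ordinary least-squares estimates of (phi10, phi11), pooled over all links and times.

    z has shape (T, N): one row per time interval, one column per network link.
    """
    z = np.asarray(z, dtype=float)
    lag_t = z[:-1]          # temporal lag: row k holds z_k
    lag_st = lag_t @ w.T    # space-time lag: row k holds (W z_k)^T
    y = z[1:].ravel()       # responses z_{k+1}, flattened in the same order as the regressors
    x = np.column_stack([lag_t.ravel(), lag_st.ravel()])
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    return coef


if __name__ == "__main__":
    # Hypothetical four-link corridor: each link's neighbours are its upstream/downstream links.
    adj = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
    w = row_standardise(adj)
    rng = np.random.default_rng(0)
    z = simulate_star11(0.5, 0.3, w, steps=2000, rng=rng)
    print(fit_star11(z, w))  # should be close to the true values [0.5, 0.3]
```

A full STARIMA calibration adds differencing, moving-average terms and higher spatial orders, and W itself has to be chosen; a single fixed W applied to a heterogeneous, nonstationary network is where the bottleneck named above shows up.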
Machine Learning Approaches
• artificial neural networks (ANNs)
• self-organizing maps
• genetic algorithms
• support vector machines (SVMs)
• kernel-based approach
The interpretability of machine learning models is low.

James Haworth – Real-time traffic forecasting

James Haworth & Jiaqiu Wang: Space-Time Modelling and Prediction
Results – root mean squared error (seconds/kilometre)

Interval      Naïve   ARIMA   STARIMA   LSTARIMA
5 minutes     49.4    47.4    55.9      46
15 minutes    74.7    68.7    89.1      67.3
30 minutes    93.2    82.1    109       80

Space-time clustering
• To extract meaningful patterns (clusters)
• To detect outliers or emerging phenomena (epidemic outbreaks or traffic congestion)
• Considering the spatial, temporal and thematic attributes, and the dynamics of the data, seamlessly and simultaneously is the most difficult challenge in spatio-temporal clustering
• Spatio-temporal scan statistics (STSS) shed light on this aspect
• Efforts are needed to improve the computational efficiency of STSS and to reduce its false alarm rate

Berk Anbaroglu – STSS for early detection of non-recurrent traffic congestion
Clusters of congestion, 25 May 2010 – State Opening of Parliament

Space-time visualisation
Explores the patterns hidden in large data sets
• using advanced (analytical) visualization and animation
– static 2D maps
– 3D wall maps and isosurfaces (hotspots in space-time)
• Tools: "Visual Analytics" and "Geovisual Analytics"
• Real-time visualization of dynamic processes is still very challenging because of the large volume and high dimensionality of the data.
• Methods are needed to show evolution and dissipation in space and time simultaneously (e.g. crime or traffic congestion)

Space-Time Visualisation: data -> process, story
Visualization of traffic congestion in space-time (1)
Cheng, Emmonds, Tanaksaranond, Sonoiki (2010), Multi-Scale Visualisation of Inbound and Outbound Traffic Delays in London, The Cartographic Journal, 47: 323–329.
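The space-time scan statistic (STSS) listed under space-time clustering can be sketched as follows, assuming the expectation-based Poisson form of the statistic (one common STSS variant, not necessarily the one used in the STANDARD work). Candidate clusters are space-time cylinders: a run of adjacent links along a single road (a simplifying one-dimensional assumption) over the most recent time intervals. Each cylinder's observed count is compared with its expected baseline, and the cylinder with the highest log-likelihood ratio is reported. The counts, baselines and function names are hypothetical; Monte Carlo significance testing, which is what controls the false alarm rate, is omitted.

```python
import math
from itertools import product


def poisson_llr(count, baseline):
    """Expectation-based Poisson log-likelihood ratio; zero unless the count exceeds its baseline.

    Assumes baseline > 0.
    """
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def scan_space_time(counts, baselines, max_links, max_window):
    """Return (score, first_link, last_link, window) for the highest-scoring cylinder.

    counts, baselines: lists indexed [time][link], links ordered along one road.
    A cylinder is a contiguous run of at most max_links links over the most recent
    `window` intervals (window <= max_window), as in prospective surveillance.
    """
    n_times, n_links = len(counts), len(counts[0])
    best = (0.0, None, None, None)
    for start, length in product(range(n_links), range(1, max_links + 1)):
        end = start + length
        if end > n_links:
            continue  # run of links would fall off the end of the road
        for window in range(1, min(max_window, n_times) + 1):
            rows = range(n_times - window, n_times)
            c = sum(counts[t][i] for t in rows for i in range(start, end))
            b = sum(baselines[t][i] for t in rows for i in range(start, end))
            score = poisson_llr(c, b)
            if score > best[0]:
                best = (score, start, end - 1, window)
    return best


if __name__ == "__main__":
    # Hypothetical road of 6 links observed over 8 intervals; baseline of 2 events everywhere.
    counts = [[2] * 6 for _ in range(8)]
    for t in range(5, 8):          # injected congestion: links 2-3 in the last 3 intervals
        for i in (2, 3):
            counts[t][i] = 6
    baselines = [[2.0] * 6 for _ in range(8)]
    print(scan_space_time(counts, baselines, max_links=3, max_window=4))
```

The nested loops show why computational efficiency is a concern: the number of candidate cylinders grows with both the spatial and the temporal extent searched.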
Visualization of traffic congestion in space-time (2)
3D wall maps of inbound roads on 6th–7th September 2010
[Figure: isosurface, top view and side view]

Garavig Tanaksaranond – Space-Time Visualisation of Traffic Congestion
Visualising Congestion Build-up in London
3D Wall Map
Travel Time Interactive Visualization Tool

Space-Time Multi-Agent Simulation
• How do drivers react when faced with a road closure?
• It depends on the urban environment, individual knowledge of the network and conditions, and the behaviour of others
• The behaviour of individuals (microscopic behaviour) influences the formation and movement of congestion (macroscopic phenomena) (Manley & Cheng, 2010)

SPREAD OF CONGESTION
• Understanding the formation of congestion through the behaviour of individual drivers
Ed Manley – Agent-based Simulation
[Figure: map of road saturation around Regent's Park and Hyde Park; saturation classes 0–0.2, 0.2–0.4, 0.4–0.5, 0.5–0.6, 0.6–0.7, 0.7–0.8, 0.8–0.9, 0.9–1.0, 1.0–1.2, 1.2–1.5, > 1.5]

Adel Bolbol Fernandez – Understanding Travel Behaviours from GPS Data Logs
[Diagram: GPS location information → machine learning → mode of transport & stops]
GPS testing data: 110 participants, 2 months per participant, 20-second collection rate
All participants based in Greater London
http://www.homepages.ucl.ac.uk/~ucesadb/video.html

Future Directions of STDM/NC (1)
• New methods and theory are needed for mining crowd-sourced data contributed by citizens and volunteers, including social media data
– often extremely noisy, biased and nonstationary, e.g. trajectory data
– methods are needed to combine text mining with STDM
– this area is particularly relevant to recent developments in citizen science and VGI.
• Theory and methods need to be developed to extract meaningful patterns from these individual sensors and place them within the framework of networks and network complexity, such as the transport and social networks made up of those individuals.
• Within a network, interactions and dynamic flows should be considered when mining spatio-temporal patterns.
• This aspect is particularly relevant to complexity theory and network dynamics.

Future Directions (cont.)
• STDM for emergence and tipping points, i.e. how to generate actionable knowledge by finding the emergent patterns and tipping points in economics and epidemics.
• It is important to find outliers, but it is more important to find the critical points before the system breaks down, so that mitigating action can be taken to avoid the worst scenarios, such as traffic congestion and epidemic transmission.
• Another challenge for STDM is how to calibrate, explain and validate the knowledge extracted.
• A good example is the calibration of spatial (or spatio-temporal) autocorrelation. Higher-order spatial autocorrelation models have been developed, but their pitfalls have also been identified (LeSage and Pace 2011).
• This makes machine learning more promising for future STDM.

Future Directions (cont.)
• Grid computing and cloud computing
– key for scaling algorithms to large networks
• Open source (data + software + algorithms)
• Online computation
• Real-time computation
• More systematic applications – CPC
• …

Acknowledgements
http://standard.cege.ucl.ac.uk
+ Dr Andy Chow
+ Colleagues in TfL