Network Complexity and Spatio-Temporal Data Mining (STDM)

Dr Tao Cheng + STANDARD team
{tao.cheng@ucl.ac.uk}
Senior Lecturer in GeoInformatics
Department of Civil, Environmental and Geomatic Engineering (CEGE)
University College London
Outline
•  Nature of network complexity
•  Its challenges for STDM
•  Case studies from the STANDARD project
•  Future directions for NC and STDM
Challenges - Network Complexity
1) Heterogeneity (structure & performance)
- nonlinearity
- nonstationarity (MAUP problem in GIS)
Great progress in describing structure (e.g. power-laws) of
‘what is’, but
how to model and predict nonlinear and nonstationary
performance?
Challenges - Network Complexity
2) Dynamics
- changes in physical structure (nodes & links)
- implications for supply/capacity changes
- changes in movement patterns on the network (density/
flow/speed; behaviour)
- leads to changes in demand
Much progress in modelling supply - demand interactions at the
macroscopic level, but
- lack of clarity about implications for individual behaviours
and their collective effects;
- No readily available tools to demonstrate or capture the
transition from free flow to congestion
Challenges - Network Complexity
3) Interactions & Associations
- spatial (upstream/downstream)
- temporal (past/present/future)
- spatio-temporal
- multiple factors (incidents, weather, big events,..)
- multiple networks
We accommodate spatial or temporal associations
(autocorrelations), but
-  fail to treat spatial and temporal autocorrelation
simultaneously
-  fail to consider multiple networks
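A minimal sketch of what treating space and time together can mean in practice. It computes a space-time autocorrelation between neighbours' values at time t and a link's own value at a later time, in the style of the statistics used to identify STARIMA models. The function name, the upstream-only weight matrix W and the simulated data are illustrative assumptions, not taken from the STANDARD project.

```python
import numpy as np

def space_time_acf(z, W, time_lag):
    """Correlation between neighbours' values at t and own values at t + time_lag.

    z: array (n_steps, n_links); W: spatial weight matrix (n_links, n_links).
    """
    z = z - z.mean(axis=0)            # centre each link's series
    wz = z @ W.T                      # spatially lagged series (neighbour values)
    n_steps = z.shape[0]
    cross = (wz[:n_steps - time_lag] * z[time_lag:]).sum()
    scale = np.sqrt((wz ** 2).sum() * (z ** 2).sum())
    return float(cross / scale)

# Hypothetical corridor of 5 links: each link echoes its upstream
# neighbour one time step later, plus independent noise.
rng = np.random.default_rng(0)
n_steps, n_links = 2000, 5
W = np.eye(n_links, k=-1)             # W[i, i-1] = 1: the upstream neighbour
z = rng.normal(size=(n_steps, n_links))
for t in range(1, n_steps):
    z[t, 1:] += 0.8 * z[t - 1, :-1]

same_time = space_time_acf(z, W, time_lag=0)   # purely spatial view
one_step = space_time_acf(z, W, time_lag=1)    # joint space-time view
print(f"lag 0: {same_time:.2f}, lag 1: {one_step:.2f}")
```

In this toy corridor the purely spatial correlation is close to zero while the one-step space-time correlation is strong, which is the kind of dependence a space-only or time-only treatment would miss.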
Research Frontiers in Network Complexity
1) Forecasting and prediction
- nonlinearity & nonstationarity
2) Tools to capture/illustrate the processes
- Emergence and tipping points
- Simulating behaviour (macroscopic properties alter
because of accumulated microscopic changes)
3) Spatio-temporal dependence and interactions
- impact of activities on the network
- interactions between networks
BigData – empirical theory and testing
STANDARD – Spatio-Temporal Analysis of Network Data and Route Dynamics
Understanding traffic congestion in space-time
•  Short-term and long-term journey time prediction
–  STARIMA; ANN; Kernel-based approach
•  Early detection of traffic congestion
–  clustering: STC; STSS
•  Interactive visualization of journey time reliability and
traffic congestion
–  2D (hotspot); 3D(wall-map; isosurface)
•  Simulation of non-recurrent congestion
–  Agent-based simulation
•  Intervention Analysis (weather, tube strike, road works)
–  regression
Space-time prediction & forecasting
The challenge lies in the nonstationarity (heterogeneity) and nonlinearity of
space-time data.
Statistical Approaches
•  STARIMA models
•  space-time geostatistical
models
•  spatial panel data models
•  space-time GWR
How to calibrate the spatiotemporal autocorrelations is the
bottleneck.
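As an illustration of the simplest member of the STARIMA family, the sketch below fits a first-order space-time autoregression, z_t = phi_time * z_{t-1} + phi_space * W z_{t-1} + e_t, by ordinary least squares and scores it against a naive persistence forecast using root mean squared error. The network, the coefficients 0.5 and 0.3, and all names are assumptions made for the example; a full STARIMA model would add differencing, moving-average terms and higher spatial orders.

```python
import numpy as np

def fit_star11(z, W):
    """Least-squares estimates of (phi_time, phi_space) for a STAR(1,1) model."""
    own = z[:-1].ravel()                     # each link's previous value
    nbr = (z[:-1] @ W.T).ravel()             # neighbours' previous average
    X = np.column_stack([own, nbr])
    coef, *_ = np.linalg.lstsq(X, z[1:].ravel(), rcond=None)
    return coef

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

# Hypothetical chain of 6 links with row-normalised adjacency weights.
n_links, n_steps = 6, 3000
W = np.zeros((n_links, n_links))
for i in range(n_links):
    for j in (i - 1, i + 1):
        if 0 <= j < n_links:
            W[i, j] = 1.0
W /= W.sum(axis=1, keepdims=True)

# Simulate data from known coefficients so the fit can be checked.
rng = np.random.default_rng(1)
z = np.zeros((n_steps, n_links))
for t in range(1, n_steps):
    z[t] = 0.5 * z[t - 1] + 0.3 * (W @ z[t - 1]) + rng.normal(size=n_links)

train, test = z[:2000], z[2000:]
phi_time, phi_space = fit_star11(train, W)

# One-step-ahead forecasts on the held-out period.
star_pred = phi_time * test[:-1] + phi_space * (test[:-1] @ W.T)
naive_pred = test[:-1]                       # persistence: next value = current value
rmse_star = rmse(star_pred, test[1:])
rmse_naive = rmse(naive_pred, test[1:])
print(f"phi_time={phi_time:.2f}, phi_space={phi_space:.2f}")
print(f"RMSE STAR={rmse_star:.2f}, naive={rmse_naive:.2f}")
```

The weight matrix W is fixed in advance here; choosing it, and deciding how far in space and time the dependence reaches, is the calibration step the slide identifies as the bottleneck.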
Machine Learning Approaches
•  artificial neural networks
(ANNs)
•  self-organizing maps
•  Genetic algorithms
•  support vector machines
(SVMs)
•  Kernel-based approach
The interpretability of machine
learning is low
James Haworth - Real-time traffic forecasting
James Haworth & Jaiqiu Wang: Space-Time Modelling and Prediction
Results – Root mean squared error (seconds/kilometre)

Interval      Naïve   ARIMA   STARIMA   LSTARIMA
5 minutes     49.4    47.4    55.9      46
15 minutes    74.7    68.7    89.1      67.3
30 minutes    93.2    82.1    109       80

Space-time clustering
•  To extract meaningful patterns (clusters)
•  To detect outliers or emerging phenomena (epidemic
outbreaks or traffic congestion)
•  Considering the spatial, temporal and thematic attributes
seamlessly and simultaneously, and the dynamicity in the
data is the most difficult challenge in spatio-temporal
clustering
•  Spatio-temporal scan statistics (STSS) shed light on this
aspect
•  Efforts are needed to improve computation efficiency and
to reduce the false alarm rate of STSS
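To make the scan idea concrete, here is a heavily simplified sketch of a prospective space-time scan over an ordered corridor of links. It enumerates space-time cylinders that end at the latest time step and scores each with a Kulldorff-style Poisson log-likelihood ratio. It is not the STSS implementation used in STANDARD: the function names, the 1-D corridor layout and the toy counts are assumptions, and the Monte Carlo significance test that a real scan statistic needs (and that drives both its cost and its false alarm rate) is omitted.

```python
import numpy as np

def poisson_llr(c, e, total):
    """Log-likelihood ratio for observed count c vs expected e inside a cylinder."""
    if c <= e:
        return 0.0                    # only excesses count as candidate clusters
    inside = c * np.log(c / e)
    outside = 0.0 if c == total else (total - c) * np.log((total - c) / (total - e))
    return float(inside + outside)

def scan_latest_cylinders(counts, expected, max_links=3, max_steps=3):
    """Return (best LLR, (link_start, link_stop, t_start, t_stop)).

    counts, expected: arrays (n_steps, n_links); links are ordered along a corridor.
    Only cylinders ending at the most recent time step are scanned.
    """
    total = counts.sum()
    expected = expected * total / expected.sum()   # rescale to the observed total
    n_steps, n_links = counts.shape
    best_llr, best_window = 0.0, None
    for start in range(n_links):
        for width in range(1, max_links + 1):
            stop = start + width
            if stop > n_links:
                break
            for duration in range(1, min(max_steps, n_steps) + 1):
                t0 = n_steps - duration
                c = counts[t0:, start:stop].sum()
                e = expected[t0:, start:stop].sum()
                llr = poisson_llr(c, e, total)
                if llr > best_llr:
                    best_llr, best_window = llr, (start, stop, t0, n_steps)
    return best_llr, best_window

# Toy data: 6 time steps x 8 links, with an excess on links 3-4 in the last 2 steps.
counts = np.full((6, 8), 10.0)
counts[4:, 3:5] = 40.0
expected = np.ones_like(counts)       # uniform baseline, an assumption

best_llr, best_window = scan_latest_cylinders(counts, expected)
print(best_window)
```

Even this toy version loops over every combination of start link, width and duration, which hints at why computational efficiency becomes a concern on a large network.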
Berk Anbaroglu - STSS for early detection of non-recurrent traffic congestion
Clusters of Congestion
25 May 2010 – State Opening of Parliament
Space-time visualisation
Explores the patterns hidden in the large data sets
•  using advanced (analytical) visualization and animation
–  static 2D maps
–  3D wall maps and isosurface (hotspots in space-time)
•  Tools: “Visual Analytics” and “Geovisual Analytics”
•  Real-time visualization of dynamic processes is still very
challenging due to the large volume and high dimensionality of the data.
•  Methods are needed to show evolution and dissipation in space
and time simultaneously (e.g. crime or traffic congestion)
Space-Time Visualisation: data -> process, story
Visualization of traffic congestion in space-time (1)
Cheng, Emmonds, Tanaksaranond, Sonoiki (2010), Multi-Scale Visualisation of Inbound and Outbound
Traffic Delays in London, The Cartographic Journal, 47: 323–329.
Visualization of traffic congestion in space-time (2)
3D Wall maps of inbound roads on 6th – 7th September 2010
Isosurface
Top view
Side view
Garavig Tanaksaranond – Space-Time Visualisation of Traffic Congestion
Visualising Congestion Build-up in London
3D Wall Map Travel Time
Interactive Visualization Tool
Space-Time Multi-Agent Simulation
•  How do drivers react when faced with road
closure?
•  Depends on the urban environment,
individual knowledge of the network
and conditions, and behaviour of
others
•  Behaviour of individuals (microscopic
behaviour) influences the formation and
movement of congestion (macroscopic
phenomena)
(Manley & Cheng, 2010)
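A toy illustration of microscopic rules producing a macroscopic congestion pattern. The sketch uses the Nagel-Schreckenberg cellular automaton on a single-lane ring road, which is a standard textbook model and not the agent-based model of Manley & Cheng; route choice and road closures are not represented. All parameter values are assumptions.

```python
import numpy as np

def nasch_mean_speed(density, n_cells=200, v_max=5, p_slow=0.3, steps=300, seed=0):
    """Mean speed (cells per step) on a ring road after a warm-up period."""
    rng = np.random.default_rng(seed)
    n_cars = max(1, int(density * n_cells))
    # Cars are kept in cyclic order; there is no overtaking on a single lane.
    pos = np.sort(rng.choice(n_cells, size=n_cars, replace=False))
    vel = np.zeros(n_cars, dtype=int)
    speeds = []
    for step in range(steps):
        gap = (np.roll(pos, -1) - pos - 1) % n_cells   # empty cells to the car ahead
        vel = np.minimum(vel + 1, v_max)               # 1. accelerate
        vel = np.minimum(vel, gap)                     # 2. brake to avoid collision
        slow = rng.random(n_cars) < p_slow             # 3. random hesitation
        vel = np.where(slow, np.maximum(vel - 1, 0), vel)
        pos = (pos + vel) % n_cells                    # 4. move
        if step >= steps // 2:                         # discard the warm-up half
            speeds.append(vel.mean())
    return float(np.mean(speeds))

free_flow = nasch_mean_speed(0.05)    # sparse traffic
congested = nasch_mean_speed(0.50)    # half the cells occupied
print(f"mean speed at 5% density: {free_flow:.2f}, at 50% density: {congested:.2f}")
```

Each driver follows the same four local rules, yet the mean speed collapses once density is high enough for individual braking to propagate backwards as a jam.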
SPREAD OF CONGESTION
•  Understanding formation of congestion
through the behaviour of individual drivers
Ed Manley – Agent-based Simulation
[Figure: map of link saturation around Regent’s Park and Hyde Park; legend classes from 0 – 0.2 up to > 1.5]
Adel Bolbol Fernandez - Understanding Travel Behaviours from GPS Data Logs
Location Information (GPS) -> Machine Learning -> Mode of Transport & Stops
GPS testing data: 110 participants, 2 months per participant, 20-second collection rate
All participants based in Greater London
http://www.homepages.ucl.ac.uk/~ucesadb/video.html
Future Directions of STDM/NC (1)
•  New methods and theory are needed for mining crowd-sourced data
contributed by citizens and volunteers, including social media data
–  often extremely noisy, biased, and nonstationary, e.g. trajectory data
–  methods are needed to combine text mining with STDM
–  this area is relevant to recent developments in citizen science, and
VGI in particular.
•  Theory and methods need to be developed to extract meaningful
patterns from those individual sensors and place them within the
framework of networks and network complexity, such as transport and
social networks made up of those individuals.
•  Within a network, interactions and dynamic flows should be
considered when mining spatio-temporal patterns.
•  This aspect is relevant to complexity theory, and network dynamics
in particular.
Future Directions (cont.)
•  STDM for emergence and tipping points: how to generate actionable
knowledge, e.g. by finding the emergent patterns and tipping points of
economics and epidemics?
•  It is important to find outliers, but more important is finding the critical
points before the system breaks down so that mitigating action can be
taken to avoid the worst scenarios such as traffic congestion and
epidemic transmission.
•  Another challenge of STDM is how to calibrate, explain and validate
the knowledge extracted.
•  A good example of this is the calibration of spatial (or spatio-temporal)
autocorrelation. Higher order spatial autocorrelation models have been
developed, but the pitfalls have also been found (LeSage and Pace
2011).
•  This makes machine learning more promising in future STDM.
Future Directions (cont.)
•  Grid computing and cloud computing
–  Key for scaling the algorithms to large networks
•  Open source (data + software + algorithms)
•  Online computation
•  Real-time computation
•  More systematic applications
–  CPC
•  …
Acknowledgements
http://standard.cege.ucl.ac.uk
+ Dr Andy Chow
+ Colleagues in TfL