Activity-based Modelling: An overview (and some things we have been doing to advance the state of the art)
E. Zwerts (with the cooperation of E. Moons and D. Janssens)
Transportation Research Institute, Data Analysis and Modelling Group, Faculty of Applied Economic Sciences, Limburgs Universitair Centrum, Diepenbeek, Belgium
E-mail: enid.zwerts@luc.ac.be
Limburgs Universitair Centrum, Universitaire Campus, gebouw D, 3590 Diepenbeek, Belgium

Outline
• Why transportation modelling?
• Which kinds of transportation modelling?
• Why activity-based transportation modelling?
• Which activity-based transportation model?
• Model selection: Albatross
  – What is Albatross?
  – Things we have been doing and are still going to do with respect to Albatross
• Introduction of an alternative modelling approach based on sequential dependencies in data (short version)

Why transportation modelling?
• The transportation problem is multi-dimensional:
  – Traffic jams
  – CO2 emissions
  – Impact on the economy
  – Traffic accidents with a significant number of casualties in Belgium
• The need for transportation infrastructure is high, due to:
  – Globalization
  – Urbanization
  – Governments cannot afford transportation constraints to have a negative impact on future competitiveness, foreign investments, …
• However, changing the existing infrastructure is:
  – Expensive, with significant long-term effects
  – Not guaranteed to succeed
  – Not trivial (existing spatial zones, restrictions from local and federal regulations, legislation, etc.)

Why transportation modelling?
• Therefore transportation models are often used. They can:
  – Support management decision making
  – Make predictions in uncertain circumstances:
    • Changing infrastructure, environment
    • Changing behaviour of people
    • Changing socio-demographic circumstances
    • ...
• The aim of these models is to portray reality as accurately as possible
• They are frequently used in different countries

Transportation modelling: Trip-based approach
• Trips are modelled as independent and isolated, with no connections between the different trips
• No time component
• No direction
• No sequential information
Reality: At home → Work (7.30h, PT) → Play squash (12h, by foot) → Work (12.50h, by foot) → At home (16.40h, PT) → Family visit (19h, car) → At home (22h, car)
Trip-based model: At home – Work: PT, 2X; Work – Squash: by foot, 2X; At home – Family visit: car, 2X

Transportation modelling: Tour-based approach
• Trips that start and end at home or at the same work location are modelled independently
• Direction + (spatial) limitations
• No temporal dimension
• Independent tours; the model is not capable of integrating them
• Uses nested logit techniques
Reality: At home → Work (7.30h, PT) → Play squash (12h, by foot) → Work (12.50h, by foot) → At home (16.40h, PT) → Family visit (19h, car) → At home (22h, car)
Tour-based model: Work – Play squash – Work: by foot, by foot; At home – Work – At home: PT, PT; At home – Family visit – At home: car, car

Transport modelling: An activity-based approach
• Travel demand is derived from the activities that individuals need/wish to perform
• Sequences or patterns of behaviour, and not individual trips, are the unit of analysis
• Household and other social structures influence travel and activity behaviour
• Spatial, temporal, transportation and interpersonal interdependencies constrain activity/travel behaviour
• Activity-based approaches reflect the scheduling of activities in time and space.
Activity-based approaches aim at predicting which activities are conducted where, when, for how long, with whom, the transport mode involved and ideally also the implied route decisions.
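
To make the difference between the three views concrete, the sketch below encodes the example day from the "Reality" diagram and derives a trip-based and a tour-based view from it. It is only an illustration: the tuple layout, the function names `trip_based_view` and `tours`, and the "HH:MM" time strings are assumptions made here, not part of any of the models discussed.

```python
from collections import Counter

# One person's day, read off the "Reality" diagram:
# (origin, destination, departure time, mode)
day = [
    ("home", "work", "07:30", "PT"),
    ("work", "squash", "12:00", "foot"),
    ("squash", "work", "12:50", "foot"),
    ("work", "home", "16:40", "PT"),
    ("home", "family visit", "19:00", "car"),
    ("family visit", "home", "22:00", "car"),
]

def trip_based_view(trips):
    """Collapse trips into undirected location pairs counted per mode."""
    counts = Counter()
    for origin, dest, _time, mode in trips:
        pair = tuple(sorted((origin, dest)))  # direction is discarded
        counts[(pair, mode)] += 1             # departure time is discarded
    return counts

def tours(trips, base="home"):
    """Split the ordered trips into tours that start and end at `base`.

    Only home-based tours are split here; the work-based squash tour
    stays inside the first home-based tour.
    """
    result, current = [], []
    for trip in trips:
        current.append(trip)
        if trip[1] == base:
            result.append(current)
            current = []
    return result

trip_view = trip_based_view(day)   # three pairs, each used twice
home_tours = tours(day)            # two tours, but no link between them
```

The full `day` list, with its order and times intact, is the kind of object an activity-based model tries to predict; the two derived views show which information the simpler approaches throw away.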

Which activity-based transportation model?
• Utility maximizing models
• Sequential models (computational process models)

Model                      Activity classes  Destination  Timing    Duration  With whom  Mode
ALBATROSS                  p(9)              p(23)        minutes   p         p(3)       p(5)
AMOS                       g(4)              n            minutes   minutes   n          n
DAILY ACTIVITY SCHEDULE    p(4)              n            sequence  n         n          p(3)
GISICAS                    g                 p(4)         sequence  g         n          n
PETRA                      p(3)              p(9)         n         n         n          p(6)
SCHEDULER                  p                 p            minutes   g         n          n
STARCHILD                  g(6)              p(110)       sequence  n         n          p
NO NAME (Wen & Koppelman)  g(4)              n            sequence  n         n          p(1)

p = predicted by the model; n = not treated in model; g = assumed given in model

ALBATROSS
• Albatross: A Learning-Based Transportation Oriented Simulation System = activity-based model of activity-travel behavior, derived from theories of choice heuristics
• Developed in the Netherlands (Arentze and Timmermans, 2000)
• The model predicts which activities are conducted when, where, for how long, with whom, and also the transport mode
• A decision tree is proposed as a formalism to model the heuristic choice
Obviously, this is a crucial component of the model. The better the learning algorithm, the better the prediction…

Constraints that have been taken into account in Albatross
• Situational constraints: can’t be in two places at the same time
• Institutional constraints: such as opening hours
• Household constraints: such as bringing children to school
• Spatial constraints: e.g.
particular activities cannot be performed at particular locations
• Time constraints: activities require some minimum duration
• Spatial-temporal constraints: an individual cannot be at a particular location at the right time to conduct a particular activity

Modelling choice behavior
• Models used to rely on utility maximization
• Albatross assumes that choice behavior is based on rules that are formed and continuously adapted through learning while the individual is interacting with the environment (reinforcement learning) or communicating with others (social learning).
As said, rules are currently derived from decision trees. Other rule-based learning algorithms can also be used.

The scheduling model
Aim: Determine the schedule (= agenda) of activity-travel behaviour
Components:
1. a model of the sequential decision making process
2. models to compute dynamic constraints on choice options
3.
a set of decision trees representing choice behavior of individuals related to each step in the process model
Components 1 and 2 are a-priori defined; component 3 is derived from observed choice behavior.
Skeleton: the fixed and given part of the schedule
Flexible activities: optional activities added to the skeleton

The sequential decision process (process model)
Each oval represents a DT (decision tree)

Example

The inference system in Albatross
• For each decision, the model evaluates dynamic constraints
• The implementation of situational, household and temporal constraints is straightforward
• We will look at space–time constraints and choice heuristics determining location choices

Albatross derives DTs with the CHAID learning algorithm
Use a probabilistic assignment rule.
The probability of selecting the q-th response for each new case assigned to the k-th node is:
$$p_{kq} = \frac{f_{kq}}{N_k}$$
where $f_{kq}$ is the number of training cases of category q at leaf node k and $N_k$ the total number of training cases at that node

Testing the model

Results of inducing decision trees

Branch of time-of-day tree

Performance of Albatross
• The eventual goodness-of-fit of the model can be assessed only by a comparison at the level of complete activity patterns
• The eventual output of Albatross is OD trip matrices
• Conclusions so far:
  – The use of decision trees for choice heuristics results in a considerable, but varying, improvement over a null model
  – A sample size of 2000 household-days suffices to develop a stable model
  – Transferability of the model to a context other than the one in which it was developed remains to be studied

Advance the state of the art
Some things we have been doing in our research group with respect to Albatross:
• Two other rule-based techniques applied in the context of the Albatross model:
  – Integrate decision tree techniques and feature selection: identify irrelevant attributes and build simple models
  – Build advanced complex models by means of Bayesian networks and try to improve accuracy
• Use (and adapt) Albatross towards the application area of Flanders
• Evaluate the performance of activity-based models versus trip-based models

Application 1: Build simple models by means of DT and feature selection
• General idea: Occam’s razor:
“Entities are not to be multiplied beyond necessity”
A large set of attributes is likely to contain correlated attributes and leads to larger trees, but not necessarily better ones!
Use feature selection techniques to identify irrelevant attributes that do not significantly improve accuracy and can thus be omitted in the final model

Application 1: Empirical results
• Build a DT for every decision facet in the Albatross model
• Example: “location” facet
[Figure: accuracy (axis from 30 to 46) versus number of attributes (1 to 28)]

Application 1: Empirical results

                Full approach              FS approach
Decision tree   # attrs  # leaves  e       # attrs  # leaves  e
Mode for work   32       8         0.598   2        6         0.595
Selection       40       35        0.686   1        1         0.669
With-whom       39       72        0.499   4        51        0.467
Duration        41       148       0.431   4        38        0.368
Start time      63       121       0.408   8        110       0.382
Trip chain      53       8         0.802   10       13        0.811
Mode other      35       63        0.524   11       60        0.508
Location 1      28       30        0.540   6        15        0.513
Location 2      28       47        0.372   8        14        0.312

Application 1: Empirical results
Model performance at activity pattern level

Measure              Mean distance (Full approach)  Mean distance (FS approach)
SAM (activity type)  2.929                          2.862
SAM (with)           3.205                          3.112
SAM (location)       3.188                          3.034
SAM (mode)           4.706                          4.559
UDSAM                16.957                         16.43
MDSAM                8.558                          8.257

Conclusion: There is no evidence of substantial loss in predictive power when trimmed decision trees are used to predict activity-travel patterns.
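
Whether full or trimmed, the trees discussed above assign a response at a leaf node probabilistically rather than always picking the most frequent category. The sketch below illustrates that idea under the assumption that a response q at leaf k is drawn with probability f_kq / N_k, the share of training cases of category q at that leaf. The function names and the leaf counts are hypothetical and are not taken from Albatross.

```python
import random

def response_probabilities(leaf_counts):
    """Return f_kq / N_k for every response q observed at one leaf node k."""
    total = sum(leaf_counts.values())  # N_k: all training cases at the leaf
    return {q: f / total for q, f in leaf_counts.items()}

def assign_response(leaf_counts, rng):
    """Draw one response at random, weighted by the training counts f_kq."""
    responses = list(leaf_counts)
    weights = [leaf_counts[q] for q in responses]
    return rng.choices(responses, weights=weights, k=1)[0]

# Hypothetical training counts at one leaf of a mode-choice tree
leaf = {"car": 6, "PT": 3, "foot": 1}

probs = response_probabilities(leaf)
rng = random.Random(0)  # fixed seed so the simulation is repeatable
draws = [assign_response(leaf, rng) for _ in range(1000)]
```

Over many simulated cases the drawn responses reproduce the mix of categories seen in the training data at that leaf, which a deterministic majority rule would not.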

Application 2: Build complex models by means of BN and try to improve accuracy
• General idea: Modelling travel behaviour is non-trivial as it is multidimensional and complex in nature. Hidden, unknown relationships might have an impact on the final outcome
• Need for a technique that is able to deal with this: Bayesian networks
  – Able to capture (complex) relationships between variables
  – Can be learned from data
  – Visualize interdependencies between variables
  – Prior and posterior probability distributions per variable
  – Well suited to conduct what-if scenarios and sensitivity analysis
  – White box

• Case study on mode choice facet
Steps to follow:
(1) Build the network (structural learning)
(2) Choose a target variable and prune the network
(3) Calculate probability distributions (parameter learning)
(4) Perform what-if scenarios by entering evidence in the network

Example of pruning a network

Application 2: Empirical results

Application 2: Empirical results
• Conclusions:
  – Better predictions. Reason: unlike decision trees (CHAID), variables are selected simultaneously, with no hierarchy of importance among the selected variables
  – Selection of the variables is roughly the same in both approaches (the difference in performance is due more to the different nature of the techniques than to additional insights)
  – Much larger number of decision rules in Albatross compared with CHAID; however, performance is also OK on the test data (additional research on other datasets is warranted)
  – Interpretation is an issue: BN link several variables in sometimes complex direct and
indirect ways.

Application 3: Activity-based versus trip-based
• Use (and adapt) Albatross towards the application area of Flanders
• Evaluate the performance of activity-based models versus trip-based models
• Transportation models: trip-based
• Mobility Plan Flanders (2003)
  – Predict in a static way reliable results for distribution, substitution and route effects
  – They cannot manage generative and temporal shifts
• Need for a more dynamic and more complete model

Activity-based models
• Travel demand is derived from activities
• 24-hour schedule with activities
• Household interaction
• Time and space constraints
Trip-based models
• Only consider one-way trips
• Only during peak hour
• Individual trips
• Calibration is needed to fit the data to the real situation

But ...
• Trip-based models take the outcomes (traffic flows, passenger numbers, ...
) as input in the calibration
• As expected, the outcomes are robust and fit the actual situation perfectly
• The influence of the calibration is much stronger than the influence of the input data

• Aim: the application of an activity-based model in Flanders
• Albatross ► developed for the Dutch situation
• First stage: use of the Dutch decision tables
• Comparison of the results of the two model types and their performance on the same input data

Data
• Travel behaviour study: urban region of Leuven (2001) + trip-based model
  – Trip schedules (no information on in-home activities)
  – Locations: zip code ≠ statistical sector
  – Assumptions:
    – Overestimation of car and bike availability per household
    – Standard values for work time
    – Transport mode: longest distance in the trip
    – Facility data: not yet available

• Assumption: trip-based models predict the actual situation almost perfectly
• ALBATROSS:
• Mean length of the schedules is shorter than in the Dutch example (reason: conversion of a trip schedule to an activity schedule)
• SAM values (parameters for goodness-of-fit) are very high ► predictions are not good

• OD matrices: match reasonably well
• Activity type:
  – Good predictions for work and bring and get
  – Grocery and non-grocery is a problem
• Length of tours
  – Too many short tours (< 2 km) are predicted
• Transport mode
  – Too much public transport and too many car passengers
  – Too few car drivers

• Predictions are not good: fortunately!
  – Refinement of input data / facility data
  – Adaptation of the decision tables to the Flemish situation
  – Trip-based model runs without traffic flows and passenger numbers for a truly fair comparison
  – Run the model on other Flemish regions

Some words on what else we have been doing…
• An alternative approach to model activity-travel decisions is also under development at our research group
  – This model assumes that each diary consists of correlated successive activities.
    • For instance, during the morning: Sleep – Having breakfast – Transportation to work
• Markov chains are often used to model this type of dependence:
  – Transition matrix $P(X_t \mid X_{t-1})$: first-order Markov chain
  – Transition matrix $P(X_t \mid X_{t-1}, X_{t-2})$: second-order Markov chain
  – Etc.

Artificial example
Diary 1: TcFFFFFFFFFFFFFFFFE
Diary 2: TcEEFREREERFTcFTcFFTcFETcF
Diary 3: RREFEFEETcTcR
Diary 4: EEFFTcFTcFRRTcTcRTcRR
Diary 5: FFTcFFRE
Diary 6: EETcFRRE
With Tc = Transportation, with car as transport mode; F = visit Family; E = Eat; R = Read

      Tc    E     R     F
Tc    0.11  0.23  0.10  0.21
E     0.03  0.40  0.53  0.20
R     0.16  0.08  0.30  0.28
F     0.70  0.29  0.07  0.31

These probabilities can be computed by means of Markov chains

Example derived from data
• Simulation procedure: simulate $X_t$ as a function of the values taken by $X_{t-1}$ and $X_{t-2}$; repetitive procedure

• Some results…

Let’s recap…
• Why transportation modelling?
• Which kinds of transportation modelling?
• Why activity-based transportation modelling?
• Which activity-based transportation model?
• Model selection: Albatross
  – What is Albatross?
  – Things we have been doing and are still going to do with respect to Albatross
• Introduction of an alternative modelling approach based on sequential dependencies in data (short version)

Questions?
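
As a follow-up to the artificial diary example, the sketch below shows one simple way to estimate a first-order transition matrix from the six diaries: count how often each activity is followed by each other activity and normalise per preceding activity. This is a plain count-based estimate written for illustration; the helper names `tokenize` and `first_order_matrix` are assumptions, and the result is not claimed to reproduce the matrix shown on the slide, which may have been obtained with a different procedure.

```python
import re
from collections import Counter, defaultdict

STATES = ["Tc", "E", "R", "F"]

# The six artificial diaries, copied from the example
diaries = [
    "TcFFFFFFFFFFFFFFFFE",
    "TcEEFREREERFTcFTcFFTcFETcF",
    "RREFEFEETcTcR",
    "EEFFTcFTcFRRTcTcRTcRR",
    "FFTcFFRE",
    "EETcFRRE",
]

def tokenize(diary):
    """Split a diary string into activity codes ("Tc" is one code)."""
    return re.findall(r"Tc|[ERF]", diary)

def first_order_matrix(sequences):
    """Estimate P(next | previous) from transition counts."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    matrix = {}
    for prev in STATES:
        # "or 1" avoids dividing by zero for a state with no successors
        total = sum(counts[prev].values()) or 1
        matrix[prev] = {nxt: counts[prev][nxt] / total for nxt in STATES}
    return matrix

P = first_order_matrix([tokenize(d) for d in diaries])
# P["Tc"]["F"] is the estimated probability that a car trip
# is followed by a family visit.
```

A second-order version would key the counts on the pair of the two preceding activities instead of on a single one.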