Introduction to Large Scale Modeling Systems
Cinzia Cirillo, Ph.D., Associate Professor
Department of Civil and Environmental Engineering, University of Maryland
Sept 3rd, 2013
Heart 2013 – Summer School, Stockholm

About myself
• MS Civil Engineering, Università “Federico II”, Naples (ITALY)
• PhD Transportation Engineering, Politecnico di Torino, Torino (ITALY)
• Stagiaire and Consultant, Hague Consulting Group, The Hague (NL) and Cambridge (UK)
• Post Doc, Marie Curie fellowship (EU), Applied Mathematics, University of Namur (BELGIUM)
• Assistant and now Associate Professor (with tenure), Department of Civil and Environmental Engineering, University of Maryland (USA)
• This year on sabbatical leave at TU Delft (NL)

My students at UMD… (Pratt, Michael, JM, Nayel, Renting, Me, Yangwen)

Table of Contents
• Terminology
• Four-step models
• Tour-based models
• Activity-based models
• Integrated land use and transportation models

Terminology of Network Representation
• Zones (Centroid, Centroid Connectors)
• Nodes
• Links

Four-Step Trip-Based Travel Demand Model

What are the Four Steps?
• Trip Generation ($T_i$) – Number of trips produced in and attracted to zone “i” [How many trips will be generated]
• Trip Distribution ($T_{ij}$) – Number of trips produced in zone “i” and attracted to zone “j” [Where the trips will go]
• Mode Split ($T_{ijm}$) – Number of trips produced in zone “i” and attracted to zone “j” traveling by mode “m” [Which mode travelers choose: automobile, rail, bus, bicycle, etc.]
• Traffic Assignment ($T_{ijmr}$) – Number of trips produced in zone “i” and attracted to zone “j” traveling by mode “m” over route “r” [Predicts the path the trips will take]

Trip Generation Model: Terminology
• Trip
• Trip Ends
• Tours
• Home-Based Trip
• Non-Home-Based Trip
• Trip Production
• Trip Attraction
• Trip Generation
• Trip Purpose

Methods for Trip Generation Modeling
• Growth Factor Analysis – Often used for external trip generation modeling
• Cross Classification Methods – Most widely used in current practice
• Regression Models
  – Zone-level regression
  – Household-level regression
• Combining Cross-Classification and Regression Methods
• Trip Rate Analysis
  – Often used for special trip generators
  – ITE Trip Generation Manual
• Matching Trip Productions and Attractions

Cross Classification Models

Zone-Level Regression Analysis: Example
• Production: $P_i = 22.4 + 1.87\,HH_i + 0.22\,A_i$
• Attraction: $A_j = 57.2 + 0.87\,E_j + 0.15\,R_j$
where
• $P_i$: Total number of HBW trips produced from zone i
• $A_j$: Total number of HBS trips attracted to zone j
• $HH_i$: Total number of households in zone i
• $A_i$: Total number of automobiles in zone i
• $E_j$: Total employment in zone j
• $R_j$: Total retail space in zone j

Matching Productions and Attractions
• $P_i$: number of trips produced from zone i
• $A_j$: number of trips attracted to zone j
• $P'_i$: adjusted number of trips produced from zone i
• $A'_j$: adjusted number of trips attracted to zone j
• i, j: index of zones

Methods for Trip Distribution Modeling
• Growth Factor Analysis – Use only if no better feasible method is available
• Synthetic Model (e.g. Gravity Model) – Most widely used in practice
• Discrete Choice Model
  – More flexible model structure and behaviorally richer
  – The gravity model can be shown to be a special case of the discrete choice model
• Statistical/Optimization Methods for Estimating OD Trip Tables from Traffic Counts
• Intervening Opportunity Model
• Etc.
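The zone-level regression, the matching of productions and attractions, and a gravity-based distribution step can be chained as in this minimal Python sketch. It is an illustration under stated assumptions, not the method used in the slides: the zone data and travel costs are hypothetical, the two example regression equations are treated as one trip purpose purely for illustration (the slide labels them HBW and HBS), balancing scales attractions to total productions (one common convention; the slides do not say which adjustment they use), and distribution uses a production-constrained gravity model $T_{ij} = P_i A_j F_{ij} / \sum_{j'} A_{j'} F_{ij'}$ with $F_{ij} = \exp(b\,C_{ij})$, $K_{ij} = 1$ and an assumed $b = -0.1$.

```python
from __future__ import annotations

import math


# Zone-level regression example from the slides (coefficients as given).
def productions(households: float, autos: float) -> float:
    """P_i = 22.4 + 1.87*HH_i + 0.22*A_i"""
    return 22.4 + 1.87 * households + 0.22 * autos


def attractions(employment: float, retail: float) -> float:
    """A_j = 57.2 + 0.87*E_j + 0.15*R_j"""
    return 57.2 + 0.87 * employment + 0.15 * retail


def balance_attractions(p: list[float], a: list[float]) -> list[float]:
    """Scale attractions so that sum(A'_j) == sum(P_i), keeping productions fixed.
    (Assumed convention; productions could be scaled instead.)"""
    factor = sum(p) / sum(a)
    return [a_j * factor for a_j in a]


def gravity(p: list[float], a: list[float], cost: list[list[float]],
            b: float = -0.1) -> list[list[float]]:
    """Production-constrained gravity model with F_ij = exp(b*C_ij), K_ij = 1.
    Each row of the returned trip table sums to P_i."""
    trips = []
    for i, p_i in enumerate(p):
        weights = [a_j * math.exp(b * cost[i][j]) for j, a_j in enumerate(a)]
        total = sum(weights)
        trips.append([p_i * w / total for w in weights])
    return trips


if __name__ == "__main__":
    # Hypothetical three-zone data: households, autos, employment, retail space.
    hh, autos = [500, 300, 200], [800, 400, 250]
    emp, retail = [100, 600, 300], [50, 200, 400]
    cost = [[5, 15, 25], [15, 5, 10], [25, 10, 5]]  # hypothetical travel times (min)

    p = [productions(h, c) for h, c in zip(hh, autos)]
    a = balance_attractions(p, [attractions(e, r) for e, r in zip(emp, retail)])
    for row in gravity(p, a, cost):
        print([round(t, 1) for t in row])
```

Holding productions fixed reflects the usual view that household-based production estimates are more reliable than attraction estimates; the production constraint in the gravity step then guarantees that each zone's trips are fully distributed.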
Basic Gravity Model
• The idea comes from Newton’s Law of Gravitation:
$$T_{ij} = P_i \, \frac{A_j F_{ij} K_{ij}}{\sum_{j'} A_{j'} F_{ij'} K_{ij'}}$$
where:
• $T_{ij}$: Number of trips from i to j
• $P_i$: Productions at i
• $A_j$: Attractions at j
• $F_{ij}$: A function of travel time, distance, and/or cost, e.g. $F_{ij} = 1/C_{ij}^2$ or $F_{ij} = \exp(b\,C_{ij})$ with $b < 0$
• $K_{ij}$: Socioeconomic adjustment factor (specified by the modeler)

Model Choice
• Decision maker
• Alternatives
• Attributes of alternatives
• Decision rule
  – Conjunctive rules, e.g. satisfaction
  – Disjunctive rules, e.g. a set of if-then rules
  – Lexicographical rules, e.g. dominance
  – Compensatory rules, e.g. utility maximization
  – Combination of rules, e.g. elimination by aspects
  – Other heuristic decision rules
  – Etc.

Utility Maximization Theory
$$U_{1n} = U(t_{1n}, c_{1n}) = b_1 t_{1n} + b_2 c_{1n}$$
$$U_{2n} = U(t_{2n}, c_{2n}) = b_1 t_{2n} + b_2 c_{2n}$$
• Individual n chooses alternative 1 if $U_{1n} > U_{2n}$.
• When there are multiple alternatives, individual n chooses alternative 1 if $U_{1n} > U_{in}$ for all other alternatives i.

Random Utility Maximization Theory
$$U_{1n} = U(t_{1n}, c_{1n}) = b_1 t_{1n} + b_2 c_{1n} + e_{1n}$$
$$U_{2n} = U(t_{2n}, c_{2n}) = b_1 t_{2n} + b_2 c_{2n} + e_{2n}$$
Let
$$V_{1n} = b_1 t_{1n} + b_2 c_{1n}, \qquad V_{2n} = b_1 t_{2n} + b_2 c_{2n}$$
so that
$$U_{1n} = V_{1n} + e_{1n}, \qquad U_{2n} = V_{2n} + e_{2n}$$
• V: systematic utility
• e: random utility

Discrete Choice Models
• Binary: $\Pr(1) = P(U_{1n} > U_{2n}) = P(V_{1n} + e_{1n} > V_{2n} + e_{2n}) = P(e_{2n} - e_{1n} < V_{1n} - V_{2n})$
• Multinomial: $\Pr(i) = P(U_{in} > \max_{j \neq i} U_{jn})$
• If the e are assumed to be normally distributed, a Probit choice model is obtained.
• If the e are assumed to be i.i.d. Gumbel (extreme value type I), so that their differences are logistically distributed, a Logit choice model is obtained.

Traffic Assignment / Equilibrium
• Supply: Travel cost = f(Travel demand)
• Demand: Travel demand = f(Travel cost)
• An equilibrium is achieved when the supply and demand equations are satisfied simultaneously.
• Wardrop’s Two Traffic Equilibrium Principles
  – First Principle: User Equilibrium (UE). Each user acts to minimize his/her own travel cost. At UE, all used routes between each OD pair have equal travel costs, while all unused routes have higher travel costs.
  – Second Principle: System Optimal (SO). Users are routed so as to minimize the total travel cost in the system. At SO, the lowest total system travel cost is achieved.

Classification of Traffic Assignment

Tour-Based and Activity-Based Models

Activity-Based Models Recognize…
• Travel is a derived demand
• Spatial, temporal, transportation, and interpersonal interdependencies constrain activity/travel behavior
• Household and other social factors/structures influence travel and activity behavior
Activity-based approaches aim to predict which activities are conducted where, when, for how long, and with whom, the transport mode involved, and ideally also the implied route decisions.

Typical Specification of ABM: Type 1
• Population synthesis and updating
• Mobility-lifestyle choices (auto ownership, home location, etc.)
• Day-level activity pattern generation (list of activities, with or without sequencing)
• Scheduling of activities
• Activity- or tour-based mode and destination choices

Typical Specification of ABM: Type 2
• Population synthesis and updating
• Mobility-lifestyle choices (auto ownership, home location, etc.)
• Day-level activity pattern generation (primary and secondary tours and their sequencing)
• Tour-level primary activity destination, mode, and scheduling choices
• Stop-level secondary activity destination, mode, and scheduling choices

ABM History
• HATS (Jones 1979)
• CARLA (Jones et al. 1983)
• STARCHILD (Recker et al. 1986a, 1986b)
• SCHEDULER (Garling et al. 1989)
• SMASH (Ettema et al. 1993)
• SAMS and AMOS (Kitamura et al. 1993; RDC Inc. 1995; Kitamura et al. 1996)
• MIDAS (Kitamura and Goulias 1989; Goulias and Kitamura 1996)
• SMART (Stopher et al. 1996)
• GISICAS (Kwan 1997)
• PCATS (Kitamura and Fujii 1998)
• ALBATROSS (Arentze and Timmermans 2000)
• PETRA (Fosgerau 2001)
• SIMAP (Kulkarni and McNally 2001)
• TASHA (Miller and Roorda 2003)
• CEMDAP (Bhat et al. 2004)
• FAMOS (Pendyala et al. 2004)
• TRANSIMS (Los Alamos National Laboratory 2005)

ABM in Practice
• U.S.
  – Atlanta, GA
  – Boston, MA
  – Columbus, OH
  – Dallas, TX
  – Denver, CO
  – New York, NY
  – Portland, OR
  – Sacramento, CA
  – San Francisco, CA
  – Southeast Florida
  – Statewide in Oregon
• International
  – Netherlands
  – Switzerland
  – Germany
  – Chile
  – Etc.

ABM Benefits
• Predicts travel behavior along a continuous time axis, including scheduling adjustments;
• Assesses the impact of sophisticated travel demand management measures;
• Can be readily modified to evaluate policy scenarios, with or without new SP surveys (e.g. extended transit service, dynamic pricing, daycare facilities at work, flexible work hours);
• Produces results at the desired level of spatial and temporal accuracy using a synthetic population sample;
• Evaluates the impact of transportation projects and policies more comprehensively, on the entire activity-travel pattern rather than just on individual trips.

ABM Data Needs
• Demand Side
  – Longitudinal and geographic information on household or individual time use (e.g. type of activities, travel, activity locations, activity duration, scheduling);
  – Socio-demographic information (e.g. household composition, age, gender, job, income, housing);
  – Auto ownership and other household mobility and lifestyle choices;
  – Activity-travel pattern changes/shifts over time and in response to transportation system changes;
  – Household characteristics with regard to telecommunication.

ABM Data Needs
• Supply Side
  – Transportation networks coded to the activity-stop level;
  – Level of service of the transportation network by time of day (this could be endogenous with DTA);
  – Daily, day-of-the-week, and seasonal activity time windows (e.g. store opening hours, periods during which specific activities can be pursued);
  – Spatial and non-spatial inventory of activity locations, land use, and economic data.
ALBATROSS (Arentze and Timmermans 2000, 2004)
• Albatross: A Learning-Based Transportation Oriented Simulation System
• The model predicts which activities are conducted when, where, for how long, and with whom, as well as the transport mode
• Decision trees are proposed as a formalism to model heuristic choice
• Considers various constraints on behavior:
  – Situational constraints: a person cannot be in two places at the same time
  – Institutional constraints: such as opening hours
  – Household constraints: such as bringing children to school
  – Spatial constraints: e.g. particular activities cannot be performed at particular locations
  – Time constraints: activities require some minimum duration
  – Spatial-temporal constraints: an individual may be unable to reach a particular location in time to conduct a particular activity

ALBATROSS (Arentze and Timmermans 2000, 2004)
• Albatross assumes that choice behavior is based on rules that are formed and continuously adapted through learning, while the individual interacts with the environment (reinforcement learning) or communicates with others (social learning).
• Options for rule-based behavior representation:
  – Decision trees (used in Albatross)
  – Classification rules
  – Bayesian networks
  – Etc.

Albatross Model Flowchart
Each oval represents a decision tree.

CEMDAP (Bhat et al. 2003)
• CEMDAP: Comprehensive Econometric Micro-simulator for Daily Activity-travel Patterns
• A system of econometric models that represent the activity-travel decision-making behavior of individuals
• Input: various land-use, socio-demographic, activity system, and transportation level-of-service attributes
• Output: complete daily activity-travel patterns for each individual in the household
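To make the constraint types listed for Albatross concrete, here is a minimal Python sketch of a feasibility check for a single candidate activity, combining institutional (opening hours), time (minimum duration), spatial-temporal (travel time), and household (a hard return deadline) constraints. It is not Albatross code: Albatross represents choices with decision trees derived from data, whereas this hand-written check only illustrates what such constraints rule out. `Location`, `is_feasible`, and all times are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    """Hypothetical activity location; times are minutes after midnight."""
    name: str
    opens: int
    closes: int


def is_feasible(now: int, travel_to: int, location: Location, min_duration: int,
                travel_back: int = 0, deadline: Optional[int] = None) -> bool:
    """Check whether an activity can be fitted into the schedule.

    - spatial-temporal: the person first has to travel to the location;
    - institutional: the activity cannot start before opening or run past closing;
    - time: the activity must last at least `min_duration` minutes;
    - household: if `deadline` is given, the person must be back
      (after `travel_back` minutes) by then, e.g. to pick up a child.
    """
    arrival = now + travel_to
    start = max(arrival, location.opens)  # wait until the location opens
    end = start + min_duration
    if end > location.closes:
        return False
    if deadline is not None and end + travel_back > deadline:
        return False
    return True


if __name__ == "__main__":
    store = Location("store", opens=9 * 60, closes=20 * 60)
    print(is_feasible(18 * 60, 30, store, 45))  # True: 18:30 to 19:15
    print(is_feasible(19 * 60, 30, store, 45))  # False: would end at 20:15
```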
Daily Activity-Travel Pattern: Worker

Daily Activity-Travel Pattern: Non-Worker

CEMDAP Modeling Framework

Activity Generation-Allocation Module

Activity Generation-Allocation Models

Pattern/Tour/Stop-Level Scheduling Modules

Pattern/Tour/Stop-Level Scheduling Models

Activity-Based Model Applications
• Effects of development patterns on travel behavior
• Sensitivity to price and behavioral changes
• Effects of the transportation system and system condition
• Need for improved validity and reliability
• Ability to evaluate policy initiatives
• Better analysis of freight movement
• Ability to show environmental effects
• Modeling low-share alternatives
• Better ability to evaluate effects on specific subgroups
• Reflecting non-system policy changes (TDM, ITS)

Transportation Eras and Urban Growth Patterns

Integrated land use and transportation models

Population Density vs. Distance to City Center

Population Density vs. % Transit Mode Share

Transportation and Land Use

A More Detailed Theoretical Framework

Land Use Model Components
• Input: total population and total employment by type in the study area
• Output: population and employment by type in each spatial analysis unit
• Typical spatial analysis unit: TAZ, Census tract, parcel, block, grid cell
• Demand modules: household location choice, employment location choice, and/or household/employment relocation choice
• Supply modules: housing development, business real estate development
• Balancing supply and demand: no balancing, price and equilibrium, disequilibrium

Land Use-Transportation Microsimulation

UrbanSim

The Travel/Activity Scheduler for Household Agents (TASHA) model

Thank you! Q&A

Integrated Discrete Continuous Choice Models: Theory and Applications
Household Vehicle Ownership, Type and Usage

Table of Contents
1. Introduction
2. Methodology
3. Case Study
4. Conclusions
5. Future Work

Motivation
• In the U.S., transportation contributes approximately 27 percent of total greenhouse gas emissions.
• About 71 percent of oil consumption goes to transportation fuels, of which 40 percent is used to fill the gasoline tanks of personal vehicles.
• American households are highly dependent on private vehicles: in 2009, average vehicle ownership was 2.05 per household, and only about 5% of households did not have a car.
• The use of private vehicles is strongly related to traffic congestion, energy consumption, and the environment.
• It is therefore crucial to understand how people behave behind the wheel, in particular how many vehicles they own, the types of those vehicles, and how many miles they travel. In fact, households make these decisions simultaneously.
• As transportation modelers, we should estimate these decisions in one system rather than separately, to better understand travel behavior and hence provide better guidance to policy makers. However, only a few studies in the literature have investigated the three choices jointly.

Literature Review
• Discrete-continuous models derived from a conditional indirect utility function (e.g., Train, 1986)
  – In the discrete part, the utilities of the alternatives are represented by conditional indirect utility functions, and the person chooses the alternative with the highest utility.
  – In the continuous part, the demand functions are derived from the conditional indirect utility functions using Roy’s identity.
• Limitations:
  – The models estimate the choice probabilities and the demand equations sequentially, not simultaneously.
  – The estimates are consistent but not as efficient as full information maximum likelihood, because the unobserved component of utility and the error in the demand equation generally contain some common unobserved factors.

Literature Review (Cont.)
• Multiple Discrete Continuous Extreme Value (MDCEV) model
• Limitations:
  – Does not include the vehicle holding decision.
  – Requires a fine classification of vehicles, as one type of vehicle cannot be chosen twice by the household.
  – The assumption of a fixed total mileage budget for every household implies that it is not possible to predict changes in the total number of miles in response to policy changes.
  – There is only a single error term underlying both the discrete and continuous choices.

Literature Review (Cont.)
• Bayesian Multiple Ordered Probit and Tobit (BMOPT) model
• Limitations:
  – The computation becomes intensive for a large number of vehicle categories, as the number of equations to be estimated increases proportionally with the number of vehicle types.
  – The ordered mechanism may not perform as well as the unordered mechanism in modeling car ownership decisions (Bhat and Pulugurta, 1998; Potoglou and Kanaroglou, 2008).

Research Objectives
• Develop a mathematical framework to model household choices of vehicle ownership, vehicle types, and annual mileage traveled; in particular, the model should be able to
  – simultaneously estimate discrete (vehicle holding and types) and continuous (vehicle usage) decision variables;
  – take into account a large number of alternatives in both the vehicle holding and the vehicle type choices;
  – impose no budget constraint on the mileage traveled;
  – capture the correlations of the unobserved factors between the discrete and continuous parts;
  – have flexible specifications; and
  – be sensitive to policy analysis.
• In addition, investigate the performance of ordered and unordered structures in discrete-continuous models.

Unordered Discrete-Continuous Model
Household
• Number of vehicles and the type of each:
  – 0
  – 1: Type1
  – 2: Type1 & Type2
  – 3: Type1 & Type2 & Type3
  – 4: Type1 & Type2 & Type3 & Type4
• Annual miles traveled

Unordered Discrete-Continuous Model (Cont.)
• In the unordered structure, the household is assumed to be rational and to choose the vehicle ownership alternative that maximizes its utility.

Unordered Discrete-Continuous Model (Cont.)
• The discrete choices Y: Multinomial Probit
  where,
• The continuous choice $Y_{reg}$: Regression

Unordered Discrete-Continuous Model (Cont.)
• The integrated discrete-continuous model:

Unordered Discrete-Continuous Model (Cont.)
• Estimation with Monte Carlo simulation:
• where is a draw from a multivariate normal with mean and variance
• Then, the final simulated log-likelihood of the model is:

Unordered Discrete-Continuous Model (Cont.)
• Estimation with numerical computation (Genz, 1992):

Ordered Discrete-Continuous Model
• The ordered response structure uses latent variables to represent the vehicle ownership propensity of the household.
• Suppose two latent variables $y_d$ and $y_r$ represent the preference levels for vehicle holding and vehicle usage:
• The number of vehicles held by the household (Y) is determined by the value of the latent variable $y_d$, specifically:

Ordered Discrete-Continuous Model (Cont.)
• Similarly, to capture the correlation between the discrete and continuous parts, we allow the error terms to be correlated. Thus, the error terms follow a bivariate normal distribution:
• The likelihood of one observation is
• where

Ordered Discrete-Continuous Model (Cont.)
• The conditional mean and variance of the ordered probit are:
• The final likelihood of one observation can be written as:
• where

Case Study
• Data sources:
  – 2009 National Household Travel Survey (NHTS) data
  – 1,420 observations in the Washington D.C. Metropolitan area
  – Vehicle characteristics
• Choice set:
  – Vehicle holding: 0, 1, 2, 3, or 4 car(s)
  – Vehicle type: 120 alternatives for the type choice of each vehicle (12 classes × 10 vintages)
  – Vehicle usage: annual miles traveled

• 12 classes of vehicle for each of 10 vintages: a total of 120 alternatives
• Subsample of the chosen alternative plus 20 randomly selected ones
• The classes of vehicles are
  – small domestic car;
  – compact domestic car;
  – mid-size domestic car;
  – large domestic car;
  – luxury domestic car;
  – small import car;
  – mid-size import car;
  – large import car;
  – sporty car;
  – minivan/van;
  – pickup trucks;
  – SUVs.
• The 10 vintages are pre-1999 and the years 2000 through 2008.
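The 120-alternative vehicle type universe and the sampled choice sets described above can be built as in this minimal Python sketch. Drawing the 20 non-chosen alternatives uniformly without replacement is an assumption about how "randomly selected" was implemented; for a multinomial logit, uniform sampling of this kind is known to leave the estimates consistent, but the slides do not state the sub-model form. `sample_choice_set` is a hypothetical helper, and the class and vintage labels simply follow the list above.

```python
from __future__ import annotations

import itertools
import random

CLASSES = [
    "small domestic car", "compact domestic car", "mid-size domestic car",
    "large domestic car", "luxury domestic car", "small import car",
    "mid-size import car", "large import car", "sporty car",
    "minivan/van", "pickup truck", "SUV",
]
VINTAGES = ["pre-1999"] + [str(year) for year in range(2000, 2009)]  # 10 vintages

# Full universe: every (class, vintage) pair, 12 x 10 = 120 alternatives.
ALTERNATIVES = list(itertools.product(CLASSES, VINTAGES))


def sample_choice_set(chosen: tuple[str, str], n_sampled: int = 20,
                      rng: random.Random | None = None) -> list[tuple[str, str]]:
    """Return the chosen alternative followed by `n_sampled` distinct
    non-chosen alternatives drawn uniformly without replacement."""
    rng = rng or random.Random()
    others = [alt for alt in ALTERNATIVES if alt != chosen]
    return [chosen] + rng.sample(others, n_sampled)


if __name__ == "__main__":
    choice_set = sample_choice_set(("SUV", "2005"), rng=random.Random(42))
    print(len(ALTERNATIVES), len(choice_set))  # 120 21
```

Passing a seeded `random.Random` makes the sampled choice sets reproducible across estimation runs.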
Case Study (Cont.)
• Data Statistics

Estimations of Vehicle Type Sub-models

Estimations of Vehicle Type Sub-models (Cont.)

Model Estimations
• Model 1: unordered discrete-continuous model with simulation
• Model 2: unordered discrete-continuous model without simulation
• Model 3: ordered discrete-continuous model
• Model 4: same as Model 2 except no logsum (utility from the type choices)

Model Estimations (Cont.)
*Note: Model 1 is the unordered discrete-continuous model with simulation; Model 2 is the unordered discrete-continuous model with numerical computation; Model 3 is the ordered discrete-continuous model; Model 4 is the same as Model 2 except that it excludes the "logsum" variable, which makes it comparable to Model 3.

Model Estimations (Cont.)
[Tables over 1 car, 2 cars, 3 cars, 4 cars, and Mileage (and over #cars and mileage)]

Model Applications

Model Applications (Cont.)

Conclusions
• Developed an integrated discrete-continuous choice model to simultaneously estimate household choices of vehicle ownership (discrete), vehicle types (discrete), and annual mileage traveled (continuous).
• The model is able to include a large number of alternatives in both the vehicle holding and the vehicle type choices.
• The model allows unrestricted correlations of the unobserved factors between the discrete and continuous parts.
• The model accommodates flexible specifications.
• There is no budget constraint on the mileage traveled.
• The model can be applied to policy analysis.

Conclusions (Cont.)
• The case study for the Washington D.C. Metropolitan area is based on the latest national dataset, the 2009 NHTS.
• The preliminary results show that
  – the model gives reasonable coefficient estimates;
  – the covariance matrix explains well the correlations between the unobserved factors in the utilities of the discrete choices and in the demand function of the continuous choice;
  – the non-simulation approach provides better model fit;
  – the performance of the model improves when information about vehicle type choice is included;
  – the unordered discrete-continuous model is more appropriate than the ordered discrete-continuous model for estimating household vehicle ownership and usage decisions.
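For comparison with the unordered structure, the discrete part of an ordered model maps a single latent propensity $y_d = x'\beta + \varepsilon$ to the number of vehicles through increasing thresholds, so that $\Pr(Y = k) = \Phi(\mu_{k+1} - x'\beta) - \Phi(\mu_k - x'\beta)$ with $\mu_0 = -\infty$ and $\mu_5 = +\infty$ for $k = 0, \dots, 4$. This notation is assumed here for illustration. The minimal Python sketch below assumes a standard normal error and hypothetical index and threshold values; it omits the continuous mileage equation and the bivariate correlation used in the full model.

```python
from __future__ import annotations

import math


def std_normal_cdf(z: float) -> float:
    """Phi(z) via the error function; also handles +/- infinity."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(index: float, thresholds: list[float]) -> list[float]:
    """P(Y = k), k = 0..K, for y_d = index + e with e ~ N(0, 1) and
    Y = k when mu_k < y_d <= mu_{k+1} (mu_0 = -inf, mu_{K+1} = +inf)."""
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return [std_normal_cdf(hi - index) - std_normal_cdf(lo - index)
            for lo, hi in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    # Hypothetical propensity and thresholds for 0, 1, 2, 3 and 4 vehicles.
    probs = ordered_probit_probs(index=1.2, thresholds=[-0.5, 0.6, 1.8, 2.7])
    print([round(p, 3) for p in probs], round(sum(probs), 6))
```

A single index and one set of thresholds drive all five ownership levels, which is the restriction the unordered structure relaxes by giving each alternative its own utility.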