Evaluating Forecast Demographic Scenarios Using Population Synthesis and Data Simulation
Joshua Auld
CTS IGERT Seminar, June 25, 2009

Overview
  – Introduction
  – Population Synthesis
  – Forecasting Marginal Variables
  – Travel Data Simulation Model
  – Scenario Analysis
  – ITA Analysis
  – Conclusions

Introduction
Travel demand forecasting:
  – Typically done at long time horizons (20 or 30 years, etc.)
  – Forecasting demand requires forecast demographics
  – Many ways to obtain them (expert opinion, trend lines, land-use models, etc.)
Move to activity-based models (ABMs):
  – Require synthetic populations
  – Synthetic households and persons are used as agents in the ABM simulation
  – Travel patterns of all agents are summed to give demand
Data requirements for population synthesis:
  – Household/individual sample data: the joint distribution
  – Marginal data: small-area distributions of single variables

Introduction (continued)
For forecast synthetic populations:
  – Same data requirements as the base year
  – Data often do not exist: there are no observed data 30 years in the future
Solutions to the data problem:
  – Usually use the base-year sample directly as the seed
  – Update the base-year marginals
  – This gives the population distribution closest to the base year that matches the forecast marginals
Forecasting marginals can be done in several ways:
  – Full, integrated land-use model (UrbanSim, PECAS, etc.)
  – Proportional updating (assume the same marginal distributions); a common approach for many agencies

Introduction (continued)
Our approach:
  – Combine forecasting models, expert opinion/scenario analysis, and proportional updating
Forecasting models:
  – Estimate marginal distributions for household size and number of workers
  – Based on limited information (number of households and employees per zone)
Expert opinion/scenario analysis:
  – For marginals of interest that are difficult to predict
  – Allows marginals to be varied by the analyst
  – Easy-to-use scenario definition tool with direct manipulation of marginal distributions
Useful where forecast information is limited

Objectives of Current Work
To demonstrate:
  – Use of a flexible population synthesizer/scenario evaluation tool
  – Combining the forecast population with a data transferability model to synthesize forecast travel attributes
  – The impact of forecast population changes on several travel demand variables
The goal is NOT to make realistic travel demand/demographic predictions (left to the planning agency)

Using Population Synthesis in ITA Evaluation
In addition to use in travel demand:
  – Improve ITA communications simulation
  – Market analysis
  – ITA system performance
In conjunction with ITA adoption and usage models:
  – Where are the people who will use the ITA?
  – Where are they coming from and going to?
  – How will these patterns impact ITA performance?
  – Evaluate estimated individual/system benefits

Population Synthesis Program
Base population synthesis program:
  – Link sample data geography to marginal data
  – Choose up to six control variables
  – Define the categories (link between
    sample data and marginal data)
  – Apply weighting
  – Specify a test variable to estimate the fit of various forecast populations

Population Synthesis Methodology
  For each Pums in Pums_List
    Fill Pums.HH_List and Pums.PER_List from sample data
    Initialize Pums.HH_MWay and Pums.PER_MWay
    Run IPF to fit Pums.HH_MWay and Pums.PER_MWay to Pums.Marginals
    For each BG in Pums.BG_List
      Seed BG.HH_MWay and BG.PER_MWay from Pums
      Run IPF to fit to BG.Marginals
      For each HH in Pums.HH_List
        For i = 0 to BG.HH_MWay(cell number of HH in MWay)
          Get probability of adding household, P_HH,i = f(HH type, HH_MWay, PER_MWay)
          If HH added
            Update BG.HH_MWay, BG.PER_MWay, N_remain
            Write HH.Data with BG.ID
          End If
        Next i
      Next HH
    Next BG
  Next Pums
The selection probability $P_{HH,i}$ combines the person MWay cell values $PER\_MWAY(v_{1,j}, v_{2,j}, \ldots, v_{n,j})$ of the members $j = 1, \ldots, N_{per,i}$ of household $i$ with a weight $W$ and the number of households remaining, $N_{remain}$, relative to the same terms for the other candidate households $k$.

Forecasting Control Variables
Input the required base- and forecast-year zonal data
Link control variable categories to forecast categories:
  – 4 HH size categories, 3 number-of-workers categories
Generate forecast marginals:
  – Proportional updating, or
  – Forecast model

Scenario Definition
Select the sub-regions to which changes are applied
Select the control variable to modify
Adjust that variable's marginal distribution
Multiple selections and multiple modified variables are allowed

Performance Comparison
Our synthesizer:
  – Nearly exact matching of HH-level marginals
  – Close matching of person-level marginals (undercount of large household sizes; group quarters missing)
  – Tested on the Chicago region: 2.9 million HH, 7.8 million people (within 2%)
  – 3 HH controls, 3 person controls (MWay sizes of 560 and 112)
  – Run time of 123 minutes
Guo and Bhat 2007:
  – Tested on Dallas/Fort Worth
  – 5 HH controls, 3 person controls (MWay sizes of 336 and 140)
  – Introduces slack in the selection procedure, so marginals are not matched
  – No performance characteristics given
Ye et al. 2009:
  – Tested on Maricopa County (Phoenix): 1.1 million HH, 3.1 million people
  – 3 HH controls, 3 person controls (MWay sizes of 280 and 140)
  – Appears to match distributions well; uses a heuristic weight-setting procedure
  – Run time of 16 hours

Forecasting Control Variable Distributions
Forecasting is often done by proportional updating:
  – Assumes the same marginal distribution in the forecast year
However, marginals change over time:
  – e.g., changes in population, households, housing, etc. lead to changes in household size
  – Census data show that marginal distributions are not constant
  – The distribution of each marginal should therefore change
A model of marginal changes is needed:
  – Only for certain variables (HH size and number of workers in this study)
  – Requires data that drive the marginal changes
  – Changes in income, race, etc. are not modeled; they are handled through scenario definition

SURE Forecasting Model
SURE (seemingly unrelated regression equations) marginal-change forecasting model:
  – System of linear regression equations
  – Related only through correlated error terms
  – Accounts for cross-equation correlations: Δ(HH, EMP) → ΔHHsize=1, ΔHHsize=2, etc.
  – Estimates the change in the HH size and number-of-workers categories
Model specification:
$$y_{1i} = \beta_1 x_{1i} + \varepsilon_{1i}$$
$$y_{2i} = \beta_2 x_{2i} + \varepsilon_{2i}$$
$$\vdots$$
$$y_{Ni} = \beta_N x_{Ni} + \varepsilon_{Ni}$$
$$E[\varepsilon_i] = 0, \qquad E[\varepsilon_i \varepsilon_j^{\top}] = \sigma_{ij} I$$

SURE Forecasting Model: Explanatory Variables
Dependent variables are the changes in the number of HH in each category:
  – HHsize=1, HHsize=2, HHsize=3-4, HHsize=5+
  – NumWorkers=0-1, NumWorkers=2+, NumWorkers=NA (non-family)
All dependent variables are normalized by base-year total HH, i.e.,
the change in HHsize=i households per base-year household
Independent variables include:
  – Total households in the zone, base and forecast
  – Total employment in the zone, base and forecast
  – Household density, base and forecast
  – Base-year demographics
  – Base-year land-use mix (% of area devoted to single-family)
  – Job accessibility, base and forecast (using base-year LOS/mode split)

SURE Forecasting Model: HH Size Results
Dependent       Const.   ΔHH/HH   (%HHS=i)×ΔHH/HH   Δ(JOBS/HH)   HH DENSITY   Δ(HH DENSITY)   %SINGLE   %RACE_OTHER   R²
Δ(HHS=1)/HH     0.032    0.076    0.604             0.050        -5.71E-07    3.19E-05        –         -0.130        0.68
Δ(HHS=2)/HH     0.017    0.112    0.603             –            –            –               -0.015    -0.096        0.80
Δ(HHS=3-4)/HH   -0.013   0.151    0.604             -0.032       –            -2.19E-05       -0.015    0.057         0.88
Δ(HHS=5+)/HH    -0.037   0.057    0.603             -0.018       5.71E-07     -1.00E-05       0.030     0.168         0.55

SURE Forecasting Model: Number of Workers Results
Dependent         Const.   ΔHH/HH   (%HH=i)×ΔHH/HH   Δ(HH_…)/HH   Δ(JOBS/HH)   %BLACK   %OTHER   HH DENSITY   Δ(HH DENSITY)   JOBS/HH   R²
Δ(NWORK=0-1)/HH   0.048    0.270    0.047            -0.415       0.037        0.020    -0.028   -7.33E-06    4.41E-05        -0.020    0.66
Δ(NWORK=2+)/HH    -0.043   0.656    0.047            -0.632       -0.028       -0.020   0.028    5.54E-06     -5.22E-05       0.011     0.92
Δ(NWORK=NA)/HH    -0.005   0.026    0.047            1.048        -0.009       0.000    0.000    1.79E-06     8.12E-06        0.008     0.92

SURE Forecasting Model Validation
Validation run for the HH size and number-of-workers models:
  – Run using unseen data (1980)
  – Validation forecast: 1980 to 2000
  – Compared against results from proportional updating
Shows a moderate improvement (about 10%) in both R² and RMSE

HH size validation:
Base year   Forecast year   Model RMSE   Model R²   Proportional RMSE   Proportional R²   RMSE improvement   R² improvement
1990        2000            79           0.75       89                  0.68              13.3%              10.4%
1980        2000            110          0.65       127                 0.53              15.9%              23.5%

Number of workers validation:
Base year   Forecast year   Model RMSE   Model R²   Proportional RMSE   Proportional R²   RMSE improvement   R² improvement
1990        2000            107          0.77       119                 0.72              11%                7%
1980        2000            138          0.65       150                 0.59              9%                 11%

Travel Data Simulation Model
Data Simulation Overview
Objective:
  – Quick alternative to a travel demand model
  – Generate joint disaggregate travel data at the household level
  – Transfer data from the NHTS
    to the synthetic population
Travel attributes:
  – Household total trips per day
  – Household mandatory trips per day
  – Household maintenance trips per day
  – Household discretionary trips per day
  – Household auto trips per day

Data Simulation Overview (continued)
Travel attribute generating models:
  – 32 explanatory variables are employed, drawn from the NHTS and TIGER files, including:
  – Household socio-demographic characteristics, e.g., age, income, occupation, education, ethnicity, …
  – Built-environment variables, e.g., residential density, intersection density, transit use, …

Data Simulation Model
Travel attribute generating models:
  – Models are decision trees with a maximum depth of three levels
  – The decision trees were tested against the observed travel data in the Des Moines add-on sample and provided good fits

Simulation Model Validation
Travel attribute generating models:
  – [Figure: probability density functions of observed, transferred, and national household total trips per day in the Des Moines area]

Analysis Results
Scenarios Analyzed
Base year, forecast year, and two scenarios analyzed for the six-county Chicago region
Four synthetic populations generated:
  – BY: 2000 (base year)
  – FY: 2030 (forecast year)
  – S1: 2030, high ageing
  – S2: 2030, high ageing in the suburbs, lowered age in Chicago
Travel data indicators simulated for each scenario

Scenario Marginal Distributions
[Figures: age marginal distributions (ages 15 to 85), original vs. scenario, for Scenario 1: High Ageing; Scenario 2: Increased Youth in Chicago; Scenario 2: High Ageing in Suburbs]

Selected Scenario Analysis Results
Change in total trips/HH for S1 and S2 compared to FY [Maps; legend: increase, no change, decrease]

Selected Scenario Analysis Results
Change in discretionary trips/HH for S1 and S2 compared to FY [Maps; legend: increase, no change, decrease]

Selected Scenario Analysis Results
Change in auto share for S1 and S2 compared to FY [Maps; legend: increase, no change, decrease]

Scenario Analysis Results
Aggregate results for the whole region, Chicago, and the suburbs:
  – Ageing decreases total trips and increases auto share overall
  – In Chicago, both increased and decreased ageing increase auto share

Whole region, average per household
                 BY       FY       S1       S2
Total trips      11.38    11.32    10.60    10.30
Mandatory        1.76     1.76     1.57     1.54
Maintenance      3.09     3.07     2.85     2.76
Discretionary    2.52     2.50     2.40     2.34
Auto share       89.6%    90.0%    90.7%    91.0%

Chicago, average per household
                 BY       FY       S1 (high ageing)   S2 (low ageing)
Total trips      11.05    10.79    10.46              10.84
Mandatory        1.69     1.66     1.52               1.63
Maintenance      2.98     2.90     2.82               2.89
Discretionary    2.47     2.40     2.36               2.44
Auto share       85.5%    85.6%    86.3%              86.0%

Suburbs, average per household
                 BY       FY       S1 (high ageing)   S2 (higher ageing)
Total trips      11.47    11.45    10.64              10.17
Mandatory        1.78     1.78     1.59               1.51
Maintenance      3.12     3.11     2.86               2.73
Discretionary    2.54     2.53     2.41               2.31
Auto share       90.7%    91.0%    91.7%              92.3%

ITA Analysis Demonstration
Under an assumed ITA adoption model (a binary choice model with made-up numbers):
$$P_{ITA,i} = \frac{1}{1 + e^{-V_i}}$$
$$V_i = -2.0 - 0.03\,\mathit{Age}_i + 0.25\,\mathit{Male}_i + 0.015\,\mathit{HHInc}_i + 0.02\,\mathit{TTime}_i + 0.2\,\mathit{Degree}_i$$
  – Average of 24% using an ITA
  – Probability is higher for males and increases with income, travel time to work, and having a degree
  – Probability decreases with age
Plot the distribution of ITA users (density by block group)

ITA Usage Results
After applying the model to the synthetic population:
  – Shows ITA user density per square
    mile for each block group in the Chicago area
High-density areas:
  – North Side
  – Loop
Low-density areas:
  – SW suburbs
  – South Side

Conclusions and Discussion
Flexible, easy-to-use scenario analysis tool:
  – Few limitations on geography/analysis variables
Allows:
  – Accurate forecasts with minimal information requirements
  – Quick scenario visualization/analysis
  – Different scenarios applied to different sub-regions
  – Multiple levels of control (household and person)
Useful for:
  – 4-step travel demand models: reduces aggregation bias
  – ABMs: synthesizes agents for microsimulation
  – ITA analysis
Performance:
  – Compares very favorably with other population synthesizers

Thank You! Questions?
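The IPF step in the Population Synthesis Methodology fits each MWay table to its marginals. As a minimal sketch, assuming a dense two-dimensional table stored as nested lists (the function name `ipf_2d` and its tolerance parameters are hypothetical; the synthesizer itself fits higher-dimensional HH and person MWay tables):

```python
def ipf_2d(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Fit a 2-D seed table to row and column marginals by IPF.

    Hypothetical helper for illustration. Assumes the row and column targets
    share the same total and that no seed row or column sums to zero.
    """
    table = [list(row) for row in seed]  # work on a copy of the seed
    for _ in range(max_iter):
        # Row step: scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            table[i] = [v * target / row_sum for v in table[i]]
        # Column step: scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
        # Columns now match exactly; stop once the rows also match.
        if max(abs(sum(row) - t) for row, t in zip(table, row_targets)) < tol:
            break
    return table


# Seed: sample household counts by (HH size category, workers category).
seed = [[10.0, 5.0], [3.0, 12.0]]
# Block-group marginals for the same two variables (hypothetical values).
fitted = ipf_2d(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
for row in fitted:
    print([round(v, 2) for v in row])
```

Because every step rescales a whole row or column, IPF leaves the seed's cross-product ratios unchanged, which is why seeding each block group from its PUMS keeps the sample's joint structure while matching local marginals.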
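Proportional updating (Forecasting Control Variables) and scenario definition both act on a zone's marginal distribution. A minimal sketch, assuming marginals are stored as lists of category counts; the function names and the multiplier-based scenario edit are hypothetical illustrations, not the tool's actual interface:

```python
def proportional_update(base_counts, forecast_total):
    """Rescale base-year category counts to a forecast zone total,
    keeping the base-year marginal distribution (shares) unchanged."""
    base_total = sum(base_counts)
    return [c * forecast_total / base_total for c in base_counts]


def apply_scenario(counts, multipliers):
    """Hypothetical scenario edit: scale each category by an analyst-chosen
    multiplier, then renormalize so the zone total is unchanged."""
    total = sum(counts)
    adjusted = [c * m for c, m in zip(counts, multipliers)]
    scale = total / sum(adjusted)
    return [a * scale for a in adjusted]


# Base-year HH size marginal for one zone (1, 2, 3-4, 5+ persons); made-up numbers.
base = [300.0, 350.0, 250.0, 100.0]
forecast = proportional_update(base, forecast_total=1200.0)
# Scenario: shift households toward 1-person and away from 5+ households.
scenario = apply_scenario(forecast, [1.2, 1.0, 1.0, 0.8])
print(forecast)
print([round(v, 1) for v in scenario])
```

Renormalizing after the edit keeps the scenario consistent with the zone's forecast household total, so only the shape of the distribution changes.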
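The SURE Forecasting Model slides do not state which estimator was used. For reference, the standard feasible GLS estimator for a system of N equations, each observed over T zones, first fits every equation by OLS (residual vectors $\hat{e}_i$), forms the residual covariance, and then re-estimates the stacked system:

```latex
\hat{\sigma}_{ij} = \frac{1}{T}\,\hat{e}_i^{\top}\hat{e}_j, \qquad
\hat{\Sigma} = \left[\hat{\sigma}_{ij}\right]_{N \times N}

y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \qquad
X = \operatorname{diag}(X_1, \ldots, X_N)

\hat{\beta}_{FGLS} = \left( X^{\top} (\hat{\Sigma}^{-1} \otimes I_T)\, X \right)^{-1}
                     X^{\top} (\hat{\Sigma}^{-1} \otimes I_T)\, y
```

When every equation has the same regressors, this estimator reduces to equation-by-equation OLS; the efficiency gain from the correlated errors appears only when the regressor sets differ across equations.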
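The SURE validation compares the forecast model with proportional updating on RMSE and R². A minimal sketch of these measures; the slides do not give the formula behind the improvement percentages, so `pct_improvement` uses one assumed definition that reproduces the 11% and 7% reported for the 1990-based number-of-workers run:

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted zone values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pct_improvement(model_rmse, prop_rmse, model_r2, prop_r2):
    """Percent improvement of the forecast model over proportional updating.

    Assumed definitions (not stated in the slides): RMSE reduction relative
    to the model RMSE, and R² gain relative to the proportional R².
    """
    rmse_gain = 100.0 * (prop_rmse - model_rmse) / model_rmse
    r2_gain = 100.0 * (model_r2 - prop_r2) / prop_r2
    return rmse_gain, r2_gain


# Number-of-workers validation, 1990 base year: model vs. proportional updating.
rmse_gain, r2_gain = pct_improvement(107, 119, 0.77, 0.72)
print(f"RMSE improvement {rmse_gain:.0f}%, R2 improvement {r2_gain:.0f}%")
```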
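The travel attribute generating models are decision trees at most three levels deep. The slides do not name the tree algorithm, so the following is a minimal sketch assuming a greedy CART-style regression tree on numeric features: each leaf keeps its observed survey values, and a synthetic household's attribute is drawn from the leaf it falls into. The feature choice and survey records are made up, and all function names are hypothetical:

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (the split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, depth=0, max_depth=3, min_leaf=2):
    """Grow a CART-style regression tree at most max_depth levels deep.
    Leaves keep the observed target values so they can be resampled."""
    if depth == max_depth or len(targets) < 2 * min_leaf:
        return {"values": list(targets)}
    best = None
    for f in range(len(rows[0])):
        for thr in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= thr]
            right = [i for i, r in enumerate(rows) if r[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:
        return {"values": list(targets)}
    _, f, thr, left, right = best

    def child(idx):
        return build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                          depth + 1, max_depth, min_leaf)

    return {"feature": f, "threshold": thr, "left": child(left), "right": child(right)}


def leaf_for(tree, row):
    """Follow the splits down to the leaf matching a household's attributes."""
    while "values" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["values"]


def simulate(tree, row, rng):
    """Transfer step: draw one observed value from the household's leaf."""
    return rng.choice(leaf_for(tree, row))


# Hypothetical survey households: (HH size, vehicles) -> total trips per day.
survey_x = [(1, 0), (1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (4, 2), (4, 3), (5, 2), (5, 3)]
survey_y = [3, 4, 7, 8, 9, 10, 12, 13, 14, 15]
tree = build_tree(survey_x, survey_y)
rng = random.Random(42)
synthetic_households = [(1, 1), (3, 2), (5, 3)]
print([simulate(tree, hh, rng) for hh in synthetic_households])
```

Drawing from the leaf, rather than returning its mean, preserves the spread of the observed data in the simulated population.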
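The ITA Analysis Demonstration applies a binary logit to every synthetic person and maps expected users per square mile by block group. A minimal sketch, with assumptions: the constant is read as -2.0 (consistent with the reported 24% average), income is in thousands of dollars, and travel time is in minutes; the slides give no units. The dictionary keys and function names are hypothetical:

```python
import math

# Illustrative ITA adoption coefficients. Assumptions: constant read as -2.0,
# income in thousands of dollars, travel time to work in minutes.
COEFS = {"const": -2.0, "age": -0.03, "male": 0.25, "hh_inc": 0.015, "ttime": 0.02, "degree": 0.2}


def ita_probability(person):
    """Binary logit: P = 1 / (1 + exp(-V))."""
    v = COEFS["const"] + sum(COEFS[k] * person[k] for k in ("age", "male", "hh_inc", "ttime", "degree"))
    return 1.0 / (1.0 + math.exp(-v))


def ita_density_by_bg(persons, bg_area_sq_mi):
    """Expected ITA users per square mile, summing probabilities by block group."""
    users = {}
    for p in persons:
        users[p["bg_id"]] = users.get(p["bg_id"], 0.0) + ita_probability(p)
    return {bg: n / bg_area_sq_mi[bg] for bg, n in users.items()}


# Two hypothetical synthetic persons in one block group of 0.5 sq. mi.
persons = [
    {"bg_id": "BG1", "age": 40, "male": 1, "hh_inc": 60, "ttime": 30, "degree": 1},
    {"bg_id": "BG1", "age": 70, "male": 0, "hh_inc": 30, "ttime": 0, "degree": 0},
]
print(ita_density_by_bg(persons, {"BG1": 0.5}))
```

Summing probabilities gives the expected number of users without drawing random adopters; a Monte Carlo draw per person would instead produce one possible realization.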