Evaluating Forecast Demographic Scenarios Using Population Synthesis and Data Simulation
Joshua Auld

CTS IGERT Seminar
June 25, 2009
Overview
• Introduction
• Population Synthesis
• Forecasting Marginal Variables
• Travel Data Simulation Model
• Scenario Analysis
• ITA Analysis
• Conclusions
Introduction
• Travel demand forecasting:
– Typically done at long time horizons (20, 30 years, etc.)
– Requires forecast demographics to forecast demand
– Many ways to produce them (expert opinion, trend lines, land-use models, etc.)
• Move to activity-based models (ABMs):
– Require synthetic populations
– Synthetic households/persons are used as agents in the ABM simulation
– Travel patterns of all agents are summed to give demand
• Data requirements for population synthesis:
– Household/individual sample data: joint distribution
– Marginal data: small-area distributions of single variables
Introduction (continued)
• For forecast synthetic populations:
– Same data requirements as the base year
– Data often nonexistent; there is no data 30 years in the future
• Solutions for data problems:
– Usually use the base year sample directly as the seed
– Update the base year marginals
– This gives the population distribution closest to the base year that matches the forecast marginals
• Forecasting marginals can be done in several ways:
– Full, integrated land-use model (UrbanSim, PECAS, etc.)
– Proportional updating (assume the same marginal distributions)
  ◦ Common approach for many agencies
Introduction (continued)
• Our approach:
– Combine forecasting models, expert opinion/scenario analysis, and proportional updating
• Forecasting models:
– Estimate marginal distributions for household size and number of workers
– Based on limited information (number of households and employees per zone)
• Expert opinion/scenario analysis:
– For marginals of interest that are difficult to predict
– Allow marginals to be varied by the analyst
– Easy-to-use scenario definition tool; direct manipulation of marginal distributions
• Useful where forecast information is limited
Objectives of Current Work
• To demonstrate:
– Use of a flexible population synthesizer/scenario evaluation tool
– A forecast population combined with a data transferability model to synthesize forecast travel attributes
– The impact of forecast population changes on several travel demand variables
– NOT to make realistic travel demand/demographic predictions (left to the planning agency)
Using Population Synthesis in ITA Evaluation
• In addition to use in travel demand modeling:
– Improve ITA communications simulation
– Market analysis
– ITA system performance
• In conjunction with ITA adoption and usage models:
– Where are the people who will use the ITA?
– Where are they coming from/going to?
– How will these patterns impact ITA performance?
– Evaluate estimated individual/system benefits
Population Synthesis Program
Base Population Synthesis Program
• Link sample data geography to marginal data
• Choose up to six control variables
• Define the categories (link between sample data and marginal data)
• Apply weighting
• Specify test variable
– Estimate the fit of various forecast populations
Population Synthesis Methodology
Foreach Pums in Pums_List
    Fill Pums.HH_List and Pums.PER_List from sample data
    Initialize Pums.HH_MWay and Pums.PER_MWay
    Run IPF to fit Pums.HH_MWay and Pums.PER_MWay to Pums.Marginals
    Foreach BG in Pums.BG_List
        Seed BG.HH_MWay and BG.PER_MWay from Pums
        Run IPF to fit to BG.Marginals
        Foreach HH in Pums.HH_List
            For i = 0 to BG.HH_MWay(cell number of HH in MWay)
                Get probability of adding household = f(HHtype, HH_MWay, PER_MWay)
                If HH added
                    Update BG.HH_MWay, BG.PER_MWay, N_remain
                    Write HH.Data with BG.ID
                End If
            Next
        Next
    Next
Next

Probability of adding household i:

$$P_{HH,i} = \frac{W_i \prod_{j=1}^{N_{per,i}} \dfrac{PER\_MWAY(v_{1,j}, v_{2,j}, \ldots, v_{n,j})}{N_{remain}}}{\sum_{k=1}^{C} \left( W_k \prod_{l=1}^{N_{per,k}} \dfrac{PER\_MWAY(v_{1,l}, v_{2,l}, \ldots, v_{n,l})}{N_{remain}} \right)}$$
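To make the two core steps concrete, here is a minimal Python sketch, not the program's actual code. `ipf_2d` fits a two-way seed table to row and column marginals (the real tool fits multi-way tables with up to six control variables), and `household_probabilities` evaluates the selection probability above for a set of candidate households. Both names are hypothetical, and reading W_i as the sample household weight and PER_MWAY as the current person-level table indexed by each member's category is an assumption.

```python
from math import prod


def ipf_2d(seed, row_targets, col_targets, tol=1e-8, max_iter=1000):
    """Iterative proportional fitting of a 2-D seed table to row/column marginals."""
    table = [list(map(float, row)) for row in seed]
    for _ in range(max_iter):
        # Scale rows to match the row marginals.
        for r, target in enumerate(row_targets):
            s = sum(table[r])
            if s > 0:
                table[r] = [v * target / s for v in table[r]]
        # Scale columns to match the column marginals.
        for c, target in enumerate(col_targets):
            s = sum(row[c] for row in table)
            if s > 0:
                for row in table:
                    row[c] *= target / s
        # Columns now match; stop once the rows also match within tol.
        if max(abs(sum(row) - t) for row, t in zip(table, row_targets)) < tol:
            break
    return table


def household_probabilities(weights, person_cells, per_mway, n_remain):
    """Selection probability P_HH,i for each candidate household.

    weights: W_i per candidate household (ASSUMED: sample weight).
    person_cells: for each household, the PER_MWAY cell of each member.
    per_mway: current person-level MWay values, keyed by cell.
    """
    scores = [w * prod(per_mway[cell] / n_remain for cell in cells)
              for w, cells in zip(weights, person_cells)]
    total = sum(scores)
    return [s / total for s in scores]


fitted = ipf_2d([[1, 1], [1, 1]], row_targets=[3, 7], col_targets=[4, 6])
print(fitted)
print(household_probabilities([1.0, 1.0], [[0], [1]], {0: 30.0, 1: 10.0}, 40.0))
```

Households whose members fall in person cells that still have large remaining counts get a higher score, which is how the selection step steers toward matching the person-level marginals while drawing whole households.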
Forecasting Control Variables
• Input required base and forecast year zonal data
• Link control variable categories to forecast categories
– 4 HH size categories, 3 number-of-workers categories
• Generate forecast marginals:
– Proportional updating, or
– Forecast model
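Proportional updating keeps each zone's base-year category shares and rescales them to the forecast-year total. A minimal sketch, assuming marginals are stored as per-zone dictionaries of household counts; `proportional_update` and the counts are hypothetical.

```python
def proportional_update(base_counts, forecast_total):
    """Forecast marginals that keep the base-year category shares.

    base_counts: base-year households by category in one zone.
    forecast_total: forecast-year total households in that zone.
    """
    base_total = sum(base_counts.values())
    return {cat: forecast_total * n / base_total for cat, n in base_counts.items()}


base = {"1": 200, "2": 300, "3-4": 400, "5+": 100}   # made-up zone
print(proportional_update(base, 1200))
# {'1': 240.0, '2': 360.0, '3-4': 480.0, '5+': 120.0}
```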
Scenario Definition
• Select sub-regions to apply changes to
• Select control variable to modify
• Adjust the variable's marginal distribution
• Multiple selections and modified variables allowed
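The scenario steps above amount to overriding one control variable's distribution in selected zones. A minimal sketch, assuming the analyst supplies target shares and each zone keeps its original total; `apply_scenario`, the zone names, and the dictionary layout are hypothetical, not the tool's interface.

```python
def apply_scenario(marginals, selected_zones, new_shares):
    """Copy zonal marginals, overriding one variable's distribution in the
    selected zones while keeping each zone's total unchanged.

    marginals: {zone: {category: count}} for the variable being modified.
    new_shares: analyst-specified {category: share}; rescaled to sum to 1.
    """
    share_sum = sum(new_shares.values())
    out = {}
    for zone, counts in marginals.items():
        if zone in selected_zones:
            total = sum(counts.values())
            out[zone] = {c: total * s / share_sum for c, s in new_shares.items()}
        else:
            out[zone] = dict(counts)
    return out


age = {"chicago": {"<35": 500, "35-64": 400, "65+": 100},
       "suburb": {"<35": 400, "35-64": 450, "65+": 150}}
# Older age profile applied to the suburbs only (made-up shares).
print(apply_scenario(age, {"suburb"}, {"<35": 0.3, "35-64": 0.45, "65+": 0.25}))
```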
Performance Comparison
• Our synthesizer:
– Nearly exact matching of HH-level marginals
– Close matching of person-level marginals
  ◦ Undercount of large household sizes; group quarters missing
– Tested on Chicago region: 2.9 million HH, 7.8 million people (within 2%)
  ◦ 3 HH controls, 3 person controls (MWay sizes 560 and 112)
  ◦ Run time of 123 minutes
• Guo and Bhat 2007:
– Tested on Dallas/Fort Worth
  ◦ 5 HH controls, 3 person controls (MWay sizes 336 and 140)
– Introduces slack in the selection procedure; marginals not matched
– No performance characteristics given
• Ye et al. 2009:
– Tested on Maricopa County (Phoenix): 1.1 million HH, 3.1 million people
  ◦ 3 HH controls, 3 person controls (MWay sizes 280 and 140)
– Seems to match distributions well; heuristic weight-setting procedure
– Run time of 16 hours
Forecasting Control Variable Distributions
Forecasting
• Forecasting is often done by proportional updating
– Assumes the same marginal distribution in the forecast year
• However, marginals change over time
– e.g., changes in population, households, housing, etc. lead to changes in household size
– Census data show that marginal distributions are not constant
– The distribution of each marginal should therefore change
• Need a model of marginal changes
– Only for certain variables (HH size and number of workers in this study)
– Need data that drive marginal changes
– Changes in income, race, etc. are not modeled; they are handled through scenario definition
SURE Forecasting Model
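• SURE (seemingly unrelated regression equations) marginal changes forecasting model:
– System of linear regression equations
– Equations related only through correlated error terms
– Accounts for cross-equation correlations
– Δ(HH, emp) → ΔHHsize=1, ΔHHsize=2, etc.
– Estimates change in HH size and number-of-workers categories
• Model specification:

$$
\begin{aligned}
y_{1i} &= \beta_1 x_{1i} + \varepsilon_{1i} \\
y_{2i} &= \beta_2 x_{2i} + \varepsilon_{2i} \\
&\;\;\vdots \\
y_{Ni} &= \beta_N x_{Ni} + \varepsilon_{Ni}
\end{aligned}
\qquad E[\varepsilon_i] = 0, \quad E[\varepsilon_i \varepsilon_j'] = \sigma_{ij} I
$$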
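The slides do not say how the system was estimated. A common SURE estimator is two-step feasible GLS: OLS per equation, estimate the cross-equation error covariance Σ from the residuals, then GLS on the stacked system whose error covariance is Σ ⊗ I. A minimal NumPy sketch under that assumption; `sure_fgls` and `_block_diag` are hypothetical names, and applied work would more likely use an econometrics package.

```python
import numpy as np


def _block_diag(mats):
    """Place regressor matrices along the diagonal of one large matrix."""
    rows = sum(m.shape[0] for m in mats)
    cols = sum(m.shape[1] for m in mats)
    out = np.zeros((rows, cols))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r += m.shape[0]
        c += m.shape[1]
    return out


def sure_fgls(ys, xs):
    """Two-step feasible GLS for a SURE system.

    ys: list of N arrays of shape (T,), one dependent variable per equation
        (e.g. change in HHsize=1, HHsize=2, ... per base-year household).
    xs: list of N arrays of shape (T, k_n), the regressors of each equation.
    Returns one coefficient vector per equation.
    """
    T = ys[0].shape[0]
    # Step 1: OLS equation by equation, keeping the residuals.
    resid = []
    for y, X in zip(ys, xs):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid.append(y - X @ b)
    E = np.column_stack(resid)            # T x N residuals
    sigma = E.T @ E / T                   # cross-equation error covariance
    # Step 2: GLS on the stacked system; error covariance is sigma kron I_T.
    X_big = _block_diag(xs)
    y_big = np.concatenate(ys)
    omega_inv = np.kron(np.linalg.inv(sigma), np.eye(T))
    beta = np.linalg.solve(X_big.T @ omega_inv @ X_big,
                           X_big.T @ omega_inv @ y_big)
    # Split the stacked estimate back into per-equation coefficient vectors.
    out, start = [], 0
    for X in xs:
        out.append(beta[start:start + X.shape[1]])
        start += X.shape[1]
    return out
```

If every equation had identical regressors, FGLS would reduce to equation-by-equation OLS; the efficiency gain comes from equations with different regressor sets, as in the results tables where some variables enter only some equations.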
SURE Forecasting Model:
Explanatory Variables
• Dependent variables are the change in HH in each category:
– HHsize=1, HHsize=2, HHsize=3-4, HHsize=5+
– NumWorkers=0-1, NumWorkers=2+, NumWorkers=NA (non-family)
– All dependent variables normalized by base year total HH
– i.e., change in HHsize=i per base year household
• Independent variables include:
– Total households in zone, base and forecast
– Total employment in zone, base and forecast
– Household density, base and forecast
– Base year demographics
– Base year land-use mix (% of area devoted to single family)
– Job accessibility, base and forecast (using base year LOS/mode split)
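The normalization described above, together with the ΔHH/HH and (%HHS=i) × ΔHH/HH terms that appear in the results tables, can be sketched per zone as follows. This assumes ΔHH/HH is the zone's change in total households divided by base-year households and %HHS=i is the base-year share of category i; `sure_inputs` is a hypothetical helper.

```python
def sure_inputs(base_counts, forecast_counts):
    """Per-zone dependent variables and two regressors for the HH size equations.

    base_counts / forecast_counts: households by category, e.g. {"1": 100, ...}.
    ASSUMPTION: dHH/HH = (forecast total - base total) / base total, and
    %HHS=i is the base-year share of category i.
    """
    base_total = sum(base_counts.values())
    d_hh = (sum(forecast_counts.values()) - base_total) / base_total
    # Dependent variables: change in each category per base-year household.
    y = {c: (forecast_counts[c] - base_counts[c]) / base_total for c in base_counts}
    # Interaction regressor: base-year share of category i times dHH/HH.
    share_x_dhh = {c: (base_counts[c] / base_total) * d_hh for c in base_counts}
    return y, d_hh, share_x_dhh


base = {"1": 100, "2": 150, "3-4": 200, "5+": 50}       # made-up zone
forecast = {"1": 130, "2": 170, "3-4": 210, "5+": 40}
print(sure_inputs(base, forecast))
```

By construction the dependent variables of one zone sum to ΔHH/HH, which is one reason the equations' errors are correlated.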
SURE Forecasting Model:
HH Size Results
Coefficients by equation (- = variable not in equation):

Variable              ΔHHS=1/HH   ΔHHS=2/HH   ΔHHS=3-4/HH   ΔHHS=5+/HH
Constant                0.032       0.017       -0.013        -0.037
ΔHH/HH                  0.076       0.112        0.151         0.057
(%HHS=i) × ΔHH/HH       0.604       0.603        0.604         0.603
Δ(JOBS/HH)              0.050         -         -0.032        -0.018
HH density            -5.71E-07       -           -           5.71E-07
Δ(HH density)          3.19E-05       -        -2.19E-05     -1.00E-05
%SINGLE                   -        -0.015       -0.015         0.030
%RACE_OTHER            -0.130      -0.096        0.057         0.168
R²                      0.68        0.80         0.88          0.55
SURE Forecasting Model:
Number of Workers Results
Coefficients by equation:

Variable              ΔNWORK=0-1/HH   ΔNWORK=2+/HH   ΔNWORK=NA/HH
Constant                 0.048          -0.043         -0.005
ΔHH/HH                   0.270           0.656          0.026
(%HH=i) × ΔHH/HH         0.047           0.047          0.047
Δ(HH …)/HH              -0.415          -0.632          1.048
Δ(JOBS/HH)               0.037          -0.028         -0.009
%BLACK                   0.020          -0.020          0.000
%OTHER                  -0.028           0.028          0.000
HH density             -7.33E-06        5.54E-06       1.79E-06
Δ(HH density)           4.41E-05       -5.22E-05       8.12E-06
JOBS/HH                 -0.020           0.011          0.008
R²                       0.66            0.92           0.92
SURE Forecasting Model
Validation
• Validation run for HH size and NWork models
– Run using unseen data (1980)
– Validation forecast: 1980 to 2000
– Compared against results from proportional updating
• Shows moderate improvement (~10%) in both R² and RMSE

HH size validation:
Base   Forecast   Model          Proportional    % Improvement
year   year       RMSE   R²      RMSE   R²       RMSE    R²
1990   2000       79     0.75    89     0.68     13.3%   10.4%
1980   2000       110    0.65    127    0.53     15.9%   23.5%

NumWorkers validation:
Base   Forecast   Model          Proportional    % Improvement
year   year       RMSE   R²      RMSE   R²       RMSE    R²
1990   2000       107    0.77    119    0.72     11%     7%
1980   2000       138    0.65    150    0.59     9%      11%
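The validation tables report RMSE and R² without giving formulas. A minimal sketch of both for comparing observed zonal counts with a forecast; treating R² as 1 - SS_res/SS_tot (rather than a squared correlation) is an assumption, and the counts are made up.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error between observed and forecast zonal counts."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (ASSUMED definition)."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [120, 340, 560, 80]        # made-up zonal counts
model = [130, 320, 570, 90]
proportional = [150, 300, 600, 60]
for name, pred in (("model", model), ("proportional", proportional)):
    print(name, round(rmse(observed, pred), 1), round(r_squared(observed, pred), 3))
```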
Travel Data Simulation Model
Data simulation overview
• Objective:
– Quick alternative to a travel demand model
– Generate joint disaggregate travel data at the household level
– Transfer data from NHTS to the synthetic population
• Travel attributes:
– Household total trips per day
– Household mandatory trips per day
– Household maintenance trips per day
– Household discretionary trips per day
– Household auto trips per day
Data simulation overview
• Travel attribute generating models:
– 32 explanatory variables are employed (from NHTS and TIGER files), including:
– Household socio-demographic characteristics, e.g.:
  ◦ Age
  ◦ Income
  ◦ Occupation
  ◦ Education
  ◦ Ethnicity
  ◦ …
– Built-environment variables, e.g.:
  ◦ Residential density
  ◦ Intersection density
  ◦ Transit use
  ◦ …
Data simulation model
• Travel attribute generating models:
– Models are decision trees with a maximum depth of three levels
– Decision trees were tested against the observed travel data for the Des Moines add-on data and provided good fits
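The slides say only that each travel attribute is generated by a decision tree of depth at most three, built on the explanatory variables listed above. One way such a tree can transfer data is to route a synthetic household to a leaf and draw a value from the NHTS households in that leaf. A minimal sketch under that assumption; the tree, its attributes and thresholds, the leaf trip values, and the function names are all hypothetical.

```python
import random

# HYPOTHETICAL depth-2 tree (the slides allow up to depth 3). Internal nodes
# test one household attribute against a threshold; each leaf holds the total
# daily trips of the NHTS households that fell into it (made-up values).
TREE = {
    "attr": "hh_size", "threshold": 2,
    "le": {"attr": "income", "threshold": 50_000,
           "le": {"leaf": [4, 5, 6]},
           "gt": {"leaf": [6, 7, 8]}},
    "gt": {"attr": "workers", "threshold": 1,
           "le": {"leaf": [8, 9, 10]},
           "gt": {"leaf": [11, 12, 14]}},
}


def find_leaf(node, household):
    """Follow the splits until a leaf is reached; return its NHTS values."""
    while "leaf" not in node:
        branch = "le" if household[node["attr"]] <= node["threshold"] else "gt"
        node = node[branch]
    return node["leaf"]


def transfer_trips(tree, household, rng):
    """Assign a synthetic household a trip total drawn from its leaf."""
    return rng.choice(find_leaf(tree, household))


rng = random.Random(42)
synthetic_hh = {"hh_size": 4, "income": 72_000, "workers": 2}
print(transfer_trips(TREE, synthetic_hh, rng))  # one of 11, 12, 14
```

Drawing from the leaf, rather than assigning the leaf mean, keeps the within-leaf variability of the observed data in the transferred population.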
Simulation Model Validation
• Travel attribute generating models:
– Probability density functions for observed, transferred and national household total number of trips per day in the Des Moines area
[Figure: probability density of household total trips per day (0 to 60); Transferred, Observed, and National]
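Curves like those in this figure are empirical distributions of household daily trip totals. A minimal sketch of how such a distribution could be tabulated from per-household trip counts; `empirical_pmf` and the data are hypothetical.

```python
from collections import Counter


def empirical_pmf(trips_per_household):
    """Share of households at each daily trip total."""
    counts = Counter(trips_per_household)
    n = len(trips_per_household)
    return {k: counts[k] / n for k in sorted(counts)}


print(empirical_pmf([0, 2, 2, 4]))  # {0: 0.25, 2: 0.5, 4: 0.25}
```

Computing this once for observed, transferred, and national samples gives three distributions that can be plotted on the same axes.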
Analysis Results
Scenarios Analyzed
• Base year, forecast year and two scenarios analyzed for the six-county Chicago region
• Four different synthetic populations generated:
– BY: 2000 (base year)
– FY: 2030 (forecast year)
– S1: 2030 high ageing
– S2: 2030 high ageing in suburbs, lowered age in Chicago
• Travel data indicators simulated for each scenario
Scenario Marginal Distributions
[Figure, Scenario 1: High Ageing: age marginal distribution by age group (15 to 85), Original vs. Scenario]
[Figure, Scenario 2: Increased Youth in Chicago: age marginal distribution by age group (15 to 85), Original vs. Scenario]
[Figure, Scenario 2: High Ageing in Suburbs: age marginal distribution by age group (15 to 85), Original vs. Scenario]
Selected scenario analysis results
• Change in total trips/HH for S1 and S2 compared to FY:
[Map legend: Increase / No change / Decrease]
Selected scenario analysis results
• Change in discretionary trips/HH for S1 and S2 compared to FY:
[Map legend: Increase / No change / Decrease]
Selected scenario analysis results
• Change in auto share for S1 and S2 compared to FY:
[Map legend: Increase / No change / Decrease]
Scenario Analysis Results
• Aggregate results for whole region, Chicago and suburbs:
– Ageing decreases total trips and increases auto share overall
– In Chicago, both increased ageing (S1) and decreased ageing (S2) increase auto share

Whole region, average per household:
                 BY      FY      S1      S2
Total trips      11.38   11.32   10.60   10.30
Mandatory        1.76    1.76    1.57    1.54
Maintenance      3.09    3.07    2.85    2.76
Discretionary    2.52    2.50    2.40    2.34
Auto share       89.6%   90.0%   90.7%   91.0%

Chicago:
                 BY      FY      S1 (high ageing)   S2 (low ageing)
Daily trips      11.05   10.79   10.46              10.84
Mandatory        1.69    1.66    1.52               1.63
Maintenance      2.98    2.90    2.82               2.89
Discretionary    2.47    2.40    2.36               2.44
Auto share       85.5%   85.6%   86.3%              86.0%

Suburbs:
                 BY      FY      S1 (high ageing)   S2 (higher ageing)
Daily trips      11.47   11.45   10.64              10.17
Mandatory        1.78    1.78    1.59               1.51
Maintenance      3.12    3.11    2.86               2.73
Discretionary    2.54    2.53    2.41               2.31
Auto share       90.7%   91.0%   91.7%              92.3%
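To read these averages as relative changes against the forecast year, a small worked example using the whole-region total trips per household from the results above (`pct_change` is a hypothetical helper):

```python
def pct_change(value, reference):
    """Percent change of a scenario value relative to a reference (e.g. FY)."""
    return 100.0 * (value - reference) / reference


# Whole-region average total trips per household, from the results above.
region_total_trips = {"BY": 11.38, "FY": 11.32, "S1": 10.60, "S2": 10.30}

for scenario in ("S1", "S2"):
    change = pct_change(region_total_trips[scenario], region_total_trips["FY"])
    print(f"{scenario} vs FY: {change:.2f}%")
# S1 vs FY: -6.36%
# S2 vs FY: -9.01%
```

Auto share is already a percentage, so its changes are better read as percentage-point differences (e.g. S1 minus FY = 0.7 points region-wide) than as percent changes.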
ITA Analysis Demonstration
• Under an assumed ITA adoption model (binary choice model with made-up numbers):

$$P_{ITA,i} = \frac{1}{1 + e^{-V_i}}$$

$$V_i = -2.0 - 0.03\,Age_i + 0.25\,Male_i + 0.015\,HHInc_i + 0.02\,TTime_i + 0.2\,Degree_i$$

– Average of 24% using an ITA
– Probability is higher for males and increases with income, travel time to work, and having a degree
– Decreases with age
• Plot distribution of ITA users (density by block group)
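The assumed adoption model can be applied to each synthetic person and aggregated to block-group densities roughly as follows. A minimal sketch: the function names, the input layout, and the units (income in thousands of dollars, travel time in minutes, Male and Degree as 0/1) are assumptions, since the slides do not give them; the coefficients are the made-up values of the assumed model.

```python
from math import exp


def ita_probability(age, male, hh_income, travel_time, degree):
    """Binary logit probability of using an ITA under the assumed model.

    ASSUMED units: hh_income in $1,000s, travel_time in minutes,
    male and degree as 0/1 indicators.
    """
    v = (-2.0 - 0.03 * age + 0.25 * male + 0.015 * hh_income
         + 0.02 * travel_time + 0.2 * degree)
    return 1.0 / (1.0 + exp(-v))


def ita_density(block_groups):
    """Expected ITA users per square mile in each block group.

    block_groups maps a block-group ID to (list of person dicts, area in sq. mi).
    Expected users = sum of individual probabilities.
    """
    return {
        bg: sum(ita_probability(**person) for person in people) / area
        for bg, (people, area) in block_groups.items()
    }


person = {"age": 30, "male": 1, "hh_income": 60, "travel_time": 30, "degree": 1}
print(round(ita_probability(**person), 3))
print(ita_density({"bg_001": ([person, person], 2.0)}))
```

Summing probabilities gives expected users without random draws; simulating a 0/1 adoption outcome per person would instead produce one realization of the map.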
ITA Usage Results
• After applying the model to the synthetic population
• Shows ITA density per sq. mile for each block group in the Chicago area
• High-density areas:
– North Side
– Loop
• Low-density areas:
– SW suburbs
– South Side
Conclusions
Conclusions and Discussion
• Flexible, easy-to-use scenario analysis tool
– Few limitations on geography/analysis variables
• Allows:
– Accurate forecasts with minimal information requirements
– Quick scenario visualization/analysis
– Applying different scenarios to different sub-regions
– Multiple levels of control (household and person)
• Useful for:
– 4-step travel demand models: reduces aggregation bias
– ABM: synthesizes agents for microsimulation
– ITA analysis
• Performance:
– Compares very favorably to other population synthesizers
Thank You!
Questions?