Toronto CO2 Project – Population Synthesis Documentation – Draft

Toronto CO2 Emissions Project
Department of Civil Engineering
Population Synthesis
Documentation - Draft
City of Toronto CO2 Project
Population Synthesis Documentation – Draft Nov 11, 01
Matthew Roorda
1. Introduction
The population synthesis procedures presented in this document were developed as part of
the Toronto CO2 Emissions Project. Population synthesis is the first stage of the
activity/travel model analysis and provides the necessary inputs to the household activity
scheduler model. The procedures documented in this report perform the following functions:

- Raw 1996 Transportation Tomorrow Survey (TTS) data are "cleaned": missing information
for each person or household surveyed is imputed using simple rules and information
from other people in the same geographical area.

- New weights (expansion factors) are applied to each household and person record in
the TTS database to reflect population growth totals provided by the City of Toronto.

- Future employment, school and daycare locations are generated for each person record.
Future employment locations are based on the 1996 distribution of place-of-residence
place-of-work linkages and on the shifts and overall growth in population and employment
expected in the GTA, as provided by the City of Toronto. Future school locations are
based on the 1996 distribution of place-of-residence place-of-school linkages and on
growth in population.
September 27, 2001
Page 1
2. Data Cleaning
The TTS data are not complete because not all survey respondents were able or willing to
give complete answers to all survey questions. As a result, for each data variable, many
household or person records are coded with an "unknown" entry. For these records, a
preprocessing routine was developed to impute the unknown value based on the
distribution of the variable among people in the same GTA traffic zone, supplemented by
some simple rules. The procedure for imputing each unknown variable is as follows:
Location Attributes
1996 employment and school zones in the original TTS database have two problems.
First, the external zone numbering system does not match that of the GTA EMME/2
model, on which the final model assignments are to be run. To resolve this problem, an
equivalency, developed as part of an earlier project, was applied to the TTS zone system.
Second, there are a number of employment and school zones coded as 9999 (unknown)
or coded as 4000 (unknown external).
For unknown employment zones, the following steps are taken:
- The "most popular" GTA employment zone for each residence zone is determined
(i.e. the place-of-work zone in which the greatest number of people from the
residence zone work). In many cases this is the same as the residence zone.
- The "most popular" employment zone is entered for each unknown employment
zone.
- For unknown external codes, the closest external zone with a reasonably large
population base is applied.
- Employment zones coded 8888 (no usual place of work) are not modified, as this
code provides important insight into the travel behaviour of that person that
should be considered by the scheduler model.
For unknown school zones, the following steps are taken:
- The "most popular" GTA school zone for each residence zone is determined (i.e.
the place-of-school zone that the greatest number of people from the residence
zone attend). In many cases this is the same as the residence zone.
- The "most popular" school zone is entered for each unknown school zone.
- For unknown external codes, the closest external zone with a reasonably large
university/college is applied.
- School zones coded 8888 (no usual place of school) are treated in the same way
as those coded as unknown.
Both the revisions to the zone numbering system and the generation of missing location
attributes were done using a Microsoft Access database.
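The location cleaning was carried out in Microsoft Access, not in code. The C++ sketch below only illustrates the "most popular zone" rule for unknown employment zones. The struct and function names are hypothetical, counting unexpanded records (rather than summing expansion factors) is an assumption, and the external-zone (4000) rule is not shown.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical cleaned-record view: residence zone and employment zone.
struct WorkLink {
    int homeZone;
    int workZone;
};

constexpr int kUnknownZone = 9999;   // TTS code: unknown zone
constexpr int kNoUsualPlace = 8888;  // no usual place of work: left as-is

// For each residence zone, find the known employment zone reported by the
// largest number of records from that zone (ties: lowest zone number).
std::map<int, int> mostPopularZones(const std::vector<WorkLink>& links) {
    std::map<int, std::map<int, int>> counts;  // home -> (work -> records)
    for (const WorkLink& l : links) {
        if (l.workZone != kUnknownZone && l.workZone != kNoUsualPlace) {
            ++counts[l.homeZone][l.workZone];
        }
    }
    std::map<int, int> best;
    for (const auto& [home, byWork] : counts) {
        int bestZone = -1;
        int bestCount = 0;
        for (const auto& [work, n] : byWork) {
            if (n > bestCount) {
                bestZone = work;
                bestCount = n;
            }
        }
        assert(bestZone != -1);  // every stored home zone has a known work zone
        best[home] = bestZone;
    }
    return best;
}

// Replace unknown (9999) employment zones with the most popular zone for
// the person's residence zone; 8888 is deliberately left untouched.
void imputeUnknownWorkZones(std::vector<WorkLink>& links) {
    const std::map<int, int> best = mostPopularZones(links);
    for (WorkLink& l : links) {
        if (l.workZone != kUnknownZone) continue;
        const auto it = best.find(l.homeZone);
        if (it != best.end()) l.workZone = it->second;  // else stays unknown
    }
}
```

For school zones the same logic would apply, except that 8888 would also be treated as unknown, following the rule above.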
Household Attributes
The only household attribute with unknown values is dwelling type. No preprocessing is
required for this attribute, since it is not used in the scheduling program.
Person Attributes
Person attributes that require cleaning include age, sex, driver's license, employment
status, occupation, and student status. For each of these attributes, the number of records
and expanded trips coded as unknown is shown in Table 1.
Table 1: Number of missing fields for person attributes
Insert table 1 here
For each attribute, an algorithm was developed to impute a value based on simple rules and
observed distributions. A C++ program, data_clean.cpp, was developed to impute the
unknown attributes. Its inputs are largely the same as those of the population synthesis
program described in Section 3, except that two additional fields are required for each
record in the TTS person data input file (fin_pers.txt), namely "made_work_trip" and
"made_auto_trip". The "cleaned" person output file (cln_pers.txt) produced by the data
cleaning program is already in the correct format for input into the population synthesis
routine.
The algorithm is outlined for each missing person attribute as follows.
Age
Step 1: Assess whether the person is an adult
- If the person has a driver's license, then age >= 16
- If the person is a worker, then age >= 16
- If the person is the only member of the household, then age >= 16
Step 2: Randomly choose age based on the household zone age distribution
- If the person is an adult, then select age from the zonal age distribution of people >= 16
- Otherwise, select age from the zonal age distribution of all people in the zone
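A minimal C++ sketch of the two-step age rule follows. It is not taken from data_clean.cpp; the struct, the function names, and the representation of the zonal age distribution as parallel vectors of observed ages and weights (for example, expansion factors) are assumptions.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical subset of person attributes used by Step 1.
struct AgeEvidence {
    bool hasLicense;
    bool isWorker;
    bool onlyPersonInHousehold;
};

// Step 1: any one of these conditions implies age >= 16.
bool mustBeAdult(const AgeEvidence& p) {
    return p.hasLicense || p.isWorker || p.onlyPersonInHousehold;
}

// Step 2: draw an age from the zone's observed ages, where zoneAges[i] has
// weight zoneWeights[i]. Adults may only draw ages >= 16.
int imputeAge(const AgeEvidence& p, const std::vector<int>& zoneAges,
              const std::vector<double>& zoneWeights, std::mt19937& rng) {
    assert(zoneAges.size() == zoneWeights.size());
    std::vector<double> w(zoneWeights);
    double total = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        if (mustBeAdult(p) && zoneAges[i] < 16) w[i] = 0.0;  // exclude children
        total += w[i];
    }
    assert(total > 0.0);  // the zone must contain at least one eligible age
    std::discrete_distribution<int> pick(w.begin(), w.end());
    return zoneAges[pick(rng)];
}
```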
Sex
- Choose sex randomly (50% male, 50% female)
Driver's License
- If the person's age is < 16 or >= 80, then assume the person has no driver's license
- If the person's age is >= 16 and < 80:
o If the person made an auto trip, then the person has a driver's license
o If the person made no auto trip, then determine the driver's license attribute
randomly based on the overall GTA distribution conditional upon broad age category and sex.
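The license rule can be sketched as below. The broad age categories are not defined in this document, so the sketch assumes the caller supplies the GTA license-holding share for the person's age category and sex; the function name is hypothetical.

```cpp
#include <cassert>
#include <random>

// pLicense: GTA-wide share of license holders for the person's broad age
// category and sex, looked up by the caller (categories not defined here).
bool imputeLicense(int age, bool madeAutoTrip, double pLicense, std::mt19937& rng) {
    assert(pLicense >= 0.0 && pLicense <= 1.0);
    if (age < 16 || age >= 80) return false;  // assumed unlicensed
    if (madeAutoTrip) return true;            // an auto trip implies a license
    std::bernoulli_distribution hasLicense(pLicense);
    return hasLicense(rng);
}
```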
Employment Status
Step 1: Decide whether the person is employed
- If occupation = O, then not employed
- If occupation = 9 (unknown), then:
o If the person made a work trip, then employed
o Else if age < 16 or age >= 70, or school_stat = full time, then not employed
o Else choose employment status randomly from the zonal distribution
- If the person has an occupation type, then employed
Step 2: Choose employment status for employed people
- Choose employment status randomly based on the place-of-residence zonal
distribution (full-time, part-time, work-at-home full-time, work-at-home part-time)
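A sketch of the two-step employment rule, assuming the residence zone's employed share and its distribution over the four employment types have already been tabulated from records with known values, and reading the random choice in Step 1 as a draw of employed versus not employed. The occupation codes 'O' and '9' come from the text; the enum, struct and function names are hypothetical.

```cpp
#include <cassert>
#include <random>
#include <vector>

enum class EmpStatus { NotEmployed, FullTime, PartTime, HomeFullTime, HomePartTime };

// Hypothetical residence-zone distributions built from records with known values.
struct ZoneEmpDist {
    double shareEmployed;               // share of persons in the zone who are employed
    std::vector<double> statusWeights;  // FullTime, PartTime, HomeFullTime, HomePartTime
};

EmpStatus imputeEmpStatus(char occupation, bool madeWorkTrip, int age,
                          bool fullTimeStudent, const ZoneEmpDist& zone,
                          std::mt19937& rng) {
    assert(zone.statusWeights.size() == 4);
    // Step 1: decide whether the person is employed.
    bool employed;
    if (occupation == 'O') {
        employed = false;
    } else if (occupation == '9') {  // occupation unknown
        if (madeWorkTrip) {
            employed = true;
        } else if (age < 16 || age >= 70 || fullTimeStudent) {
            employed = false;
        } else {
            employed = std::bernoulli_distribution(zone.shareEmployed)(rng);
        }
    } else {
        employed = true;  // any other code is an occupation type
    }
    if (!employed) return EmpStatus::NotEmployed;
    // Step 2: draw the type of employment from the zonal distribution.
    std::discrete_distribution<int> pick(zone.statusWeights.begin(),
                                         zone.statusWeights.end());
    return static_cast<EmpStatus>(1 + pick(rng));
}
```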
Occupation
- Choose occupation randomly from the usual-place-of-work zonal occupation
distribution
Student Status
- If age < 5, then not a student
- If age is between 5 and 18, then full-time student
- If age > 18 or age is unknown, then not a student
3. Population Synthesis Program Input File Formats
The population synthesis program requires the following types of input data:
1) TTS Person Data
2) TTS Household Data
3) Future year total population and employment by 1996 TTS traffic zone
4) Distribution of 1996 TTS daycare trips
The required input data and file formats are as follows:
1) TTS Person Data (after undergoing the data cleaning process described in Section 2)
Filename: cln_pers.txt
Format: comma delimited
Field - data type:
- Household number – 6 digit integer
- Person number – single digit integer
- Age – integer 0-99
- Sex – single digit character
- Drivers License – single digit character
- Transit Pass – single digit character
- Employment Status - single digit character
- Occupation - single digit character
- No work - single digit character
- Student Status - single digit character
- Planning District of Employment – integer 1-113
- Employment Zone – integer 0-8888
- Free Park - single digit character
- Planning District of School – integer 1-113
- School Zone – integer 0-8888
- Number of Person Trips – integer, 0-99
- Number of Transit Trips – integer, 0-99
Exact data definitions can be found in the Transportation Tomorrow Survey 1996
Version 2.1 Data Guide (see Appendix A).
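A minimal sketch of reading one cln_pers.txt record in C++. It assumes one record per line, no header row and no quoted fields, and keeps only some of the 17 fields (by their position in the list above); the struct and function names are hypothetical.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Split one comma-delimited line into fields.
std::vector<std::string> splitFields(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

// A subset of the 17 cln_pers.txt fields.
struct PersonRow {
    int hhldNum = 0, persNum = 0, age = 0;
    char sex = '?', license = '?', empStatus = '?', occupation = '?', studentStatus = '?';
    int empZone = 0, schoolZone = 0, nTrips = 0, nTransitTrips = 0;
};

PersonRow parsePersonRow(const std::string& line) {
    const std::vector<std::string> f = splitFields(line);
    if (f.size() != 17) throw std::runtime_error("expected 17 fields");
    PersonRow p;
    p.hhldNum = std::stoi(f[0]);
    p.persNum = std::stoi(f[1]);
    p.age = std::stoi(f[2]);
    p.sex = f[3].at(0);
    p.license = f[4].at(0);
    p.empStatus = f[6].at(0);
    p.occupation = f[7].at(0);
    p.studentStatus = f[9].at(0);
    p.empZone = std::stoi(f[11]);     // Employment Zone
    p.schoolZone = std::stoi(f[14]);  // School Zone
    p.nTrips = std::stoi(f[15]);
    p.nTransitTrips = std::stoi(f[16]);
    return p;
}
```

A made-up record such as "100123,2,34,F,Y,N,F,G,N,O,12,345,Y,0,0,4,2" parses to household 100123, person 2, age 34, employment zone 345.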
2) TTS Household Data
Filename: fin_hhld.txt
Format: comma delimited
Field – data type:
- Household number – 6 digit integer
- Household Planning District – integer 1-113
- Census Tract Name – floating point 7.2
- Household zone – integer 1-2670
- 1996 expansion factor – floating point 2.2
- Dwelling Type – single digit character
- Number of persons – integer 1-9
- Number of vehicles – integer 0-99
3) Future year total population and employment by 1996 TTS traffic zone
Filename: fin_zone.txt
Format: comma delimited
Field – data type:
- 1996 TTS Zone number – integer 1-8888
i. 1-2670: Internal GTA zones
ii. 4000 – 4409: External zones
iii. 8888: No usual place of work
- Planning District – integer 1-48
i. 1-46: TTS planning district
ii. 47: External Zones
iii. 48: No usual place of work
- Future year total population – integer or floating point
i. Note: future population should be 0 for zones 4000 – 4409, and 8888,
since we are not attempting to synthesize any households whose
residence is outside of the GTA.
- Future year total employment – integer or floating point
i. Note: future employment should be estimated using a growth factor
for zones 4000 – 4409 and 8888. Currently a growth factor of 50% is
assumed based on the average GTA growth rate [**********Check
50%********]
4) Distribution of 1996 TTS daycare trips
Filename: fin_dayc.txt
Format: comma delimited
Field – data type:
- residence planning district - integer 1-46
- daycare destination zone – integer 1-4409
- number of expanded trips from residence planning district to daycare
destination zone – float 7.2
4. Population Synthesis Program Output File Formats
The population synthesis program generates the following output files:
1) Updated Person Data
2) Updated Household Data
3) Updated Zone Data
4) Planning District Totals
5) GTA Totals
The output is in the following format:
1) Updated Person Data
Filename: fout_per.txt
Format: comma delimited
Field – data type:
- 1st 17 fields – same as TTS Person input file (see Section 3)
- Future employment zone – integer, 0-8888
- Future school zone – integer, 0-4409
- Day care status – single digit character (‘D’ = Out-of-home child care)
- Day care zone – integer 0-4409
2) Updated Household Data
Filename: fout_hhl.txt
Format: comma delimited
Field – data type:
- 1st 8 fields – same as TTS Household Data input file (see Section 3)
- Future expansion factor – floating point 2.2
3) Updated Zone Data
Filename: fout_zon.txt
Format: comma delimited
Field – data type:
- Zone number – integer, 1-8888
- Planning District ID – integer, 0-47
- Number of Person Records – integer, 1-6 digits
- 1996 Population Total – integer, 1-6 digits
- 1996 Workers, by occupation:
  gen office/clerical – integer, 1-6 digits
  manufacturing/const – integer, 1-6 digits
  professional/mgmt – integer, 1-6 digits
  retail/sales/service – integer, 1-6 digits
  unknown occupation – integer, 1-6 digits
- 1996 Employment Total – integer, 1-6 digits
  gen office/clerical – integer, 1-6 digits
  manufacturing/const – integer, 1-6 digits
  professional/mgmt – integer, 1-6 digits
  retail/sales/service – integer, 1-6 digits
  unknown occupation – integer, 1-6 digits
- Future Population Total – integer, 1-6 digits
  gen office/clerical – integer, 1-6 digits
  manufacturing/const – integer, 1-6 digits
  professional/mgmt – integer, 1-6 digits
  retail/sales/service – integer, 1-6 digits
  unknown occupation – integer, 1-6 digits
- Future Raw Employment Total (figures provided by the City of Toronto) – integer, 1-6 digits
- Future Adjusted Employment Total (adjusted target total consistent with worker
  totals) – integer, 1-6 digits
  gen office/clerical – integer, 1-6 digits
  manufacturing/const – integer, 1-6 digits
  professional/mgmt – integer, 1-6 digits
  retail/sales/service – integer, 1-6 digits
  unknown occupation – integer, 1-6 digits
- Future Simulated Employment Total (simulated totals, which differ slightly from the
  "adjusted" employment because each POW is drawn randomly with replacement) – integer,
  1-6 digits
  gen office/clerical – integer, 1-6 digits
  manufacturing/const – integer, 1-6 digits
  professional/mgmt – integer, 1-6 digits
  retail/sales/service – integer, 1-6 digits
  unknown occupation – integer, 1-6 digits
4) Planning District Totals
Filename: fout_pd.txt
Format: comma delimited
Field – data type:
- Planning District ID – integer, 0-47
- Planning District Number – integer, 1-48
- Fields 3 – 33 – Same as Updated Zone Data output file fields 4 – 34
5) GTA Totals
Filename: fout_gta.txt
Format: see example
Field – data type: explained in output file
Example of GTA total output file:
GTA Statistics
No. of household records (original)  :88898
No. of household records (with seeds):120882
No. of person records (original)     :243286
No. of person records (with seeds)   :337925
1996 gta pop                         :4.92637e+006
1996 gta workers                     :2.42169e+006
1996 gta gen office clerical work    :332115
1996 gta manufacturing const work    :528520
1996 gta professional/mgmt tech work :984696
1996 gta retail sales service work   :566452
1996 gta unknown work                :9908.66
1996 gta emp                         :2.42169e+006
1996 gta gen office clerical emp     :332115
1996 gta manufacturing const emp     :528520
1996 gta professional/mgmt tech emp  :984696
1996 gta retail sales service emp    :566452
1996 gta unknown emp                 :9908.66
Future gta pop                       :7.22148e+006
Future gta work                      :3.57777e+006
Future gta gen office clerical work  :485669
Future gta manufacturing const work  :776371
Future gta professional/mgmt tech wrk:1.47306e+006
Future gta retail sales service wrk  :828216
Future gta unknown work              :14459
Future gta raw employment (no adj)   :4.18166e+006
Employment growth adjustment factor  :0.656875
Future gta employment adjusted       :3.57777e+006
Future gta gen office clerical emp   :485669
Future gta manufacturing const emp   :776371
Future gta professional/mgmt tech emp:1.47306e+006
Future gta retail sales service emp  :828216
Future gta unknown emp               :14459
5. Structure of the Program
The population synthesis program consists of twelve classes, each of which is briefly
described in this section. The full set of details can be found by reviewing the program
code shown in Appendix B.
CGta
- Contains all GTA 1996 and future population, employment and worker totals
- Contains simulation parameters, such as the "next household Id" and "next
person Id"
- This class is the main "control center" of the program; all major functions are
invoked in this class
- There is one instantiation of this class (gta0), which is a global variable
CPlanningDistrict
- Contains planning district 1996 and future population, employment and
worker totals
- Contains three lists of POS objects, for post-secondary, high-school and
daycare locations, which define the POS zone and the number of people that live in
the residence planning district and attend school in the POS zone (for daycare,
the list holds the number of daycare trips to the POS zone)
- Contains an array of household Ids whose residence is in the planning district
- Has a function to return a random household from the household ID array
- Otherwise, it is basically a container of planning district totals
- 48 planning districts are instantiated globally and stored in an array on the
heap. There is a pointer to this array (pointer name: pd)
CZone
- Contains all zonal 1996 and future population, employment and worker totals
- Contains a set of lists of POW objects, one for each occupation type, which
define the POW zone and the number of people in the zone who work in that POW
zone
- Contains a range of functions to add or return population/employment/worker
totals within the zone, to seed itself appropriately with randomly selected
households from the planning district, and to return a randomly selected POW
- 1703 zones are instantiated globally and stored as an array on the heap.
There is a pointer to this array (pointer name: zone)
CHouseholdRecord
- Contains household attributes from the TTS input file
- Contains a list of person Ids in the household
- Contains 1996 and future expansion factors for the household
- Can copy itself to another household (for the seeding procedure) and can
calculate its future expansion factor by accessing zonal totals
- 88898 household records are instantiated globally and stored in an array on the
heap. Space for an additional 35,000 household records is reserved in the array
to allow for seeding of zones. There is a pointer to the array (pointer name:
household_record)
- The array index does not correspond to the TTS household number. See class
CHouseholdIndex for details
CPersonRecord
- Contains person attributes from the TTS person input file
- Contains 1996 and future employment and school zones
- Contains randomly generated day care status and day care location, where
appropriate
- Can copy itself to another person (for the seeding procedure)
- 243286 person records are instantiated globally and stored in an array in the
heap space. Space for an additional 100,000 person records is also reserved in
the array to allow for seeding of zones. There is a pointer to this array
(person_record)
CHouseholdIndex
- This is an index that returns a sequentially ordered Household Id, given the
TTS household number
- It allows households to be stored in an array with the household ID as the
array index, thereby leaving no gaps in the household_record array and
conserving memory
CZoneIndex
- This is an index that returns a sequentially ordered Zone Id, given the GTA
zone number
- It allows for zones to be stored in an array with no gaps as for the households
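A minimal sketch of the index idea behind CHouseholdIndex and CZoneIndex, assuming a plain array sized by the highest number (HIHHLD or HIZONE, as listed in Section 7) in which unused numbers map to -1. The class name SequentialIndex is hypothetical, not the program's own class. For households this costs one int per possible number (268,185 entries, roughly 1 MB with 4-byte ints), in exchange for gap-free record arrays.

```cpp
#include <cassert>
#include <vector>

// Lookup from a sparse TTS number (household or zone) to a sequential ID
// usable as an array index; numbers never added map to -1.
class SequentialIndex {
public:
    explicit SequentialIndex(int highestNumber) : idOf_(highestNumber + 1, -1) {}

    // Assign the next sequential ID to a number, in load order.
    int add(int number) {
        assert(number >= 0 && number < static_cast<int>(idOf_.size()));
        assert(idOf_[number] == -1 && "number loaded twice");
        idOf_[number] = nextId_;
        return nextId_++;
    }

    // Return the sequential ID for a number, or -1 if it was never added.
    int lookup(int number) const {
        if (number < 0 || number >= static_cast<int>(idOf_.size())) return -1;
        return idOf_[number];
    }

    int size() const { return nextId_; }

private:
    std::vector<int> idOf_;
    int nextId_ = 0;
};
```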
CPow
- A pow is a simple place-of-work object that contains two data variables: the
place-of-work zone and the number of people from place-of-residence zone X
who work there (for a particular occupation type).
- Pow objects are stored in a list in the place-of-residence zone (i.e. the CZone
class). A list is an efficient storage container for this purpose, because zones
that are not a place of work for a particular place of residence need not be
included in the list for that POR zone.
- Pow objects are used in evaluating a randomly chosen place of work for each
person.
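The sketch below shows how a list of pow objects can be turned into a random place-of-work draw with probability proportional to the worker counts, as in Methodology step 8. The Pow struct mirrors the two data variables described above, but its field names and the drawPow function are hypothetical, and std::discrete_distribution stands in for whatever sampling routine the program actually uses.

```cpp
#include <cassert>
#include <list>
#include <random>
#include <vector>

// Place-of-work entry: destination zone and the number of workers (of one
// occupation type) from the owning residence zone who work there.
struct Pow {
    int powZone;
    double workers;
};

// Draw a place-of-work zone with probability proportional to the worker
// counts. Each call is an independent draw, i.e. sampling with replacement.
int drawPow(const std::list<Pow>& pows, std::mt19937& rng) {
    assert(!pows.empty());
    std::vector<double> weights;
    std::vector<int> zones;
    for (const Pow& p : pows) {
        weights.push_back(p.workers);
        zones.push_back(p.powZone);
    }
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    return zones[pick(rng)];
}
```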
CPos
- A pos is a simple place-of-school object that contains two data variables: the
place-of-school zone and the number of people from planning district X who
attend school there (for a particular school level, i.e. high school versus
post-secondary school).
- Pos objects are stored in a list in the place-of-residence planning district (i.e.
the CPlanningDistrict class). A list is an efficient storage container for this
purpose, because zones that are not a POS for a particular POR planning district
need not be included in the list for that POR planning district.
- Pos objects are used in evaluating a randomly chosen future place of school
for high school students and post-secondary students.
CPorpow
- Contains a full 1996 porpow matrix and two vectors that contain factors for
frataring the O-zone and the D-zone
- Contains a planning district to zone porpow matrix and a zone to planning
district matrix, to aid in seeding rows or columns where workers or employment
for a particular occupation type are expected to increase significantly
- Contains an occupation Id (i.e. each Porpow matrix is for a single occupation)
- Will fratar itself, seeding appropriately
- Will load the zonal pow lists based on the fratared matrix
CPorpos
- Contains two place-of-residence place-of-school (porpos) matrices, one for high
school students (16-18 years old) and one for post-secondary students (>18
years old)
- Por is defined by planning district and pos is defined by traffic zone.
- Loads the planning district pos lists based on the data in the porpos matrices
CFileMgr
- Performs some simple file management functions, such as opening output
files
6. Outline of the Methodology
The program performs the following general steps:
1) Data for households, persons and zones are loaded into memory and indexes set up
2) 1996 population and employment are summed with a breakdown by occupation
3) Future expansion factors are calculated based on existing and future total population
by zone
a. Population growth factors for each zone are calculated based on existing and
future total population
b. Zones with population growth factor >2.0 are seeded with randomly chosen
households from the planning district, such that the future expansion factor is
no greater than 20.
c. Future expansion factors are calculated based on the seeded population
databank.
4) Future total zonal employment growth is adjusted such that total GTA employment is
consistent with total GTA workers.
5) The future total zonal employment by occupation is broken down into occupation
groups by frataring the 1996 employment occupation distribution to reflect future
total zonal employment and future total GTA workers by occupation type
6) The 1996 place-of-residence place-of-work (POR POW) matrix for each occupation
type is generated from the TTS data
7) The future POR POW table is generated as follows:
a. Using the 1996 POR POW table as a seed matrix,
b. For those POR “rows” where the number of workers increases tenfold, the
planning district totals are used to seed that particular row.
c. For those POW “columns” where the employment increases tenfold, the
planning district totals are used to seed that particular column.
d. The fratar method is applied to each of the seeded occupational POR POW
tables such that row and column sums reflect future worker and employment
totals by occupation type.
8) The future place of work for each worker is generated as follows:
a. The distribution of workers of Occupation Type X and Place of Residence Y
that are expected to work in each Zone Z is calculated based on the future
occupational POR POW tables.
b. A future place of work is drawn (with replacement) from this probability
distribution and assigned to each worker based on his or her occupation and
place of residence.
9) The future place of school for each student is generated as follows:
a. The 1996 place-of-residence place-of-school (porpos) matrices are generated
for high school age (14-18 years old) and post-secondary school age (>18
years old) students based on 1996 TTS data
b. The distribution of students in age group X and place of residence Y that
attend school in each zone Z is calculated based on the 1996 porpos matrices
c. For elementary students and high school students, the future pos zone is
assumed to equal the 1996 pos zone, unless the student record is generated as
part of the seeding process (in Methodology step 3b).
d. For “seeded” elementary students (<14 years old), the future pos zone is
assumed to equal the por zone, on the assumption that most elementary
schools are located very close to the residence.
e. For “seeded” high school students (14 to 18 years old), a future pos school
zone is randomly drawn from the 1996 planning district distribution of high
school pos zones.
f. For post-secondary age students (>18 years old), a future pos zone is
randomly drawn from the 1996 planning district distribution of postsecondary age pos zones.
10) Day care status and location of children are generated as follows:
a. an input file is loaded that contains the TTS expanded day-care trips by
residence planning district and day-care trips destination zone
b. based on Statistics Canada data from the (******Insert study name*****),
day care status is assigned randomly to all children < 11 years old whose
household contains only adults that are working (part-time or full time) or
attending school
c. the place-of-day-care distribution for each planning district is applied to all
day-care children in that planning district to result in day-care location
11) The program outputs data to the output data files and clears all objects in memory.
7. Detailed Methodology
The program performs the following general steps:
1) Data for households, persons and zones are loaded into memory and indexes set
up
Input files are described in detail in Section 3. They are loaded into memory on the heap.
It is noted that the following simulation parameters are set to match the data in the
input files, as described below:
NUMPERS – 243286 – the number of person records in the person input file
NUMHHLD – 88898 – the number of households in the household input file
NUMZONE – 1703 – the number of zones (includes 1677 GTA zones, 25 external zones
and zone 8888, which represents “no usual place of work”)
NUMPD – 48 – the number of planning districts (pd 47 includes all external zones, and
pd 48 is for zone 8888, “no usual place of work”)
HIHHLD – 268184 – the highest household number in the household input file
HIZONE – 8888 – the highest zone number in the zone input file
MAXPDHHLD – 8653 – the maximum number of household records expected for a
given planning district (for setting hhld list array size)
NUMPERSSEED – 100000 – the expected number of persons required for seeding (this
number may require adjustment if new population and employment forecasts are
obtained)
NUMHHLDSEED – 35000 – the expected number of households required for seeding
(this number may require adjustment if new population and employment forecasts are
obtained)
Additional data are added to each household record, person record, zone and planning
district as the data are loaded. The household ID (to be distinguished from the
household number) is a sequential ID number that corresponds to the array index of the
household records (whereas the TTS household numbers have gaps). Similarly, a person ID
is created for each person record, and a zone ID (to be distinguished from the 1996 zone
number) is created for each zone. The household ID and the zone ID are entered into the
household index and zone index arrays, respectively, so that household and zone Ids
can be easily looked up when the household or zone numbers are known.
Also part of the initialization procedure, a list of households is created for each planning
district and a list of persons is created in each household object. This basically concludes
the process of loading and setting up the initial data in the program.
2) 1996 population and employment are summed with a breakdown by occupation
1996 population and employment are summed by scrolling through the person list,
determining each person's household (to obtain the expansion factor), adding the
expansion factor to the 1996 population total for the zone of residence, and adding the
expansion factor to the 1996 employment total for the zone of employment. The 1996
numbers were checked against TTS totals using the DRS (the data retrieval system from
the Data Management Group) and were found to be consistent for population. For
employment, the DRS indicates that about 3.5% of people employed within the GTA
reside outside of the GTA; therefore, about 3.5% of GTA employment is not
incorporated in the GTA population-based list generated for the scheduling program.
Workers and employment by occupation were also determined and entered into an array
of worker and employment totals as follows:
work1996[i] = 1996 workers of occupation type i.
emp1996[i]= 1996 employment of occupation type i,
where for i, 0 = general office / clerical
1 = manufacturing / construction / trades
2 = professional management / technical
3 = retail sales and service
4 = unknown
GTA total population and employment by occupation type are then calculated as the sum
of zonal population and employment.
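A minimal sketch of the summation in step 2, assuming simplified hypothetical Household and Person structs (with the household ID used as the array index), the occupation index i defined above, and a worker flag. The arrays are named after work1996 and emp1996 but are kept per zone here; GTA totals would be the sums of these zonal values.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <vector>

constexpr int kNumOcc = 5;  // 0 office/clerical, 1 manufacturing, 2 professional,
                            // 3 retail/sales/service, 4 unknown

struct Household { int homeZone; double expFactor96; };
struct Person { int hhldId; bool worker; int occ; int workZone; };

struct ZoneTotals {
    double pop96 = 0.0;
    std::array<double, kNumOcc> work1996{};  // workers living in the zone
    std::array<double, kNumOcc> emp1996{};   // jobs located in the zone
};

// One pass over the person list: each person adds its household's 1996
// expansion factor to population (and workers) in the residence zone and,
// for workers, to employment in the employment zone.
std::map<int, ZoneTotals> sum1996(const std::vector<Household>& households,
                                  const std::vector<Person>& persons) {
    std::map<int, ZoneTotals> zones;
    for (const Person& p : persons) {
        const Household& h = households.at(p.hhldId);  // household ID = index
        ZoneTotals& home = zones[h.homeZone];
        home.pop96 += h.expFactor96;
        if (p.worker) {
            assert(p.occ >= 0 && p.occ < kNumOcc);
            home.work1996[p.occ] += h.expFactor96;
            zones[p.workZone].emp1996[p.occ] += h.expFactor96;
        }
    }
    return zones;
}
```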
3) Future expansion factors are calculated based on existing and future total
population by zone
- Population growth factors for each zone are calculated based on existing
and future total population
The growth factor (future population divided by 1996 population) is calculated if the 1996
population > 0 and the 1996 population > 0.5 * the future population. If there is no 1996
population and no future population, the population growth factor is set to zero. If the
population growth factor is > 2.0, it is temporarily set to 999999, since it must be
recalculated after the zone is seeded, as discussed below.
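The growth-factor rules can be sketched as follows; kSeedFlag stands for the temporary value 999999 and the function name is hypothetical. The condition "1996 population > 0.5 * future population" is equivalent to a growth factor below 2.0, so in this sketch a zone with a factor of exactly 2.0, or with no 1996 population but some future population, falls through to the seeding flag.

```cpp
#include <cassert>

constexpr double kSeedFlag = 999999.0;  // "seed this zone, recompute later"

// Zonal population growth factor following the rules above.
double populationGrowthFactor(double pop96, double popFuture) {
    assert(pop96 >= 0.0 && popFuture >= 0.0);
    if (pop96 > 0.0 && pop96 > 0.5 * popFuture)
        return popFuture / pop96;  // factor below 2.0
    if (pop96 == 0.0 && popFuture == 0.0)
        return 0.0;                // no population now or in the future
    return kSeedFlag;              // factor of 2.0 or more, or growth from zero
}
```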
- Zones with population growth factor > 2.0 are seeded with randomly
chosen households from the planning district, such that the future
expansion factor is no greater than 20.
Households are chosen randomly from the household list (an array member variable of
the pd class). These households are then copied into new households in the zone that is
being seeded. The persons within this household are copied to a new set of identical
persons in the zone being seeded and are given the household number of the seeded
household.
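A sketch of the seeding loop under stated assumptions: households are drawn uniformly and with replacement from the planning district list (the text does not say whether a household can be drawn twice), and seeding stops once the future population divided by the number of person records in the zone is at most 20. Only household sizes are modelled; copying the household and person records is left out, and all names are hypothetical.

```cpp
#include <cassert>
#include <random>
#include <vector>

struct PdHousehold {
    int nPersons;  // persons in the household record (1-9)
};

// Draw households from the planning district list until
// futurePop / personRecords <= maxExpFactor. Returns the indices of the
// households to copy into the seeded zone.
std::vector<int> chooseSeedHouseholds(double futurePop, int existingPersonRecords,
                                      const std::vector<PdHousehold>& pdHouseholds,
                                      std::mt19937& rng, double maxExpFactor = 20.0) {
    assert(!pdHouseholds.empty() && maxExpFactor > 0.0);
    std::uniform_int_distribution<int> pick(0, static_cast<int>(pdHouseholds.size()) - 1);
    std::vector<int> chosen;
    int records = existingPersonRecords;
    while (futurePop > maxExpFactor * records) {
        const int h = pick(rng);
        assert(pdHouseholds[h].nPersons > 0);  // guarantees the loop terminates
        chosen.push_back(h);
        records += pdHouseholds[h].nPersons;  // each person record is duplicated
    }
    return chosen;
}
```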
- Future expansion factors are calculated based on the seeded population
databank.
Future expansion factors are calculated by multiplying the 1996 expansion factor by the
zonal population growth factor. If the zone was seeded, the population growth factor was
set to 999999; in such cases, the total number of person records (including seed records)
in that zone is counted, and the future population is divided by that number.
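The expansion factor rule can be sketched as a single function; the name is hypothetical, and kSeedFlag again stands for the temporary value 999999.

```cpp
#include <cassert>

constexpr double kSeedFlag = 999999.0;  // temporary growth factor of seeded zones

// Future expansion factor of one household record.
// Unseeded zone: 1996 expansion factor times the zonal growth factor.
// Seeded zone: zone future population divided by the zone's person records
// (original plus seed records), so every record gets the same factor.
double futureExpansionFactor(double expFactor96, double growthFactor,
                             double zoneFuturePop, int zonePersonRecords) {
    if (growthFactor == kSeedFlag) {
        assert(zonePersonRecords > 0);
        return zoneFuturePop / zonePersonRecords;
    }
    return expFactor96 * growthFactor;
}
```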
4) Future total zonal employment growth is adjusted such that total GTA
employment is consistent with total GTA workers.
Future population, employment and workers by occupation type are calculated for
each zone and for the entire GTA. It is noted that the total employment for the GTA that
is calculated in this way (i.e. by summing the expansion factors based on the population
growth factor) is significantly lower than the total GTA employment provided by the City
of Toronto. There are at least three reasons for this. First, the estimate obtained by summing
the expansion factors does not include workers who commute into the GTA from outside.
Second, only one job is included for each worker, whereas in reality some workers have
two or more jobs. Third, employment forecasts often tend to be over-optimistic
compared to population forecasts.
Since the purpose of the population synthesis exercise is to generate the “usual” place of
work for each person in the GTA, the total GTA employment estimate that is most
appropriate is the summation of GTA workers. Growth in all future zonal employment
estimates is therefore reduced such that total GTA employment is consistent with the
total number of GTA workers.
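The text does not state exactly how the growth is reduced. One simple possibility, assumed here, is to scale every zone's growth (future minus 1996 employment) by a single common factor k, chosen so that the adjusted zonal totals sum to total GTA workers. `adjustEmployment` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scales each zone's employment growth by one common factor
//   k = (W - sum E1996) / (sum Efuture - sum E1996)
// so that the adjusted zonal employment sums to total GTA workers W.
// Assumes the City forecast exceeds W, so that 0 <= k <= 1.
std::vector<double> adjustEmployment(const std::vector<double>& emp1996,
                                     const std::vector<double>& empFuture,
                                     double totalWorkers) {
    assert(emp1996.size() == empFuture.size());
    double base = 0.0, future = 0.0;
    for (std::size_t z = 0; z < emp1996.size(); ++z) {
        base += emp1996[z];
        future += empFuture[z];
    }
    assert(future > base);  // there must be some growth to reduce
    const double k = (totalWorkers - base) / (future - base);
    std::vector<double> adjusted(emp1996.size());
    for (std::size_t z = 0; z < emp1996.size(); ++z)
        adjusted[z] = emp1996[z] + k * (empFuture[z] - emp1996[z]);
    return adjusted;
}
```

For example, with 1996 employment of 100 and 200 in two zones, forecasts of 200 and 400, and 450 GTA workers, k = 0.5 and the adjusted zonal totals are 150 and 300.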
5) The future total zonal employment is broken down into occupation groups by
frataring the 1996 employment occupation distribution to reflect future total
zonal employment and future total GTA workers by occupation type
To maintain the maximum possible consistency within the database, it is necessary to
break down the zonal employment into occupation types bearing in mind the 1996
distribution of occupation types. The fratar method of biproportional updating is a
straightforward method to achieve this end. A full description of the fratar method can be
found in the EMME/2 manual.
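For readers without the EMME/2 manual at hand, the core of biproportional (Fratar) balancing can be sketched in C++ as below. This is a generic sketch, not the EMME/2 implementation. Rows stand for zones and columns for occupation types. The function name, tolerance and iteration limit are assumptions, and the row and column targets are assumed to have the same grand total. A row that is all zero in the seed can never be scaled up to a positive target, which is the situation the seeding rule in the next paragraph is meant to avoid.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Alternately scales rows to rowTargets and columns to colTargets until
// every row sum is within relative tolerance tol of its target.
// Returns false if that does not happen within maxIter sweeps.
bool fratar(Matrix& m, const std::vector<double>& rowTargets,
            const std::vector<double>& colTargets,
            double tol = 1e-6, int maxIter = 1000) {
    const std::size_t nr = m.size(), nc = colTargets.size();
    assert(rowTargets.size() == nr);
    for (int it = 0; it < maxIter; ++it) {
        for (std::size_t i = 0; i < nr; ++i) {      // row step
            double s = 0.0;
            for (double v : m[i]) s += v;
            if (s > 0.0) for (double& v : m[i]) v *= rowTargets[i] / s;
        }
        for (std::size_t j = 0; j < nc; ++j) {      // column step
            double s = 0.0;
            for (std::size_t i = 0; i < nr; ++i) s += m[i][j];
            if (s > 0.0)
                for (std::size_t i = 0; i < nr; ++i) m[i][j] *= colTargets[j] / s;
        }
        double maxErr = 0.0;                        // row error after column step
        for (std::size_t i = 0; i < nr; ++i) {
            double s = 0.0;
            for (double v : m[i]) s += v;
            const double denom = rowTargets[i] > 0.0 ? rowTargets[i] : 1.0;
            maxErr = std::max(maxErr, std::fabs(s - rowTargets[i]) / denom);
        }
        if (maxErr < tol) return true;
    }
    return false;
}
```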
It is necessary to seed the matrix because in some zones no employment exists in 1996,
but employment is expected to occur there by the future analysis year. In such zones,
there is no initial distribution to modify, so the zone must be seeded. Even if a small
employment base (say 100 jobs) exists in 1996 but employment is expected to increase
significantly (say, to 2000) by the future year, the 1996 distribution probably would not
be sufficiently precise to adequately represent the new employment base. Therefore,
zones are seeded with the planning district distribution if there is no 1996 employment or
there is greater than a tenfold increase in employment to the future year.
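The seeding rule can be expressed as a small C++ function. `seedRow` is hypothetical. It returns the distribution to use as the Fratar seed for one zone, and the tenfold threshold is exposed as a parameter. With the example from the text (100 jobs in 1996 growing to 2000), the growth is twentyfold, so the planning district distribution would be used.

```cpp
#include <cassert>
#include <vector>

// Chooses the seed distribution for one zone's employment by occupation:
// the planning district (PD) distribution if the zone had no 1996 employment
// or its employment grows more than `threshold`-fold, otherwise the zone's
// own 1996 distribution.
std::vector<double> seedRow(const std::vector<double>& zone1996ByOcc,
                            const std::vector<double>& pd1996ByOcc,
                            double futureTotal, double threshold = 10.0) {
    assert(zone1996ByOcc.size() == pd1996ByOcc.size());
    double total1996 = 0.0;
    for (double v : zone1996ByOcc) total1996 += v;
    const bool usePd = total1996 <= 0.0 || futureTotal > threshold * total1996;
    return usePd ? pd1996ByOcc : zone1996ByOcc;
}
```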
6) The 1996 place-of-residence place-of-work (POR POW) matrix for each
occupation type is generated from the TTS data
Because the POR POW matrices are large (1703 x 1703, or about 2.9 million cells), only one
matrix (for one occupation type) is loaded into memory at a time and all probability
calculations are completed for that occupation type before the next occupation type is
considered.
The POR POW matrix for a particular occupation type is generated by iterating through
all workers and adding the expansion factor to the correct (por, pow) cell for all workers
of that occupation type. The por planning district to pow zone and por zone to pow
planning district matrices are also calculated at this stage for use in seeding of the POR
POW matrices in the fratar process. It is noted that POR POW matrices do not consist of
integers because expansion factors are not integers.
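Accumulating the matrix for one occupation type might be sketched as follows. The `Worker` struct and `buildPorPow` are hypothetical, and zones are assumed to have already been mapped to indices 0 to n-1. Stored as `double`, one 1703 x 1703 matrix occupies about 23 MB, which illustrates why only one occupation type is held in memory at a time.

```cpp
#include <cassert>
#include <vector>

// Minimal worker record (hypothetical field names).
struct Worker {
    int porZone;     // place-of-residence zone index, 0..n-1
    int powZone;     // place-of-work zone index, 0..n-1
    int occupation;  // occupation type code
    double factor;   // expansion factor (not necessarily an integer)
};

// Builds the n x n POR POW matrix for a single occupation type by adding
// each matching worker's expansion factor to its (por, pow) cell.
std::vector<std::vector<double>> buildPorPow(const std::vector<Worker>& workers,
                                             int occupation, int nZones) {
    std::vector<std::vector<double>> m(nZones, std::vector<double>(nZones, 0.0));
    for (const Worker& w : workers) {
        if (w.occupation != occupation) continue;
        assert(w.porZone >= 0 && w.porZone < nZones);
        assert(w.powZone >= 0 && w.powZone < nZones);
        m[w.porZone][w.powZone] += w.factor;  // cells hold non-integer totals
    }
    return m;
}
```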
7) The future POR POW table is generated as follows:
- Using the 1996 POR POW table as a seed matrix,
- For those POR “rows” where the number of workers increases more than
tenfold, the planning district totals are used to seed that particular row.
- For those POW “columns” where employment increases more than tenfold,
the planning district totals are used to seed that particular column.
- The fratar method is applied to each of the seeded occupational POR
POW tables such that row and column sums reflect future worker and
employment totals by occupation type.
The major problem encountered in this procedure is that the POR POW matrices are
relatively sparse, since there are so many cells in each matrix and the workers are split
among several occupation types. The fratar process did not converge properly for all
matrices, probably because of this sparseness. Applying the planning district
distribution wherever there is a tenfold increase in either workers or employment solves
the problem for all but a small handful of zones when it is applied only at the outset of
the frataring process. For the small number of zones that still do not converge, the
planning district distribution is applied at the point in the frataring process where a
frataring parameter greater than 10 would be necessary (analogous to a tenfold increase in
workers or employment). By this process, the POR POW matrix converges for all occupation
types.
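The in-process fallback described above could be sketched as a row-balancing pass that checks each balancing factor before applying it. This is an illustration only. `balanceRowsWithFallback`, the `pdRows` argument (each zone's planning district distribution laid out over the same columns) and the handling of empty rows are assumptions. A full Fratar loop would alternate this pass with a column pass that has the symmetric fallback.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One row-balancing pass. A row whose balancing factor (target / row sum)
// would exceed maxFactor, or which is empty but has a positive target, is
// replaced by its planning district distribution rescaled to the target.
// Returns the number of rows reseeded in this pass.
int balanceRowsWithFallback(Matrix& m, const std::vector<double>& rowTargets,
                            const Matrix& pdRows, double maxFactor = 10.0) {
    assert(m.size() == rowTargets.size() && m.size() == pdRows.size());
    int reseeded = 0;
    for (std::size_t i = 0; i < m.size(); ++i) {
        double s = 0.0;
        for (double v : m[i]) s += v;
        const double target = rowTargets[i];
        if (s > 0.0 && target / s <= maxFactor) {
            for (double& v : m[i]) v *= target / s;  // ordinary Fratar row step
            continue;
        }
        if (target <= 0.0) continue;                 // empty row, zero target
        double pdSum = 0.0;
        for (double v : pdRows[i]) pdSum += v;
        assert(pdSum > 0.0 && pdRows[i].size() == m[i].size());
        m[i] = pdRows[i];
        for (double& v : m[i]) v *= target / pdSum;  // PD shape, zone total
        ++reseeded;
    }
    return reseeded;
}
```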
8) The future place of work for each worker is generated as follows:
a. The distribution of workers of Occupation Type X and Place of Residence
Y that are expected to work in each Zone Z is calculated based on the
future occupational POR POW tables.
Only the pow zones that have a non-zero number of workers are stored in each place of
residence zone, in order to save memory. They are stored in a “list” container, which is
an efficient means of storing such data (see the Visual C++ documentation in the MSDN
Library for details on the list container).
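Building the stored distribution for one residence zone could look like this. `powDistribution` is a hypothetical name. It takes one row of the future POR POW table for one occupation type and keeps only the non-zero work zones, as (zone index, probability) pairs in a `std::list`, following the text's choice of container.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// For one residence zone, keeps only the work zones with a non-zero number
// of workers, as (zone index, probability) pairs.
std::list<std::pair<int, double>> powDistribution(const std::vector<double>& porRow) {
    double total = 0.0;
    for (double v : porRow) total += v;
    std::list<std::pair<int, double>> dist;
    if (total <= 0.0) return dist;  // no workers of this type live here
    for (std::size_t z = 0; z < porRow.size(); ++z)
        if (porRow[z] > 0.0)        // skip empty cells to save memory
            dist.emplace_back(static_cast<int>(z), porRow[z] / total);
    return dist;
}
```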
b. A future place of work is drawn (with replacement) from this probability
distribution and assigned to each worker based on his/her occupation and
place of residence.
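Drawing with replacement from such a distribution can be sketched as below. `drawPow` is hypothetical, and the distribution is assumed to be held as (zone, weight) pairs in a `std::list`. Because the weights are never decremented, every worker with the same occupation and residence zone samples from the same distribution.

```cpp
#include <cassert>
#include <list>
#include <random>
#include <utility>

// Draws one place-of-work zone with probability proportional to its weight.
// The list is not modified, so repeated calls sample with replacement.
int drawPow(const std::list<std::pair<int, double>>& dist, std::mt19937& rng) {
    assert(!dist.empty());
    double total = 0.0;
    for (const auto& zw : dist) total += zw.second;
    std::uniform_real_distribution<double> u(0.0, total);
    double r = u(rng);
    for (const auto& zw : dist) {
        if (r < zw.second) return zw.first;
        r -= zw.second;
    }
    return dist.back().first;  // guard against floating-point round-off
}
```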
In order to attain consistency with the (adjusted) employment totals provided by the
City of Toronto, it would be necessary to sample without replacement, or in other words,
to reduce the probability associated with each pow each time it is actually chosen by the
random process. There are two problems in implementing the “sampling without
replacement” process.
First, the POR POW matrices are not integer matrices and each time a pow is chosen the
person’s expansion factor must be subtracted from this cell. However, this may result in
negative numbers in the revised probability distribution, since the non-integer expansion
factors may be larger than the remaining value in each POR POW cell.
Second, as part of the seeding process the planning district totals were applied for a
number of por and pow zones. After the fratar process these rows and columns have a
wider distribution of zones, and therefore a relatively small number of workers in each
por/pow cell. This amplifies the first problem associated with sampling without
replacement as described above.
It is therefore considered reasonable to use sampling with replacement and to accept the
resulting, fairly small, inconsistencies, given that the employment forecasts are quite
approximate in the first place.
9) The future place of school for each student is generated as follows:
a. The 1996 place-of-residence place-of-school (porpos) matrices are
generated for high school age (14-18 years old) and post-secondary school
age (>18 years old) students based on 1996 TTS data
The 1996 porpos matrices are generated with planning district as the por and TTS traffic
zone as the pos. Planning districts are used as the por because high school and
post-secondary school travel is fairly regional in nature, since such institutions are
generally widely spaced. In addition, for zones that require seeding (in Methodology step
3b), it is unlikely that a sufficient number of school records would exist to provide a
valid distribution.
b. The distribution of students in age group X and place of residence Y that
attend school in each zone Z is calculated based on the 1996 porpos
matrices
Only the pos zones that have a non-zero number of students are stored in each place of
residence planning district, in order to save memory. They are stored in a “list”
container, which is an efficient means of storing such data (see the Visual C++
documentation in the MSDN Library for details on the list container).
c. For elementary students and high school students, the future pos zone is
assumed to equal the 1996 pos zone, unless the student record is
generated as part of the seeding process (in Methodology step 3b)
This assumption implies that the school travel patterns are stable over time for mature
zones.
d. For “seeded” elementary students (<14 years old), the future pos zone is
assumed to equal the por zone
This assumption implies that most elementary schools are located very close to the
residence, and that for areas high in population growth, new elementary schools will also
be built to support the communities.
e. For “seeded” high school students (14 to 18 years old), a future pos zone is
randomly drawn from the 1996 planning district distribution of high school
pos zones.
This assumption implies that high school travel is more regional in nature. Since the
existing distribution of high school destinations is used for areas high in population
growth it is, in effect, assumed that new high schools are not being built for these areas.
This simplifying assumption is made since we do not know future high school locations.
f. For post-secondary age students (>18 years old), a future pos zone is
randomly drawn from the 1996 planning district distribution of post-secondary
age pos zones.
This assumption implies that no new post-secondary institutions will be constructed in
new locations in the GTA. The increase in post-secondary students is assumed to be
accommodated in existing (although perhaps expanded) institutions.
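Rules c to f can be collected into one decision function. The C++ sketch below is illustrative. The `Student` struct, `futurePos` and `drawZone` are hypothetical, and the planning district distributions are assumed to be supplied as (zone, weight) pairs for the student's residence planning district.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical student record; zones are plain integers.
struct Student {
    int age;
    bool seeded;   // true if created by the seeding step (3b)
    int porZone;   // place-of-residence zone
    int pos1996;   // 1996 place-of-school zone (used only if not seeded)
};

// (zone, weight) pairs for one planning district and one age group.
using ZoneWeights = std::vector<std::pair<int, double>>;

// Draws a zone with probability proportional to its weight.
int drawZone(const ZoneWeights& d, std::mt19937& rng) {
    assert(!d.empty());
    std::vector<double> w;
    for (const auto& zw : d) w.push_back(zw.second);
    std::discrete_distribution<int> pick(w.begin(), w.end());
    return d[static_cast<std::size_t>(pick(rng))].first;
}

// Applies rules c to f to one student.
int futurePos(const Student& s, const ZoneWeights& pdHighSchool,
              const ZoneWeights& pdPostSecondary, std::mt19937& rng) {
    if (s.age > 18) return drawZone(pdPostSecondary, rng);  // f: post-secondary age
    if (!s.seeded) return s.pos1996;                        // c: keep 1996 school zone
    if (s.age < 14) return s.porZone;                       // d: school in home zone
    return drawZone(pdHighSchool, rng);                     // e: seeded, 14 to 18
}
```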
10) Day care status and location of children are generated as follows:
a. an input file is loaded that contains the TTS expanded day-care trips by
residence planning district and day-care trip destination zone
b. based on Statistics Canada data from the (******Insert study
name*****), day care status is assigned randomly to all children < 11
years old whose household contains only adults that are working (part-time
or full-time) or attending school
Day care status is applied using this methodology to both the 1996 and future year data.
The 1996 TTS data are considered sufficient to obtain a distribution of day-care
locations, but day-care trips are thought to be significantly underreported in the TTS,
based on a comparison to Statistics Canada survey data.
c. the place-of-day-care distribution for each planning district is applied to
all day-care children in that planning district to determine their day-care
location
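Steps b and c might be sketched as follows. The `Member` struct and the function names are hypothetical. `pDayCare` stands in for the Statistics Canada rate, which the draft does not give. An adult is assumed to be anyone aged 18 or over, since the text does not define the term.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical household member record.
struct Member {
    int age;
    bool working;  // full-time or part-time
    bool student;
};

// True if the household has at least one adult (assumed: age >= 18) and
// every adult works or attends school.
bool allAdultsBusy(const std::vector<Member>& hh) {
    bool anyAdult = false;
    for (const Member& m : hh) {
        if (m.age < 18) continue;
        anyAdult = true;
        if (!m.working && !m.student) return false;
    }
    return anyAdult;
}

// Step b: each child under 11 in an eligible household is placed in day care
// with probability pDayCare (placeholder for the Statistics Canada rate).
std::vector<bool> assignDayCare(const std::vector<Member>& hh, double pDayCare,
                                std::mt19937& rng) {
    std::bernoulli_distribution inCare(pDayCare);
    const bool eligible = allAdultsBusy(hh);
    std::vector<bool> status(hh.size(), false);
    for (std::size_t i = 0; i < hh.size(); ++i)
        if (eligible && hh[i].age < 11) status[i] = inCare(rng);
    return status;
}

// Step c: draws a day-care zone from the residence planning district's
// (destination zone, expanded trips) list.
int drawDayCareZone(const std::vector<std::pair<int, double>>& pdTrips,
                    std::mt19937& rng) {
    assert(!pdTrips.empty());
    std::vector<double> w;
    for (const auto& zt : pdTrips) w.push_back(zt.second);
    std::discrete_distribution<int> pick(w.begin(), w.end());
    return pdTrips[static_cast<std::size_t>(pick(rng))].first;
}
```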
11) The program outputs data to the output data files and clears all objects in
memory.
Output file formats are given in Section 4.
APPENDIX A
TTS DATA GUIDE 2.0 – DATA DEFINITIONS
APPENDIX B
POPULATION SYNTHESIS CODE