
Latent Segmentation Based Count Models: Analysis of Bicycle Safety in Montreal and
Toronto
Shamsunnahar Yasmin
Doctoral Student
Department of Civil Engineering & Applied Mechanics
McGill University
Suite 483, 817 Sherbrooke St. W.
Montréal, Québec, H3A 2K6
Canada
Ph: 514 398 6823, Fax: 514 398 7361
Email: shamsunnahar.yasmin@mail.mcgill.ca
Naveen Eluru*
Assistant Professor
Department of Civil Engineering & Applied Mechanics
McGill University
Suite 483, 817 Sherbrooke St. W.
Montréal, Québec, H3A 2K6
Canada
Ph: 514 398 6823, Fax: 514 398 7361
Email: naveen.eluru@mcgill.ca
Abstract
The study contributes to the literature on bicycle safety by exploring the influence of various built
environment measures at the Traffic Analysis Zone (TAZ) level on bicycle collisions. In
conventional count models, the impact of exogenous factors is restricted to be the same across the
entire region. However, the influence of exogenous factors might vary across
different TAZs. To accommodate this potential variation in the impact of exogenous factors,
we formulate latent segmentation based count models. Specifically, we formulate and estimate a
whole suite of latent segmentation based count models from single-state (Poisson, Negative
Binomial) and dual-state systems (Zero Inflated and Hurdle). The study investigates to what
extent it is possible to combine the features of latent segmentation and dual-state models in the
context of crash frequency analysis. The formulated models are estimated using bicycle-motor
vehicle crash data from the Island of Montreal and City of Toronto for the years 2006 through
2010. The TAZ level variables considered in our analysis include accessibility measures,
exposure measures, socio-demographic characteristics, socio-economic characteristics, road
network & traffic characteristics and built environment. This macro-level research would assist
decision makers, transportation officials and community planners to make informed decisions to
proactively improve bicycle safety – a prerequisite to promoting a culture of active
transportation.
INTRODUCTION
Background
Active forms of transportation such as walking and bicycling have the lowest carbon footprint on
the environment and improve the physical health of pedestrians and bicyclists. With growing
concern over worsening global climate change and increasing obesity among adults in developed
countries, it is hardly surprising that transportation decision makers are proactively encouraging
the adoption of active forms of transportation for short-distance trips. For instance, bicycling, as a
transport mode, is experiencing increased patronage and support in most Canadian cities, where
personal vehicles are an indispensable household commodity. In fact, between 1996 and 2006, the
number of daily bike commuters in Canada increased by 42% (Pucher et al., 2011), and the
Canadian bike share of work trips is almost three times higher than the corresponding share in
American metropolitan areas (Pucher et al., 2006).
However, transportation safety concerns related to active transportation users form one of
the biggest impediments to their adoption as a preferred alternative to private vehicle use for
shorter trips. Earlier research reveals that the likelihood of being involved in a collision increases
as the number of cyclists on the road increases (Wei and Lovegrove, 2013), and that the risk of being
injured in a collision while cycling could be about seven times higher than for a motorist (Reynolds
et al., 2009). Thus, traffic crashes and the consequent injuries and fatalities remain a deterrent to
cycling, leading to low bicycle mode share, specifically in North American communities (Wei
and Lovegrove, 2013). Any effort to reduce the social burden of these crashes and encourage
people to use bicycling for their daily short trips would necessitate the implementation of policies
that enhance safety for bicyclists. An important tool to identify the critical factors affecting the
occurrence of bicycle crashes is the application of planning-level crash prediction models. The
current research effort contributes to the literature on promoting active forms of transportation, bicycling in particular, by identifying the important determinants of bicycle crash risk and
exploring the influence of various built environment measures on bicycle collisions at the Traffic
Analysis Zone (TAZ) level for the cities of Montreal and Toronto. This macro-level research
would assist decision makers, transportation officials and community planners to make informed
decisions to proactively improve bicycle safety – a prerequisite to promoting a culture of active
transportation.
Statistical Methods
Traffic crashes aggregated at a certain spatial scale are non-negative integer valued random
events occurring for any given time interval. Naturally, these integer counts are examined
employing count regression approaches. In statistics and transportation safety literature, several
count modeling techniques have been proposed and applied. We discuss the most commonly
employed approaches in transportation safety literature while highlighting the inherent
assumptions that affect the model development process.
The most often employed count regression model is the traditional Poisson regression
model, as it can accommodate the integer properties of count data directly (Hausman et al.,
1984). The Poisson model is easy to estimate and interpret and has thus become the workhorse for
count model applications. However, Poisson count regression carries the implicit
assumption that the mean and variance of the distribution under investigation are equal. In
crash frequency data, the variance of the crash count variable usually exceeds its mean
(Hauer et al., 2001). In applied econometrics, this situation is described as
over-dispersion (McCullagh and Nelder, 1989), and it may arise from unobserved heterogeneity
and/or temporal dependency (Chin and Quddus, 2003). Estimating a Poisson count model in the
presence of such over-dispersion may result in incorrect and biased parameter estimates (Agresti,
1996; Cameron and Trivedi, 1998). Several approaches have been suggested to accommodate
over-dispersion, such as the Poisson-Lognormal, Poisson-Weibull,
Negative Binomial and Generalized Waring models (Aguero-Valverde and Jovanis, 2008; Lord
and Miranda-Moreno, 2008; Miaou et al., 2003; Maher and Mountain, 2009; Cheng et al., 2012;
Irwin, 1968; Peng et al., 2014; Moeinaddini et al., 2014). Of these approaches, the negative
binomial (NB) model, which has a built-in dispersion parameter, is widely employed in the safety
literature. The NB model provides a natural enhancement over the Poisson model and is easy to
estimate, with a closed-form structure that accommodates unobserved heterogeneity.
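The over-dispersion argument can be checked numerically. The following Python sketch is a hypothetical illustration, not the authors' code: it simulates counts from a plain Poisson and from a Poisson whose rate is drawn from a Gamma distribution with mean 𝜇, which is the Gamma-Poisson construction of the NB model discussed later. With mean 𝜇 and dispersion 𝛼, the NB variance is 𝜇 + 𝛼𝜇², so the variance-to-mean ratio should be close to 1 + 𝛼𝜇 rather than 1. The values 𝜇 = 3 and 𝛼 = 0.5, and the function names, are assumptions chosen for illustration.

```python
import math
import random
import statistics


def draw_poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for the small means typical of zonal counts."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def dispersion_ratio(mu: float, alpha: float, n: int = 100_000, seed: int = 42) -> float:
    """Variance-to-mean ratio of n simulated counts with mean mu (hypothetical helper).

    alpha == 0 gives pure Poisson draws. alpha > 0 draws each rate from a Gamma
    distribution with shape 1/alpha and scale alpha*mu (mean mu, variance alpha*mu**2),
    which yields negative binomial counts with variance mu + alpha*mu**2.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        rate = mu if alpha == 0 else rng.gammavariate(1.0 / alpha, alpha * mu)
        draws.append(draw_poisson(rng, rate))
    return statistics.pvariance(draws) / statistics.fmean(draws)


if __name__ == "__main__":
    print("Poisson ratio:", round(dispersion_ratio(3.0, 0.0), 2))  # expected near 1
    print("NB ratio:     ", round(dispersion_ratio(3.0, 0.5), 2))  # expected near 1 + 0.5*3 = 2.5
```

A ratio well above one in observed crash counts is the symptom that motivates moving from the Poisson to the NB model.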
Another methodological challenge often faced in analyzing count variables is the presence
of a large number of zeroes. The classical count models (such as Poisson and NB) allocate a
probability to observing zero counts, but this is often insufficient to account for the preponderance of
zeroes in a count data distribution. The condition of excess zeroes, also referred to as zero-inflation,
arises when there are more zeroes than would be expected under the Poisson and NB
distributions. In crash count models, the presence of excess zeroes may result from two
underlying processes or states of crash frequency likelihoods: a crash-free state (or zero crash state)
and a crash state (see Shankar et al., 1997 for more explanation). The zero crash state can be a
mixture of true zeroes (where the zones are inherently safe (Shankar et al., 1997)) and sampling
zeroes (where excess zeroes result from potential underreporting of crash data (Miaou, 1994)).
In the presence of such a dual-state process, applying a single-state model (Poisson or NB) may result in
biased and inconsistent parameter estimates (Jang, 2005).
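A quick descriptive check for excess zeroes compares the observed share of zero counts with the share implied by a Poisson distribution with the same mean, exp(−ȳ). The Python sketch below uses a hypothetical vector of zonal counts and a hypothetical helper name. It is only a diagnostic: an over-dispersed NB process also produces more zeroes than a Poisson with the same mean, so separating over-dispersion from a genuine dual-state process still requires comparing fitted models.

```python
import math
from typing import Sequence, Tuple


def zero_share_check(counts: Sequence[int]) -> Tuple[float, float]:
    """Return (observed share of zeros, share expected under a Poisson with the sample mean)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for y in counts if y == 0) / n
    expected = math.exp(-mean)  # Poisson P(Y = 0) = exp(-mu)
    return observed, expected


if __name__ == "__main__":
    # Hypothetical zonal crash counts: 60 crash-free zones and 40 zones with crashes.
    counts = [0] * 60 + [1] * 10 + [2] * 10 + [3] * 10 + [5] * 10
    observed, expected = zero_share_check(counts)
    print(f"observed zero share {observed:.2f} vs Poisson-implied {expected:.2f}")
```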
In the econometric literature, two relaxations of the single-state count models have been
proposed to address the issue of excess zeroes. The first approach, the Zero Inflated (ZI)
model, is typically used to accommodate the effect of both true and sampling zeroes, and has
been employed in several transportation safety studies (Shankar et al., 1997; Chin and Quddus,
2003). The second approach, the Hurdle model, is typically used in the presence of sampling
zeroes and has seldom been used in the transportation safety literature. The two approaches differ in
how they address the excess zeroes, and the appropriate framework for analysis
might depend on the actual empirical dataset under consideration.
The aforementioned count models (single or dual-state) typically restrict the impact of
exogenous variables to be the same across the entire population of crash events – homogeneity
assumption. However, the impact of control variables on bicycle crash frequency might vary across
TAZs based on their different attributes. To account for such systematic heterogeneity, researchers have
employed a clustering technique (Karlaftis et al., 1998). In this approach, target groups are
divided into different clusters based on a multivariate set of factors, and separate models are
estimated for each cluster. However, the approach requires allocating each data record exclusively to
a particular cluster, and does not consider the possible effects of unobserved factors that may
moderate the impact of observed exogenous variables. Additionally, this approach might result in
very few records in some clusters, causing a loss of estimation efficiency. An alternative
approach to accommodate population heterogeneity is to employ random parameter count
models (Ukkusuri et al., 2011). However, this approach focuses on incorporating
unobserved heterogeneity through the error term, which necessitates an extensive amount of
simulation for model estimation.
A possible workaround to accommodate population heterogeneity is the application
of a latent segmentation based approach (sometimes also referred to as a finite mixture model)
(Eluru et al., 2012; Yasmin et al., 2014). In this approach, TAZs are allocated probabilistically to
different segments and a segment-specific model is estimated for each segment. Such an
endogenous segmentation scheme is appealing for several reasons. First, each segment is allowed
to be identified with a multivariate set of exogenous variables, while the total
number of segments is limited to a number much lower than what would be implied by a full
combinatorial scheme of the multivariate set of exogenous variables. Second, the probabilistic
assignment of TAZs to segments explicitly acknowledges the role played by unobserved factors in
moderating the impact of observed exogenous variables. Finally, this approach is semi-parametric,
and hence there is no need to specify a distributional assumption for the coefficients, as is
required in random parameter models (Greene and Hensher, 2003).
To that extent, the current research effort contributes to the safety literature
methodologically and empirically by building latent segmentation based count regression
models. Specifically, we formulate and estimate a whole suite of latent segmentation based count
models from single-state (Poisson, NB) and dual-state systems (ZI and Hurdle). Moreover,
drawing on the latent segmentation based model, we investigate to what extent it is possible to
combine the features of latent segmentation and dual-state models in the context of crash
frequency analysis. The entire set of models considered in our analysis includes: Poisson,
Negative Binomial (NB), Hurdle Poisson (HP), Hurdle Negative Binomial (HNB), Zero-inflated
Poisson (ZIP), Zero-inflated Negative Binomial (ZINB), Latent Segmentation based Poisson
(LP), Latent Segmentation based Negative Binomial (LNB), Latent Segmentation based Hurdle
Poisson (LHP), Latent Segmentation based Hurdle Negative Binomial (LHNB), Latent
Segmentation based Zero-inflated Poisson (LZIP) and Latent Segmentation based Zero-inflated
Negative Binomial (LZINB) models. The proposed model frameworks are empirically estimated
using bicycle crash count data for the cities of Montreal and Toronto at the Traffic Analysis Zone
(TAZ) level, employing a comprehensive set of exogenous variables. To the best of our
knowledge, this is the first time some of these frameworks (HP, HNB, LHP, LHNB, LZIP, LZINB)
have been employed in the safety literature, and the first time such an extensive set of count
models has been estimated on the same datasets.
The rest of the paper is organized as follows. Section 2 provides a discussion of earlier
research on bicycle crash frequency modeling while positioning the current study. Section 3
provides details of the various econometric model frameworks used in the analysis. In Section 4,
the data are described. The model estimation results are presented in Section 5. Section
6 concludes the paper.
EARLIER RESEARCH AND CURRENT STUDY IN CONTEXT
A number of research efforts have examined bicycle crash frequency at the micro- and the macro-level to gain a comprehensive understanding of the factors that affect bicycle crash risk.
Considerable research has been done to develop micro-level bicycle crash frequency models
(Brude and Larsson, 1993; Turner and Francis, 2003; Loo and Sui, 2010; Carter and Council,
2007). However, since the current study operates at the planning (macro) level, our review of
earlier research focuses on studies examining crashes at that level. A
summary of earlier studies from the perspective of the various spatial levels considered in macro-level studies is presented in Table 1. The information provided in the table includes
the methodological approach employed, the spatial aggregation level considered and the variable categories
considered in the analysis from six variable categories: accessibility measures, exposure
measures, socio-demographic characteristics, socio-economic characteristics, road network and
traffic characteristics and built environment. The following observations may be made from the
table. First, the most prevalent spatial unit in macro-level analysis is the Traffic
Analysis Zone (TAZ). Second, the negative binomial model is the most frequently used statistical
technique¹ for examining crashes at the aggregate level. Third, very few studies (5 out of 33)
explored bicycle crash frequency at the planning level. Fourth, none of the studies has either
explored the dual-state nature of crash count events or employed a latent segmentation based approach in
examining crash frequency at the macro-level.
The overall findings from earlier research efforts are generally consistent. The factors most
commonly found to increase crash frequency as their values increase
include: population density, employment density, total dwelling units, number of intersections,
road mileage, unemployment rate, proportion of uneducated population, degree of urbanization,
public transit commuters, walking commuters and number of signalized intersections. On the
other hand, the factors found to reduce crash frequency as their values increase
include: mean household income, average commuter time, older population, full-time workers and
volume to capacity ratio.
The overview of earlier literature indicates that, in recent years, examining crash
frequency at the macro-level has seen a revival of interest among the safety researchers.
However, there is a paucity of research focusing on macro-level bicycle crashes. Therefore, it is
important to examine bicycle crashes at the macro-level to evaluate the safety level and forecast the
safety of cyclists at the zonal planning level, which would facilitate proactive safety-conscious
planning. A critical component in the process of identifying the contributing risk factors is the
application of an appropriate econometric model. As indicated in the earlier section, the most
prevalent formulation to study crash frequency is the NB model. However, the NB model does not
address zero inflation, and it retains the population homogeneity assumption discussed
above. Hence, in our analysis, we focus on developing modeling approaches that address these
challenges. The latent segmentation based approach accommodates population heterogeneity
and allows for improved accuracy in quantifying the impact of exogenous variables on bicycle
crash counts. The approach has recently been employed in the safety literature for examining
traffic crash count events at the micro-level (Park et al., 2010; Park et al., 2009; Zou et al., 2014).
However, the role of such population heterogeneity in the context of macro-level crash count
models has not been investigated in the existing literature. It is also important to note that
earlier micro-level studies estimated only two-segment models with a very small number of parameters (in
the segmentation and segment-specific models), citing model estimation
complexity challenges. In our analysis, we enhance the existing latent segmentation approaches
(such as latent Poisson and NB) by estimating a very rich parameter specification while at the
same time proposing and estimating newer latent segmentation models (LHP, LHNB, LZIP,
LZINB).
In summary, the current study contributes to the literature on crash frequency in general and
bicycle safety in particular in multiple ways. First, the study formulates and estimates an
exhaustive set of latent segmentation based count models that accommodate both observed and
unobserved heterogeneity. The modeling approaches encompass both single-state and
dual-state models: Poisson, NB, HP, HNB, ZIP, ZINB, LP, LNB, LHP, LHNB, LZIP and
LZINB. Second, we investigate the advantages of drawing on the features of latent
segmentation and dual-state models (Hurdle and ZI) in the context of crash frequency analysis.
Third, the whole suite of count models is estimated using bicycle crash data from Montreal and
Toronto at the Traffic Analysis Zone (TAZ) level, employing a comprehensive set of exogenous
variables, and we assess the fit of the estimated bicycle crash frequency models using data fit
metrics. Finally, based on the model results, we identify important exogenous variables that
influence bicycle crash counts.
¹ Lord and Mannering (2010) present an extensive review of statistical techniques used in transportation safety for
crash frequency analysis.
ECONOMETRIC FRAMEWORK
In the latent segmentation based approach, bicycle crash count records for TAZs are
probabilistically assigned to 𝑆 relatively homogeneous (but latent to the analyst) segments based
on different explanatory variables. Within each segment, the effects of exogenous variables on
the number of crashes occurring in a TAZ over a given period of time are fixed, that is, identical
for all TAZs in that segment. Hence, the latent segmentation based model consists of two components: (1) an
assignment component and (2) a segment-specific count model component. The general structure
for all latent segmentation count models involves specifying these two components. For ease
of presentation, we describe the general mathematical structure first and then identify the
different modeling structures for the various models in the subsequent discussion.
Let 𝑠 be the index for segments (𝑠 = 1, 2, 3, …, 𝑆), 𝑖 be the index for TAZs
(𝑖 = 1, 2, 3, …, 𝑁) and 𝑦𝑖 be the number of crashes occurring over a period of time in TAZ 𝑖. The
assignment of TAZs to different segments is modeled as a function of a column vector of
exogenous variables using the random utility based multinomial logit model (see Wedel et al.,
1993; Bago d'Uva, 2006; Eluru et al., 2012; Yasmin et al., 2014, for similar formulations) as:

$$P_{is} = \frac{\exp(\beta_s \mathbf{x}_i)}{\sum_{s=1}^{S} \exp(\beta_s \mathbf{x}_i)} \qquad (1)$$

where 𝒙𝑖 is a vector of attributes of TAZ 𝑖 and 𝛽𝑠 is a conformable parameter vector to be estimated. The
assignment process is the same for all latent segmentation models.
Within any latent segmentation approach, the unconditional probability of 𝑦𝑖 can be given
as:

$$P_i(y_i) = \sum_{s=1}^{S} P_i(y_i \mid s) \times P_{is} \qquad (2)$$

where 𝑃𝑖(𝑦𝑖|𝑠) corresponds to the probability of count 𝑦𝑖 in segment 𝑠. The exact form of
𝑃𝑖(𝑦𝑖|𝑠) depends on the count model chosen for the segment-specific model. The
mathematical details for all the count model frameworks considered in our analysis are presented below.
The probability distribution for Poisson is given by:
$$P_{is}(y_i \mid \mu_{is}, s) = \frac{e^{-\mu_{is}}\,\mu_{is}^{y_i}}{y_i!}, \quad \mu_{is} > 0 \qquad (3)$$
where 𝜇𝑖𝑠 is the expected number of crashes occurring in TAZ 𝑖 over a given period of time in
segment 𝑠. We can express 𝜇𝑖𝑠 as a function of explanatory variables (𝒛𝑖) using a log-link
function: 𝜇𝑖𝑠 = 𝐸(𝑦𝑖|𝒛𝑖, 𝑠) = exp(𝛿𝑠𝒛𝑖), where 𝛿𝑠 is a vector of parameters to be estimated
specific to segment 𝑠. However, one of the most restrictive assumptions of Poisson regression,
often violated in practice, is that the conditional mean of the dependent variable equals its
conditional variance.
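Equations 1 to 3 combine in a few lines. The Python sketch below evaluates the unconditional probability of equation 2 for one TAZ, with a Poisson kernel and the log-link mean 𝜇𝑖𝑠 = exp(𝛿𝑠𝒛𝑖). The function names, attribute vectors and parameter values are hypothetical, and fixing the first segment's 𝛽 at zero is a common identification convention assumed here rather than something stated in the text.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def segment_probabilities(x_i: Sequence[float], betas: List[Sequence[float]]) -> List[float]:
    """Equation (1): multinomial logit probabilities P_is of TAZ i belonging to each segment."""
    utilities = [dot(beta, x_i) for beta in betas]
    top = max(utilities)  # subtract the maximum utility for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def poisson_pmf(y: int, mu: float) -> float:
    """Equation (3), evaluated on the log scale to avoid overflow in mu**y and y!."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def latent_poisson_prob(y_i, x_i, z_i, betas, deltas) -> float:
    """Equation (2): sum over segments of P_i(y_i | s) * P_is, with mu_is = exp(delta_s . z_i)."""
    shares = segment_probabilities(x_i, betas)
    return sum(poisson_pmf(y_i, math.exp(dot(delta, z_i))) * p_is
               for delta, p_is in zip(deltas, shares))


if __name__ == "__main__":
    # Hypothetical TAZ: a constant plus one attribute for assignment and for the count model.
    x_i, z_i = [1.0, 0.4], [1.0, 2.0]
    betas = [[0.0, 0.0], [0.5, -1.2]]    # segment 1 normalized to zero (assumed convention)
    deltas = [[0.2, 0.3], [-0.5, 0.1]]   # hypothetical segment-specific count parameters
    print(segment_probabilities(x_i, betas))
    print([round(latent_poisson_prob(y, x_i, z_i, betas, deltas), 4) for y in range(5)])
```

Because each segment has its own 𝛿𝑠, the same TAZ attributes can raise the expected count in one segment and lower it in another, which is the heterogeneity the framework is meant to capture.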
The variance assumption of Poisson regression is relaxed in NB by adding a Gamma
distributed disturbance term to Poisson distributed count data (Jang, 2005). Given the above
setup, the NB probability expression for 𝑦𝑖 conditional on belonging to segment 𝑠 can be written
as:
$$P_{is}(y_i \mid \mu_{is}, \alpha_s, s) = \frac{\Gamma(y_i + \alpha_s^{-1})}{\Gamma(y_i + 1)\,\Gamma(\alpha_s^{-1})} \left(\frac{1}{1 + \alpha_s \mu_{is}}\right)^{1/\alpha_s} \left(1 - \frac{1}{1 + \alpha_s \mu_{is}}\right)^{y_i} \qquad (4)$$

where Γ(·) is the Gamma function and 𝛼𝑠 is the NB dispersion parameter specific to segment 𝑠.
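Equation 4 can be checked numerically: summing the NB probabilities over a long range of counts should give a total probability of one, a mean of 𝜇𝑖𝑠 and a variance of 𝜇𝑖𝑠 + 𝛼𝑠𝜇𝑖𝑠², the over-dispersion the NB model is designed to capture. The Python sketch below uses hypothetical values 𝜇 = 3 and 𝛼 = 0.5 and hypothetical function names; the Gamma functions are evaluated with `math.lgamma` to stay on the log scale.

```python
import math
from typing import Tuple


def nb_pmf(y: int, mu: float, alpha: float) -> float:
    """Equation (4): NB probability with mean mu > 0 and dispersion alpha > 0."""
    inv_a = 1.0 / alpha
    q = 1.0 / (1.0 + alpha * mu)  # the term 1 / (1 + alpha_s * mu_is)
    log_p = (math.lgamma(y + inv_a) - math.lgamma(y + 1) - math.lgamma(inv_a)
             + inv_a * math.log(q) + y * math.log(1.0 - q))
    return math.exp(log_p)


def nb_moments(mu: float, alpha: float, y_max: int = 2000) -> Tuple[float, float, float]:
    """Total probability, mean and variance by direct summation over 0..y_max."""
    probs = [nb_pmf(y, mu, alpha) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return sum(probs), mean, var


if __name__ == "__main__":
    total, mean, var = nb_moments(3.0, 0.5)
    print(round(total, 6), round(mean, 6), round(var, 6))  # variance should be 3 + 0.5 * 9
```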
The Poisson and NB models do not account for the over-representation of zero
observations in the data. Hurdle and Zero Inflated (ZI) models are typically used in the presence
of such excess zeroes. Cameron and Trivedi (1998) presented these models as finite mixture
models with a degenerate distribution and probability mass concentrated at zero. Of these
two approaches, the Hurdle approach is generally used for modeling excess sampling zeroes. It is
usually interpreted as a two-part model (Heilbron, 1994): the first part is a binary response
structure modeling the probability of crossing the hurdle of zeroes, and the
second part is a zero-truncated formulation of a standard count model
(Poisson or NB). Thus the probability expression for the Hurdle model, conditional on belonging to
segment 𝑠, can be expressed as:
πœ‹π‘–π‘ 
𝛬𝑖𝑠 [𝑦𝑖 |𝑠] = {
(1−πœ‹π‘–π‘  )
𝑖𝑠
(1−𝑒 −πœ‡ )
𝑦𝑖 = 0
𝑃𝑖𝑠 (𝑦𝑖 |𝑠)
𝑦𝑖 > 0
(5)
where, πœ‹π‘–π‘  is the probability of count zero specific to segment 𝑠 and is modeled as a binary logit
model as follows:
πœ‹π‘  =
𝑒π‘₯𝑝(𝛾𝑠 πœΌπ‘–π‘  )
1 + 𝑒π‘₯𝑝(𝛾𝑠 πœΌπ‘–π‘  )
(6)
where, πœΌπ‘–π‘  is a vector of attributes and 𝛾𝑠 is a conformable parameter vector to be estimated.
Substitution of 𝑃𝑖𝑠 (𝑦𝑖 |𝑠) of equations 3 and 4 in equation 6 will result in Hurdle Poisson (HP)
and Hurdle NB (HNB) models, respectively.
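The two-part structure of equations 5 and 6 can be sketched directly: the zero probability comes only from the logit part, and positive counts come from the zero-truncated kernel rescaled by 1 − 𝜋𝑖𝑠. In the Python sketch below, the kernel is passed in as a function so that either equation 3 (HP) or equation 4 (HNB) can be plugged in; the helper names and the values of 𝛾𝑠, 𝜼𝑖𝑠 and 𝜇 are hypothetical.

```python
import math
from typing import Callable


def logit_prob(v: float) -> float:
    """Binary logit of equation (6): exp(v) / (1 + exp(v))."""
    return 1.0 / (1.0 + math.exp(-v))


def poisson_pmf(y: int, mu: float) -> float:
    """Equation (3) on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def hurdle_pmf(y: int, pi_is: float, kernel: Callable[[int], float]) -> float:
    """Equation (5): pi_is for a zero; otherwise the zero-truncated kernel scaled by 1 - pi_is.

    kernel(y) is the untruncated segment-specific count pmf (Poisson for HP, NB for HNB).
    """
    if y == 0:
        return pi_is
    return (1.0 - pi_is) * kernel(y) / (1.0 - kernel(0))


if __name__ == "__main__":
    gamma_s, eta_is = [-0.4, 0.8], [1.0, 0.5]  # hypothetical logit parameters and attributes
    pi_is = logit_prob(sum(g * e for g, e in zip(gamma_s, eta_is)))
    kernel = lambda y: poisson_pmf(y, 2.0)     # hypothetical HP kernel with mu_is = 2
    print([round(hurdle_pmf(y, pi_is, kernel), 4) for y in range(5)])
```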
Unlike the Hurdle model, the zero-inflated (ZI) model is argued to account for both
structural and sampling zeroes (Min and Agresti, 2005). The probability of observing a zero is
modeled in the ZI model using a mixing distribution, and, therefore, this model is often
referred to as a mixture model (Welsh et al., 1996). The first part of this mixture specification
addresses the zero-inflation and the second part addresses the unobserved heterogeneity of events,
including zeroes (Jang, 2005). Therefore, through the mixing distribution, the ZI model places more
weight on the probability of observing a zero than the Hurdle model does. Given the above setting, the
probability expression of the ZI model for 𝑦𝑖 conditional on belonging to segment 𝑠 can be expressed as:
πœ”π‘–π‘  + (1 − πœ”π‘–π‘  )𝑒π‘₯𝑝(−πœ‡π‘–π‘  )
𝛬𝑖𝑠 [𝑦𝑖 |𝑠] = {
𝑦𝑖 = 0
(7)
(1 − πœ”π‘–π‘  )𝑃𝑖𝑠 (𝑦𝑖 |𝑠)
𝑦𝑖 > 0
where, πœ”π‘–π‘  can also be modeled as a binary logit model as in equation 6. Substitution of 𝑃𝑖𝑠 (𝑦𝑖 |𝑠)
of equations 3 and 4 in equation 7 will result in ZI Poisson (ZIP) and ZI NB (ZINB) models,
respectively.
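Equation 7 differs from equation 5 in one place: in the ZI model a zero can come either from the zero state (probability 𝜔𝑖𝑠) or from the count state, so the zero probability exceeds 𝜔𝑖𝑠. The Python sketch below mirrors the hurdle sketch with hypothetical helper names and hypothetical values 𝜔 = 0.3 and 𝜇 = 2; the kernel can again be the Poisson (ZIP) or the NB (ZINB) probability.

```python
import math
from typing import Callable


def poisson_pmf(y: int, mu: float) -> float:
    """Equation (3) on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def zi_pmf(y: int, omega_is: float, kernel: Callable[[int], float]) -> float:
    """Equation (7): zeros arise from the zero state or from the untruncated count kernel."""
    if y == 0:
        return omega_is + (1.0 - omega_is) * kernel(0)
    return (1.0 - omega_is) * kernel(y)


if __name__ == "__main__":
    kernel = lambda y: poisson_pmf(y, 2.0)  # hypothetical ZIP kernel with mu_is = 2
    omega_is = 0.3                          # hypothetical zero-state probability
    print("P(0) =", round(zi_pmf(0, omega_is, kernel), 4), "> omega =", omega_is)
```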
Finally, the log-likelihood function for the latent segmentation (LS) based count model
can be written as:
$$LL = \sum_{i=1}^{N} \log\left(\sum_{s=1}^{S} P_i(y_i \mid s) \times P_{is}\right) \qquad (8)$$
Substituting equations 3, 4, 5 and 7 for 𝑃𝑖(𝑦𝑖|𝑠) in equation 8 results in the latent
segmentation based Poisson (LP), latent segmentation based NB (LNB), latent segmentation
based Hurdle (LH) and latent segmentation based ZI (LZI) models, respectively, where the LH and
LZI models take either a Poisson or an NB kernel (LHP, LHNB, LZIP and LZINB). The parameters
to be estimated in the model of equation 8 are 𝛽𝑠 for every latent segmentation based
model, along with 𝛿𝑠 for LP; 𝛼𝑠 and 𝛿𝑠 for LNB; 𝛾𝑠 (specific to 𝜋𝑖𝑠) and 𝛿𝑠 for LHP; 𝛾𝑠
(specific to 𝜋𝑖𝑠), 𝛼𝑠 and 𝛿𝑠 for LHNB; 𝛾𝑠 (specific to 𝜔𝑖𝑠) and 𝛿𝑠 for LZIP; and 𝛾𝑠 (specific to
𝜔𝑖𝑠), 𝛼𝑠 and 𝛿𝑠 for LZINB. The parameters are estimated by maximum likelihood, with the
log-likelihood functions programmed in GAUSS. In the application of these models, determining
the appropriate number of segments 𝑆 is a critical issue with respect to interpretation and
inferences. Therefore, we estimate these models with increasing numbers of segments
(𝑆 = 2, 3, 4, …) until the addition of a segment does not add value to the model in terms of data fit.
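Equation 8 translates directly into code for the LP case. The Python sketch below takes, for each TAZ, the observed count, the segment probabilities 𝑃𝑖𝑠 and the segment means 𝜇𝑖𝑠, and returns the log-likelihood. It is a hypothetical illustration: the paper's estimation is programmed in GAUSS, and in Python one option would be to minimize the negative of this function over 𝛽𝑠 and 𝛿𝑠 with a general-purpose optimizer such as `scipy.optimize.minimize`. For very large counts, a log-sum-exp formulation would be numerically safer than the direct sum used here.

```python
import math
from typing import Sequence


def poisson_pmf(y: int, mu: float) -> float:
    """Equation (3) on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def latent_poisson_loglik(counts: Sequence[int],
                          shares: Sequence[Sequence[float]],
                          means: Sequence[Sequence[float]]) -> float:
    """Equation (8) with a Poisson kernel (the LP model).

    counts[i] is y_i, shares[i][s] is P_is from equation (1) and
    means[i][s] is mu_is from the segment-specific log-link.
    """
    ll = 0.0
    for y, p_i, mu_i in zip(counts, shares, means):
        ll += math.log(sum(poisson_pmf(y, mu) * p for p, mu in zip(p_i, mu_i)))
    return ll


if __name__ == "__main__":
    # Hypothetical three-TAZ example with two segments.
    counts = [0, 2, 7]
    shares = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
    means = [[4.0, 0.5], [4.0, 0.5], [4.0, 0.5]]
    print(round(latent_poisson_loglik(counts, shares, means), 4))
```

With a single segment (every 𝑃𝑖𝑠 equal to one) the function reduces to the ordinary Poisson log-likelihood, which is a convenient sanity check.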
DATA DESCRIPTION
Our study areas include the Island of Montreal, with 837 TAZs, a population of
about 1.8 million and an area of approximately 499 km², and the City of Toronto,
with 672 TAZs, a population of about 2.6 million and an area of
approximately 630 km² (Statistics Canada, 2011). Data for our empirical analysis are sourced
from these two cities for the years 2006 through 2010. This study is focused on bicycle-motor
vehicle crash data, aggregated at the traffic analysis zone (TAZ) level for each year. For the
five years, the databases have records of 3,066 bicycle crashes in Montreal and 5,475 bicycle
crashes in Toronto. The explanatory attributes considered in the empirical study are also
aggregated at the TAZ level accordingly. For the empirical analysis, we selected variables that
can be grouped into six broad categories: accessibility measures, exposure measures, socio-demographic characteristics, socio-economic characteristics, road network & traffic
characteristics and built environment. Table 2 offers a summary of the sample characteristics of
the exogenous factors in the estimation dataset: the definition of each variable considered for
model estimation, along with its zonal minimum, maximum and average values. To conserve space,
the sample characteristics are presented only for the Island of Montreal.
EMPIRICAL ANALYSIS
Model specification and overall measures of fit
The empirical analysis of bicycle crash frequency involves the estimation of twelve models: (1)
Poisson, (2) Negative Binomial (NB), (3) Hurdle Poisson (HP), (4) Hurdle Negative Binomial
(HNB), (5) Zero-inflated Poisson (ZIP), (6) Zero-inflated Negative Binomial (ZINB), (7) Latent
Segmentation based Poisson (LP), (8) Latent Segmentation based Negative Binomial (LNB), (9)
Latent Segmentation based Hurdle Poisson (LHP), (10) Latent Segmentation based Hurdle
Negative Binomial (LHNB), (11) Latent Segmentation based Zero-inflated Poisson (LZIP) and
(12) Latent Segmentation based Zero-inflated Negative Binomial (LZINB) model. Prior to
discussing the estimation results, we compare the performance of these models in this section.
The model comparisons are undertaken in three stages. First, we compare the performance of
single-state and dual-state models to determine the potential presence of over-dispersion and
zero-inflation in the crash count data. Second, we determine the appropriate number of latent
classes for the estimated latent segmentation based count models. Third, we compare the
unsegmented models (obtained from the first step) with the more general latent segmentation
based count models (obtained from the second step) in order to assess the importance of
accounting for heterogeneity in estimating zonal level crash frequency models. For these
comparison exercises, we compute two information criteria (IC), the Akaike (AIC) and Bayesian (BIC)
criteria; these measures, along with the log-likelihood at convergence for all the estimated models, are
presented in Table 3. The reader should note that the log-likelihood function for the latent
segmentation models is quite flat around the maximum, and estimation of complex model
structures such as LHP, LHNB, LZIP and LZINB is far from straightforward (see Sobhani et al., 2013,
for a discussion). Consequently, not all theoretically estimable models are empirically estimable,
and we estimated all the latent segmentation model frameworks judiciously.
Determining the Appropriate State of the Count Event
To explore the presence of over-dispersion and zero-inflation in the aggregated crash count
events, we compare the performance of the estimated single-state and dual-state models. The likelihood
ratio (LR) test is used to compare the pairs of full and nested models (NB vs. Poisson; ZIP vs. Poisson;
ZINB vs. NB; ZINB vs. ZIP; and HNB vs. HP). The LR test statistic is computed as 2[𝐿𝐿𝑈 −
𝐿𝐿𝑅], where 𝐿𝐿𝑈 and 𝐿𝐿𝑅 are the log-likelihoods of the full and the nested models, respectively.
The LR test values for Montreal and Toronto indicate that the ZINB and HNB models are superior
to the other models in these comparisons. Further, for the non-nested comparison (ZINB vs. HNB), we employ BIC and AIC².
The model with the lower BIC and AIC values is the preferred model. The BIC (AIC) values for
the final specifications of the ZINB and HNB models (in Table 3) clearly indicate that the ZINB
model offers a superior fit compared to the other models for both cities, suggesting the importance
of exploring the possibility of both over-dispersion and zero-inflation in the observed crash count
events.
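The comparison statistics used in this section are simple functions of the log-likelihood at convergence, the number of parameters K and the number of observations Q. The Python sketch below restates them; the log-likelihood values in the usage block are hypothetical and are not taken from Table 3. The value 3.841 is the 5% critical value of a chi-square distribution with one degree of freedom, appropriate when the full model adds a single parameter (for example, the NB dispersion parameter relative to the Poisson); in general the degrees of freedom equal the difference in the number of parameters.

```python
import math


def lr_statistic(ll_full: float, ll_nested: float) -> float:
    """Likelihood ratio statistic 2[LL_U - LL_R]."""
    return 2.0 * (ll_full - ll_nested)


def aic(ll: float, k: int) -> float:
    """AIC = 2K - 2 ln(L)."""
    return 2 * k - 2 * ll


def bic(ll: float, k: int, q: int) -> float:
    """BIC = -2 ln(L) + K ln(Q), with Q the number of observations."""
    return -2 * ll + k * math.log(q)


if __name__ == "__main__":
    # Hypothetical nested comparison: the full model adds one parameter.
    stat = lr_statistic(-1200.0, -1350.0)
    print("LR =", stat, "| reject nested model at 5%:", stat > 3.841)
    # Hypothetical non-nested comparison of two models on Q = 1000 observations.
    print("BIC A:", round(bic(-1180.0, 25, 1000), 1), "BIC B:", round(bic(-1175.0, 30, 1000), 1))
```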
Determining the Appropriate Number of Latent Classes
Determining the appropriate number of segments in estimating the latent segmentation based
models is a critical issue with respect to interpretation and inferences. Among the
traditionally used ICs (AIC, BIC, adjusted BIC), BIC imposes a substantially higher penalty on
over-fitting and is the most commonly used IC for identifying the appropriate number of classes in
latent segmentation based analysis (Nylund et al., 2007). In the current study context,
after extensively testing three-segment latent segmentation models for Montreal,
we found that the model collapses to the two-segment model, and for Toronto the two-segment LS
models provide a superior fit based on BIC measures. Thus, we selected two segments as the
appropriate number of segments for all the estimated latent segmentation based models.
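The stopping rule described above (add segments until BIC no longer improves) can be written as a short loop. The Python sketch below takes hypothetical (𝑆, log-likelihood, number of parameters) triples for already-estimated models, ordered by 𝑆; the values and the helper name are invented for illustration and do not come from Table 3.

```python
import math
from typing import List, Tuple


def bic(ll: float, k: int, q: int) -> float:
    """BIC = -2 ln(L) + K ln(Q)."""
    return -2 * ll + k * math.log(q)


def choose_segments(fits: List[Tuple[int, float, int]], q: int) -> Tuple[int, float]:
    """Return (S, BIC) for the last model before adding a segment stops lowering BIC.

    fits holds (number of segments, log-likelihood, number of parameters), ordered by S.
    """
    best_s, best_bic = fits[0][0], math.inf
    for s, ll, k in fits:
        value = bic(ll, k, q)
        if value >= best_bic:
            break  # the extra segment does not add value in terms of data fit
        best_s, best_bic = s, value
    return best_s, best_bic


if __name__ == "__main__":
    # Hypothetical fits on Q = 1000 observations.
    fits = [(1, -2000.0, 10), (2, -1900.0, 21), (3, -1890.0, 32)]
    print(choose_segments(fits, 1000))
```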
Comparing the Unsegmented and Segmented Models
BIC and AIC measures are used to compare the estimated segmented models with the best fitting unsegmented model. For all of the estimated latent models, the computed BIC (AIC) values (presented in Table 3) are lower than those of the best fitting unsegmented ZINB model. This suggests that population heterogeneity is important in examining crash count events. Among the latent models for Montreal, the LNB and LZINB models have the lowest IC values, and the LNB model offers a superior fit compared to the LZINB model. This comparison suggests that bicycle crash counts in Montreal are heterogeneous across TAZs in the current study context. However, conditional on the latent segments, they are not associated with a dual-state crash count process. For Toronto, the LHNB and LZINB models collapsed to the LHP and LZIP models, respectively. Among the latent segmentation based models, the LZIP model has the lowest IC values for Toronto, indicating that LZIP offers a superior fit compared to the other latent models. This implies that bicycle crash counts in Toronto are heterogeneous across TAZs in the current study context. Conditional on the latent segments, they are also associated with a dual-state crash count process.
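A negative binomial component is often said to "collapse" to its Poisson counterpart when the estimated dispersion parameter approaches zero. The sketch below illustrates this under the common NB2 parameterization, with variance μ + αμ². This parameterization is an assumption here, since this section does not state which form was used. As α shrinks, the NB2 probability converges to the Poisson probability with the same mean. The function names and numbers are illustrative only.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability P(Y = y) with mean mu > 0."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def nb2_pmf(y: int, mu: float, alpha: float) -> float:
    """NB2 probability with mean mu and variance mu + alpha * mu**2 (alpha > 0)."""
    r = 1.0 / alpha  # inverse dispersion ("size") parameter
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log1p(-mu / (r + mu))  # equals r * ln(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


if __name__ == "__main__":
    mu, y = 2.0, 3
    for alpha in (1.0, 0.1, 0.01, 0.0001):
        print(f"alpha={alpha:<7} NB2={nb2_pmf(y, mu, alpha):.5f}")
    print(f"Poisson        {poisson_pmf(y, mu):.5f}")
```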
Estimation Results
To conserve space, our discussion of the effects of exogenous variables is restricted to the LNB model for Montreal. Table 4 presents the estimation results of the LNB model. We first present an intuitive discussion of the LNB model. We then discuss the segmentation component parameters and the crash frequency component parameters specific to segments 1 and 2.
² The BIC for a given empirical model is equal to $-2\ln(L) + K\ln(Q)$ and the AIC is equal to $2K - 2\ln(L)$, where $\ln(L)$ is the log-likelihood value at convergence, $K$ is the number of parameters, and $Q$ is the number of observations.
Intuitive Interpretation of LNB Model
To examine the segmentation characteristics, we use the model estimates to generate two quantities: 1) the sample share of each of the two segments, and 2) the expected mean of crash count events within each segment. These estimates are presented in Table 4. The estimates show that a TAZ is substantially more likely to be assigned to segment 2 than to segment 1. Further, the expected numbers of bicycle-motor vehicle crashes conditional on segment membership differ sharply, indicating that the two segments have distinct crash risk profiles in the current research context. From Table 4, the expected mean crash count for TAZs assigned to segment 1 is much higher than the observed sample mean. In contrast, the expected mean for TAZs assigned to segment 2 is lower than the observed sample mean. Thus, we label segment 1 as the “high risk segment” and segment 2 as the “low risk segment”.
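The two quantities reported in Table 4 can be computed from TAZ-level model output. The sketch below rests on two assumptions of ours, not a description of the authors' procedure. First, a segment's sample share is the average of the TAZ-level segment probabilities. Second, its expected mean is the probability-weighted average of the segment-conditional expected counts. The function names and inputs are hypothetical.

```python
def segment_summary(pi, mu):
    """Summarize latent segments from TAZ-level model output.

    pi[i][s]: probability that TAZ i belongs to segment s (rows sum to 1).
    mu[i][s]: expected crash count of TAZ i conditional on segment s.
    Returns (shares, means): sample share and expected mean per segment.
    """
    n, n_seg = len(pi), len(pi[0])
    shares, means = [], []
    for s in range(n_seg):
        weight = sum(pi[i][s] for i in range(n))
        shares.append(weight / n)
        means.append(sum(pi[i][s] * mu[i][s] for i in range(n)) / weight)
    return shares, means


def label_segments(means, sample_mean):
    """Label a segment 'high risk' if its expected mean exceeds the sample mean."""
    return ["high risk" if m > sample_mean else "low risk" for m in means]


if __name__ == "__main__":
    # Hypothetical output for two TAZs and two segments.
    pi = [[0.5, 0.5], [0.1, 0.9]]
    mu = [[8.0, 2.0], [4.0, 1.0]]
    shares, means = segment_summary(pi, mu)
    overall = sum(p * m for p, m in zip(shares, means))
    print(shares, means, overall)
    print(label_segments(means, sample_mean=overall))
```

Under these definitions, the share-weighted sum of the segment means equals the sample average of the unconditional expected counts. The two summaries are therefore internally consistent.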
Latent Segmentation Component
The latent segmentation component determines the relative prevalence of each segment. It also determines the probability of a TAZ being assigned to one of the two latent segments based on different explanatory variables. In our empirical analysis, the explanatory variables that affect the allocation of TAZs to segments are lot coverage, number of one-way links, STM bus line density, number of restaurants, distance from the CBD (central business district) and land use mix. Table 4 reports the effects of these variables, using the high risk segment (segment 1) as the base segment. A positive (negative) sign for a variable in the segmentation component therefore indicates that TAZs with the variable characteristic are more (less) likely to be assigned to the low risk segment relative to the high risk segment. This comparison is made against TAZs that correspond to the characteristic represented by the base category for the variable. The positive sign on the constant term does not have any substantive interpretation; it simply reflects the larger size of the low risk segment compared to the high risk segment.
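This sign convention follows from a binary logit in which the utility of the base (high risk) segment is normalized to zero. The sketch below shows how a positive coefficient raises the probability of allocation to the low risk segment. The binary logit form is our assumption about the segmentation component. The coefficient values are hypothetical, not the Table 4 estimates.

```python
import math


def segment_probabilities(z, gamma):
    """Binary logit allocation with segment 1 (high risk) as the base.

    z: explanatory variables of a TAZ, with a leading 1.0 for the constant.
    gamma: segmentation coefficients for segment 2 (low risk); the base
    segment's utility is fixed at zero.
    Returns (P(segment 1), P(segment 2)).
    """
    v = sum(zk * gk for zk, gk in zip(z, gamma))
    p_low = 1.0 / (1.0 + math.exp(-v))
    return 1.0 - p_low, p_low


if __name__ == "__main__":
    # Hypothetical coefficients: constant and lot coverage (positive sign).
    gamma = [0.8, 1.5]
    for lot_coverage in (0.1, 0.3, 0.5):
        p_high, p_low = segment_probabilities([1.0, lot_coverage], gamma)
        print(f"lot coverage={lot_coverage}: P(high)={p_high:.3f}, P(low)={p_low:.3f}")
```

With all other variables at zero, the positive constant alone gives P(low) above 0.5. This mirrors the remark that the constant simply reflects the larger size of the low risk segment.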
The result for lot coverage, a proxy for neighborhood compactness, indicates that higher lot coverage increases the likelihood of a TAZ being assigned to the low risk segment. This result perhaps reflects lower exposure of bicyclists to motor vehicles, as trips in compact areas are usually shorter. A larger number of one-way links in a TAZ increases the likelihood of the TAZ being assigned to the high risk segment. Higher motor vehicle speeds and closer proximity to bicycles on one-way roads perhaps increase the possibility of conflicts between these two road user groups. Higher STM bus line density is associated with allocation to the low risk segment. This finding could stem from the fact that motorized traffic is less likely to use the curbside lane (mostly used by bicyclists) in the presence of a bus route. This provides greater separation between bicyclists and motor traffic.
The result for the number of restaurants indicates that a TAZ is more likely to be assigned to the high risk segment as the number of restaurants increases. A possible explanation is that restaurants are among the major attractors for utilitarian bicycling (Rybarczyk and Wu, 2010). Also, restaurants are likely to be concentrated around the city center, where the number of bicyclists is also likely to be higher, thus increasing exposure. The probability of being allocated to the high risk segment decreases as the distance between the TAZ and the CBD increases. TAZs close to downtown are associated with shorter, more cyclable travel distances. These shorter distances in turn increase the exposure of bicyclists and make these TAZs more likely to belong to the high bicycle-motor vehicle crash segment. TAZs with higher land use mix are likely to be assigned to the high risk segment, a result also reported by Cho et al. (2009). Overall, the high risk segment is characterized by more one-way links, more restaurants and more mixed land uses.
Crash Risk Component: High Risk Segment (Segment 1)
This section discusses the crash risk component within the high risk segment (segment 1). In terms of accessibility measures, bicycle crashes are negatively associated with higher metro station and AMT station densities in segment 1. This finding differs from several previous studies (Wier et al., 2009; Ukkusuri et al., 2012), which reported higher crash frequencies at these locations. One possible explanation is that large, visible metro station signs alert motorists to the location of a station from a distance. Motorists thus become more cautious and watchful, which enables them to negotiate safely with bicycle traffic. Our analysis also shows that bus transit destination diversity in a TAZ is positively correlated with bicycle crash risk. Greater bus destination diversity generally reflects greater bus transit intensity. Bicycles are often in the blind spots of large vehicles (Pai, 2011). A higher number of buses on the roadway further restricts the view of bicyclists, which might exacerbate bicycle crash risk.
Consistent with previous studies (Siddiqui et al., 2012; Quddus, 2008), our results show that higher vehicle ownership per household within a TAZ leads to a lower probability of bicycle crashes. A higher proportion of males relative to females in a TAZ is negatively correlated with bicycle crash risk. This may be attributable to the greater riding experience of male cyclists compared to female cyclists (Walker and Jones, 2005). Higher proportions of non-permanent residents and African population are positively associated with bicycle crashes. These population groups, which represent minorities in the community, presumably have less access to private motor vehicles and use bicycles more. Moreover, previous research has observed that these groups may exhibit risky traffic behavior as pedestrians/cyclists (Chen et al., 2012), which in turn might increase bicycle crash risk.
The median travel time to work has a negative coefficient. This implies that bicycle crash risk decreases as the median commuting time on the roads increases. Longer median travel times usually reflect increased auto usage and reduced bicycle usage. Conversely, living close to the workplace results in lower median commuting times, and people travelling shorter distances tend to be less attentive, resulting in more crashes (Huang et al., 2010). Surprisingly, crashes are positively associated with higher zonal average income in segment 1. This is counterintuitive, as one would expect higher income neighborhoods to be car-dominant. The result perhaps reflects the growing popularity of bicycling in high income urban areas (Hatfield et al., 2012), such as core CBD areas, which are usually expensive. However, the reasons for this effect are not entirely clear, and it may be a manifestation of unobserved variables not considered in our analysis.
The model estimate for average vehicle age indicates that bicycle crash risk increases with the average age of vehicles. This variable can be considered a proxy for the deprivation level of a zone and may thereby reflect higher exposure to potential crash risk. A higher proportion of early morning (5:00 a.m. to 6:59 a.m.) commuters is associated with higher bicycle crash risk. Motorists may not expect bicyclists at this hour, or bicycles may be less conspicuous due to lower visibility, which could lead to more bicycle-motor vehicle crashes. For the functional class of roadways, bicycle crash risk is negatively correlated with highway density and major road density. This may reflect fewer bicyclists on these types of roads. Consistent with previous research (Wei and Lovegrove, 2013), our results also show that a higher connection ratio and more signalized intersections are positively associated with bicycle-motor vehicle crashes.
None of the built environment variables is found to affect bicycle-motor vehicle crash risk in the high risk segment. The significance of the dispersion parameter in segment 1 confirms that the data are over-dispersed and justifies the use of the NB error structure.
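The sign-based interpretation above can be made concrete with a short sketch. It assumes the usual log link for the segment-specific NB mean and the NB2 variance μ + αμ²; this section does not spell out either functional form. Under these assumptions, a coefficient β multiplies the expected count by exp(β) per unit change in the variable. Any α > 0 makes the variance exceed the mean, which is what a significant dispersion parameter signals. All names and numbers below are hypothetical.

```python
import math


def nb_mean(x, beta):
    """Segment-specific expected crash count with a log link: exp(x'beta).

    x includes a leading 1.0 so that beta[0] acts as the constant.
    """
    return math.exp(sum(xk * bk for xk, bk in zip(x, beta)))


def nb_variance(mu, alpha):
    """NB2 variance mu + alpha * mu**2; exceeds mu whenever alpha > 0."""
    return mu + alpha * mu * mu


def rate_ratio(beta_k, delta=1.0):
    """Multiplicative change in the expected count for a delta-unit change in x_k."""
    return math.exp(beta_k * delta)


if __name__ == "__main__":
    beta = [0.5, -0.3]  # hypothetical: constant and one negatively signed variable
    mu = nb_mean([1.0, 2.0], beta)
    print(f"mu={mu:.3f}, variance={nb_variance(mu, alpha=0.6):.3f}")
    print(f"rate ratio per unit: {rate_ratio(-0.3):.3f}")  # below 1: fewer crashes
```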
Crash Risk Component: Low Risk Segment (Segment 2)
This section discusses the crash risk component within the low risk segment (segment 2). The NB model for the low risk segment yields variable impacts that differ significantly from the corresponding impacts in the high risk segment. The differences are in magnitude and, for a few variables, also in sign.
In segment 2, STM bus stop density has a positive coefficient estimate. Areas with greater public transit accessibility are associated with increased activity generation (Kim et al., 2010). In Montreal, for instance, these locations have increased bicycle flows due to bicycle facilities such as BIXI (self-serve bicycle) stands, which might result in more bicycle-motor vehicle conflicts. As in segment 1, bus transit destination diversity is positively correlated with bicycle crash risk, but the effect is less pronounced in segment 2. As expected, a higher proportion of driver commuters is associated with a lower likelihood of bicycle crashes. Average persons per household, a proxy for bicycle exposure at the zonal level, is positively correlated with bicycle crashes. The estimation results also show that more kilometers of designated on-road bike lanes are likely to result in more bicycle-motor vehicle crashes. The existing bicycle safety literature contains considerable debate over the merits and demerits of designated on-road bike lanes. In our case, the positive association may be attributed to reduced space for bicycle maneuvers caused by vehicles parked on the curb side of the road (to the right of the bike lane). Moreover, designated on-road bike lanes may discourage proper merging and turning maneuvers by both bicyclists and motorists at intersections. This would result in more bicycle-motor vehicle conflicts (see Schimek, 1996 for more explanation on this).
Among the socio-demographic characteristics, TAZs with higher proportions of American and Asian people are likely to experience fewer bicycle crashes. In contrast, zones with a higher proportion of European people are likely to experience more bicycle crashes. Median travel time to work has a strikingly different influence on bicycle crash risk in this segment than in the high risk segment. In segment 2, the median travel time to work has a positive coefficient. This implies that bicycle crash risk increases as the median commuting time on the roads increases, which might indicate higher exposure of bicycles on roadways. The result also highlights how the same variable can influence crash risk differently depending on the segment to which the TAZ is allocated. The LNB approach allows for capturing such complex interactions.
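The observation that the same variable acts differently by segment can be quantified. For illustration only, assume that the segment probabilities π_s do not depend on the variable x_k and that each segment mean is exp(a_s + β_s x_k). Then the derivative of the unconditional expected count is Σ_s π_s μ_s β_s, so segment-specific effects of opposite sign partially offset. The function names and values below are hypothetical and are not the Table 4 estimates.

```python
import math


def expected_count(xk, pis, intercepts, betas):
    """Unconditional mean sum_s pi_s * exp(a_s + beta_s * x_k) with fixed pi_s."""
    return sum(p * math.exp(a + b * xk) for p, a, b in zip(pis, intercepts, betas))


def marginal_effect(xk, pis, intercepts, betas):
    """Derivative of expected_count with respect to x_k: sum_s pi_s * mu_s * beta_s."""
    return sum(p * math.exp(a + b * xk) * b for p, a, b in zip(pis, intercepts, betas))


if __name__ == "__main__":
    pis = [0.3, 0.7]                             # hypothetical segment shares
    intercepts = [math.log(6.0), math.log(1.5)]  # so mu_s = 6.0 and 1.5 at x_k = 0
    betas = [-0.2, 0.4]                          # opposite signs across segments
    print(marginal_effect(0.0, pis, intercepts, betas))
```

With these made-up values, the net effect is small and positive even though the variable lowers crashes in one segment. In the actual LNB model, the segmentation probabilities vary by TAZ, so such an effect would be evaluated TAZ by TAZ.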
Consistent with previous studies (Hadayeghi et al., 2003; Siddiqui et al., 2012), bicycle crashes are positively correlated with a higher proportion of full-time workers in TAZs. As in segment 1, a higher proportion of early morning (5:00 a.m. to 6:59 a.m.) commuters is associated with higher bicycle crash risk in segment 2, and the effect is more pronounced in segment 2. Among road network and traffic characteristics, major road density has the expected negative correlation with bicycle-motor vehicle crash risk. For intersection density, the result implies that more intersections per capita are associated with a higher likelihood of bicycle crashes. This finding is consistent with the well-known fact that vehicular maneuvers at intersections are complex and increase the potential for bicycle-motor vehicle interactions (and collisions). Results in Table 4 also indicate that more signalized intersections and cul-de-sacs are positively associated with bicycle-motor vehicle crashes.
Among built environment variables, the number of bars in the neighborhood is positively associated with bicycle crashes. We speculate that greater bar densities result in more alcohol-involved bicycle crashes, a result also observed by LaScala et al. (2000) for pedestrian-motor vehicle crashes. The significance of the dispersion parameter in segment 2 confirms that the data are over-dispersed and justifies the use of the NB error structure.
CONCLUSION
In conventional single state (such as Poisson or Negative Binomial) or dual state (such as zero-inflated or hurdle) count models, the impact of exogenous factors is restricted to be the same across the entire region. To accommodate potential variation in the impact of exogenous factors, we formulated latent segmentation based count models. The full set of modeling approaches considered in this investigation is:
- Poisson
- Negative Binomial (NB)
- Hurdle Poisson (HP)
- Hurdle Negative Binomial (HNB)
- Zero-inflated Poisson (ZIP)
- Zero-inflated Negative Binomial (ZINB)
- Latent Segmentation based Poisson (LP)
- Latent Segmentation based Negative Binomial (LNB)
- Latent Segmentation based Hurdle Poisson (LHP)
- Latent Segmentation based Hurdle Negative Binomial (LHNB)
- Latent Segmentation based Zero-inflated Poisson (LZIP)
- Latent Segmentation based Zero-inflated Negative Binomial (LZINB)

For the empirical analysis, we selected bicycle-motor vehicle crash datasets from the Island of Montreal and the City of Toronto for the years 2006 through 2010. The models were estimated using a comprehensive set of exogenous variables: accessibility measures, exposure measures, socio-demographic characteristics, socio-economic characteristics, road network & traffic characteristics and built environment. We compared the estimated latent segmentation based models using information criterion metrics. In terms of data fit, the two-segment LNB model performed best for Montreal and the two-segment LZIP model performed best for Toronto. Overall, the study results highlight the importance of accommodating population heterogeneity in bicycle-motor vehicle crash frequency analysis.
Acknowledgement
The first author of the paper would like to gratefully acknowledge the help provided by Adham Badran and Michelle Pinto in preparing the dataset.
References
Abdel-Aty, M., Lee, J., Siddiqui, C., & Choi, K. (2013). Geographical unit based analysis in the
context of transportation safety planning. Transportation Research Part A: Policy and
Practice, 49, 62-75.
Abdel-Aty, M., Siddiqui, C., Huang, H., & Wang, X. (2011). Integrating trip and roadway
characteristics to manage safety in traffic analysis zones. Transportation Research Record:
Journal of the Transportation Research Board, 2213(1), 20-28.
Agresti, A. (1996). An introduction to categorical data analysis (Vol. 135). New York: Wiley.
Aguero-Valverde, J., & Jovanis, P. P. (2006). Spatial analysis of fatal and injury crashes in
Pennsylvania. Accident Analysis & Prevention, 38(3), 618-625.
Aguero-Valverde, J., & Jovanis, P. P. (2008). Analysis of road crash frequency with spatial
models. Transportation Research Record: Journal of the Transportation Research Board,
2061(1), 55-63.
Amoros, E., Martin, J. L., & Laumon, B. (2003). Comparison of road crashes incidence and
severity between some French counties. Accident Analysis & Prevention, 35(4), 537-547.
Bago d'Uva, T. (2006). Latent class models for utilisation of health care. Health Economics,
15(4), 329-343.
Brüde, U., & Larsson, J. (1993). Models for predicting accidents at junctions where pedestrians
and cyclists are involved. How well do they fit? Accident Analysis & Prevention, 25(5),
499-509.
Cameron, A. C., & Trivedi, P. K. (1998). Regression analysis of count data. New York: Cambridge University Press.
Carter, D., & Council, F. (2007). Factors contributing to pedestrian and bicycle crashes on rural highways. Paper presented at the Transportation Research Board 86th Annual Meeting.
Chen, C., Lin, H., & Loo, B. P. (2012). Exploring the impacts of safety culture on immigrants’
vulnerability in non-motorized crashes: a cross-sectional study. Journal of Urban Health,
89(1), 138-152.
Cheng, L., Geedipally, S. R., & Lord, D. (2012). Examining the Poisson-Weibull generalized
linear model for analyzing crash data. Paper presented at the Transportation Research
Board 91st Annual Meeting.
Chin, H. C., & Quddus, M. A. (2003). Modeling count data with excess zeroes: An empirical application to traffic accidents. Sociological Methods & Research, 32(1), 90-116.
Cho, G., Rodríguez, D. A., & Khattak, A. J. (2009). The role of the built environment in
explaining relationships between perceived and actual pedestrian and bicyclist safety.
Accident Analysis & Prevention, 41(4), 692-702.
Cottrill, C. D., & Thakuriah, P. V. (2010). Evaluating pedestrian crashes in areas with high low-income or minority populations. Accident Analysis & Prevention, 42(6), 1718-1728.
De Guevara, F. L., Washington, S. P., & Oh, J. (2004). Forecasting crashes at the planning level: Simultaneous negative binomial crash model applied in Tucson, Arizona. Transportation Research Record (pp. 191-199).
Eluru, N., Bagheri, M., Miranda-Moreno, L. F., & Fu, L. (2012). A latent class modeling approach for identifying vehicle driver injury severity factors at highway-railway crossings. Accident Analysis & Prevention, 47, 119-127.
Greene, W. H., & Hensher, D. A. (2003). A latent class model for discrete choice analysis: contrasts with mixed logit. Transportation Research Part B: Methodological, 37(8), 681-698.
Hadayeghi, A., Shalaby, A. S., & Persaud, B. N. (2010b). Development of planning level
transportation safety tools using Geographically Weighted Poisson Regression. Accident
Analysis & Prevention, 42(2), 676-688.
Hadayeghi, A., Shalaby, A., & Persaud, B. (2003). Macrolevel Accident Prediction Models for
Evaluating Safety of Urban Transportation Systems. Transportation Research Record:
Journal of the Transportation Research Board, 1840, 87-95.
Hadayeghi, A., Shalaby, A., & Persaud, B. (2007). Safety Prediction Models: Proactive Tool for
Safety Evaluation in Urban Transportation Planning Applications. Transportation
Research Record: Journal of the Transportation Research Board, 2019, 225-236.
Hadayeghi, A., Shalaby, A., & Persaud, B. (2010a). Development of planning-level
transportation safety models using full Bayesian semiparametric additive techniques.
Journal of Transportation Safety and Security, 2(1), 45-68.
Hatfield, J., Garrard, J., Tørsløv, N., Crist, P., Houdmont, A., Van Damme, O., Gaardbo, A. M., Jouannot, T., Ocampo, A., & Malasek, J. (2012). Cycling safety: key messages of the international transport forum of the OECD's working group on cycle safety. Injury Prevention, 18(Supplement 1), A24.
Hauer, E. (2001). Overdispersion in modelling accidents on road sections and in Empirical Bayes
estimation. Accident Analysis & Prevention, 33(6), 799-808.
Hausman, J. A., Hall, B. H., & Griliches, Z. (1984). Econometric models for count data with an
application to the patents-R&D relationship: National Bureau of Economic Research
Cambridge, Mass., USA.
Heilbron, D. C. (1994). Zero‐Altered and other Regression Models for Count Data with Added
Zeros. Biometrical Journal, 36(5), 531-547.
Huang, H., Abdel-Aty, M., & Darwiche, A. (2010). County-Level Crash Risk Analysis in
Florida. Transportation Research Record: Journal of the Transportation Research Board,
2148, 27-37.
Irwin, J. O. (1968). The generalized Waring distribution applied to accident theory. Journal of the
Royal Statistical Society. Series A (General), 205-225.
Jang, T. Y. (2005). Count data models for trip generation. Journal of Transportation Engineering,
131(6), 444-450.
Karlaftis, M. G., & Tarko, A. P. (1998). Heterogeneity considerations in accident modeling.
Accident Analysis & Prevention, 30(4), 425-433.
Kim, K., Pant, P., & Yamashita, E. (2010). Accidents and Accessibility. Transportation Research
Record: Journal of the Transportation Research Board, 2147, 9-17.
LaScala, E. A., Gerber, D., & Gruenewald, P. J. (2000). Demographic and environmental
correlates of pedestrian injury collisions: a spatial analysis. Accident Analysis &
Prevention, 32(5), 651-658.
Lee, J., Abdel-Aty, M., & Jiang, X. (2014). Development of zone system for macro-level traffic
safety analysis. Journal of Transport Geography, 38, 13-21.
Levine, N., Kim, K. E., & Nitz, L. H. (1995). Spatial analysis of Honolulu motor vehicle crashes:
II. Zonal generators. Accident Analysis & Prevention, 27(5), 675-685.
Li, Z., Wang, W., Liu, P., Bigham, J. M., & Ragland, D. R. (2013). Using Geographically
Weighted Poisson Regression for county-level crash modeling in California. Safety
Science, 58, 89-97.
Loo, B. P., & Tsui, K. (2010). Bicycle crash casualties in a highly motorized city. Accident
Analysis & Prevention, 42(6), 1902-1907.
Lord, D., & Mannering, F. (2010). The statistical analysis of crash-frequency data: a review and
assessment of methodological alternatives. Transportation Research Part A: Policy and
Practice, 44(5), 291-305.
Lord, D., & Miranda-Moreno, L. F. (2008). Effects of low sample mean values and small sample
size on the estimation of the fixed dispersion parameter of Poisson-gamma models for
modeling motor vehicle crashes: A Bayesian perspective. Safety Science, 46(5), 751-770.
MacNab, Y. C. (2004). Bayesian spatial and ecological models for small-area accident and injury
analysis. Accident Analysis & Prevention, 36(6), 1019-1028.
Maher, M., & Mountain, L. (2009). The sensitivity of estimates of regression to the mean.
Accident Analysis & Prevention, 41(4), 861-868.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models. London: Chapman and Hall.
Miaou, S.-P. (1994). The relationship between truck accidents and geometric design of road
sections: Poisson versus negative binomial regressions. Accident Analysis & Prevention,
26(4), 471-482.
Miaou, S.-P., Song, J. J., & Mallick, B. K. (2003). Roadway traffic crash mapping: a space-time
modeling approach. Journal of Transportation and Statistics, 6, 33-58.
Min, Y., & Agresti, A. (2005). Random effect models for repeated measures of zero-inflated
count data. Statistical Modelling, 5(1), 1-19.
Moeinaddini, M., Asadi-Shekari, Z., & Zaly Shah, M. (2014). The relationship between urban
street networks and the number of transport fatalities at the city level. Safety Science, 62,
114-120.
Naderan, A., & Shahi, J. (2010). Aggregate crash prediction models: Introducing crash
generation concept. Accident Analysis & Prevention, 42(1), 339-346.
Ng, K.-s., Hung, W.-t., & Wong, W.-g. (2002). An algorithm for assessing the risk of traffic
accident. Journal of Safety Research, 33(3), 387-410.
Noland, R. B. (2003). Traffic fatalities and injuries: the effect of changes in infrastructure and
other trends. Accident Analysis & Prevention, 35(4), 599-611.
Noland, R. B., & Oh, L. (2004). The effect of infrastructure and demographic change on traffic-related fatalities and crashes: a case study of Illinois county-level data. Accident Analysis
& Prevention, 36(4), 525-532.
Noland, R. B., & Quddus, M. A. (2004a). Analysis of pedestrian and bicycle casualties with
regional panel data. Transportation Research Record (pp. 28-33).
Noland, R. B., & Quddus, M. A. (2004b). A spatially disaggregate analysis of road casualties in
England. Accident Analysis & Prevention, 36(6), 973-984.
Nylund, K. L., Asparouhov, T., & Muthén, B. O. (2007). Deciding on the number of classes in
latent class analysis and growth mixture modeling: a Monte Carlo simulation study.
Structural Equation Modeling, 14(4), 535-569.
Pai, C.-W. (2011). Overtaking, rear-end, and door crashes involving bicycles: An empirical
investigation. Accident Analysis & Prevention, 43(3), 1228-1235.
Park, B.-J., & Lord, D. (2009). Application of finite mixture models for vehicle crash data
analysis. Accident Analysis & Prevention, 41(4), 683-691.
Park, B.-J., Lord, D., & Hart, J. D. (2010). Bias properties of Bayesian statistics in finite mixture
of negative binomial regression models in crash data analysis. Accident Analysis &
Prevention, 42(2), 741-749.
Peng, Y., Lord, D., & Zou, Y. (2014). Applying the Generalized Waring model for investigating
sources of variance in motor vehicle crash analysis. Transportation Research Board
Annual Meeting 2014 Paper #14-1352.
Pucher, J., & Buehler, R. (2006). Why Canadians cycle more than Americans: A comparative
analysis of bicycling trends and policies. Transport Policy, 13(3), 265-279.
Pucher, J., Buehler, R., & Seinen, M. (2011). Bicycling renaissance in North America? An
update and re-appraisal of cycling trends and policies. Transportation Research Part A:
Policy and Practice, 45(6), 451-475.
Quddus, M. A. (2008). Modelling area-wide count outcomes with spatial correlation and
heterogeneity: an analysis of London crash data. Accident Analysis & Prevention, 40(4),
1486-1497.
Reynolds, C., Harris, M. A., Teschke, K., Cripton, P. A., & Winters, M. (2009). The impact of
transportation infrastructure on bicycling injuries and crashes: a review of the literature.
Environmental Health, 8(1), 47.
Rybarczyk, G., & Wu, C. (2010). Bicycle facility planning using GIS and multi-criteria decision
analysis. Applied Geography, 30(2), 282-293.
Schimek, P. (1996). The dilemmas of bicycle planning. Paper presented at the Joint International
Congress of the Association of Collegiate Schools of Planning (ACSP) and the
Association of European Schools of Planning (AESOP), July.
Shankar, V., Milton, J., & Mannering, F. (1997). Modeling accident frequencies as zero-altered
probability processes: an empirical inquiry. Accident Analysis & Prevention, 29(6), 829-837.
Siddiqui, C., Abdel-Aty, M., & Choi, K. (2012). Macroscopic spatial analysis of pedestrian and
bicycle crashes. Accident Analysis & Prevention, 45, 382-391.
Sobhani, A., Eluru, N., & Faghih-Imani, A. (2013). A latent segmentation based multiple discrete
continuous extreme value model. Transportation Research Part B: Methodological, 58,
154-169.
Stamatiadis, N., & Puccini, G. (2000). Socioeconomic descriptors of fatal crash rates in the
Southeast USA. Injury Control and Safety Promotion, 7(3), 165-173.
Turner, S., Francis, T., & Roozenburg, A. (2006). Predicting accident rates for cyclists and pedestrians. Land Transport New Zealand.
Ukkusuri, S., Hasan, S., & Aziz, H. (2011). Random Parameter Model Used to Explain Effects of
Built-Environment Characteristics on Pedestrian Crash Frequency. Transportation
Research Record: Journal of the Transportation Research Board, 2237, 98-106.
Ukkusuri, S., Miranda-Moreno, L. F., Ramadurai, G., & Isa-Tavarez, J. (2012). The role of built
environment on pedestrian crash frequency. Safety Science, 50(4), 1141-1151.
Walker, I., & Jones, C. (2005). The Oxford and Cambridge Cycling Survey. Oxfordshire County
Council.
Wedel, M., DeSarbo, W. S., Bult, J. R., & Ramaswamy, V. (1993). A latent class Poisson
regression model for heterogeneous count data. Journal of Applied Econometrics, 8(4),
397-411.
Wei, F., & Lovegrove, G. (2013). An empirical tool to evaluate the safety of cyclists: Community
based, macro-level collision prediction models using negative binomial regression.
Accident Analysis & Prevention, 61, 129-137.
Welsh, A. H., Cunningham, R. B., Donnelly, C., & Lindenmayer, D. B. (1996). Modelling the
abundance of rare species: statistical models for counts with extra zeros. Ecological
Modelling, 88(1), 297-308.
Wier, M., Weintraub, J., Humphreys, E. H., Seto, E., & Bhatia, R. (2009). An area-level model of
vehicle-pedestrian injury collisions with implications for land use and transportation
planning. Accident Analysis & Prevention, 41(1), 137-145.
Yasmin, S., Eluru, N., Bhat, C. R., & Tay, R. (2014). A latent segmentation based generalized
ordered logit model to examine factors influencing driver injury severity. Analytic
Methods in Accident Research, 1, 23-38.
Zou, Y., Zhang, Y., & Lord, D. (2014). Analyzing different functional forms of the varying
weight parameter for finite mixture of negative binomial regression models. Analytic
Methods in Accident Research, 1, 39-52.
TABLE 1 Summary of Existing Macro-Level Crash Frequency Studies
Noland and
Quddus
(2004a)
Wier et al.
(2009)
Wei et al.
(2013)
Siddiqui et al.
(2012)
Cho et al.
(2009)
MacNab
(2004)
Noland (2003)
Built
environment
Negative binomial
Pedestrian
Bicyclist
Standard
statistical
regions
Fatal/Serious
injury crashes,
Slight injury
crashes
---
Yes
Yes
Yes
Yes
---
Ordinary least square
regression
Pedestrian crash
Census tract
Injury crashes
---
Yes
Yes
Yes
Yes
Yes
Negative binomial
Bicycle crash
Traffic
analysis zone
Total crashes
Yes
Yes
Yes
Yes
Yes
Yes
Negative binomial,
Bayesian log-normal
model
Pedestrian and
bicycle crash
Traffic
analysis zone
Total pedestrian
crashes, Total
bicycle crashes
---
Yes
Yes
Yes
Yes
Yes
Pedestrian and
bicycle crash
Young (Age 0-25)
pedestrian and
Bicyclist crash
Community
analysis zones
Total crashes
Yes
Yes
---
---
Yes
Yes
Local health
areas
Injury crashes
---
Yes
---
Yes
Yes
Yes
Crash
State
Fatal crashes,
Injury crashes
---
Yes
Yes
Yes
Yes
---
County
Total crashes,
Fatal crashes
---
Yes
Yes
Yes
Yes
---
---
Yes
---
Yes
Yes
---
---
Yes
Yes
Yes
Yes
---
---
---
---
---
Yes
---
Yes
Yes
Yes
---
Yes
Yes
Path analysis
Bayesian spatial and
ecological regression
model
Random Effect
negative binomial
Noland and
Quddus
(2004a)
Negative binomial
Crash
Karlaftis et al.
(1998)
Cluster analysis,
Negative binomial
Aged driver crash
Bayesian spatial model
Crash
County
Negative binomial
Crash
County
Random parameter
negative binomial
Pedestrian crash
Census tract
Huang et al.
(2010)
Amoros et al.
(2003)
Ukkusuri et
al. (2011)
Crash level
Road network
and Traffic
characteristics
Spatial Unit
Socio-economic
characteristics
Unit of Analysis
SocioDemographic
characteristics
Methodological
Approach
Exposure
Measures
Paper
Accessibility
measures
Independent Variable Considered
Total crashes,
Urban crashes,
Rural crashes
Total crashes,
Severe crashes
Total crashes,
Fatal crashes
Total crashes
Cottrill et al.
(2010)
Quddus
(2008)
Poisson Regression
with heterogeneity
Pedestrian crash
Negative Binomial,
Spatial autoregressive
model, Bayesian
hierarchical model
Census Tract
Total crashes
Yes
Yes
Yes
Yes
Yes
Yes
Motorized vehicle
crash, Nonmotorized vehicle
crash, Pedestrian
crash
Census ward
Fatal crashes,
Serious injury
crashes, Slight
Injury crashes
---
Yes
Yes
---
Yes
---
---
Yes
---
---
---
---
---
---
---
---
---
Yes
---
Yes
---
---
Yes
---
Total crashes,
PDO crashes,
Injury crashes,
Fatal crashes
Total crashes,
Fatal crashes,
Pedestrian crashes
Total crashes,
Severe crashes,
Peak hour crashes,
Pedestrian and
Bicycle crashes
Naderan et al.
(2010)
Negative binomial
Crash
Traffic
analysis zone
Ng et al.
(2002)
Cluster analysis,
Negative Binomial
Crash
Traffic
analysis zone
Abdel-Aty et
al. (2011)
Negative binomial
Crash
Traffic
analysis zone
Full Bayes
hierarchical model
Crash
County
Fatal crashes,
Injury crashes
---
Yes
Yes
Yes
---
---
Quasi induced
exposure method
Single vehicle crash
Multi vehicle Crash
State
Fatal crashes
---
Yes
Yes
Yes
---
---
Noland (2003)
Negative binomial
Crash
State
Fatal injury
crashes
---
Yes
Yes
Yes
Yes
---
LaScala et al.
(2000)
Spatial autocorrelation
corrected regression
Pedestrian crash
Census tract
Injury crashes
---
---
Yes
Yes
Yes
Yes
Total crashes,
Severe crashes,
Pedestrian crashes
---
Yes
Yes
Yes
Yes
---
Yes
Yes
---
Yes
---
Aguero-Valverde and
Jovanis
(2006)
Stamatiadis
and Puccini
(2000)
Abdel-Aty et
al. (2013)
Poisson-lognormal
Crash
Pedestrian crash
Traffic
analysis zones
Block groups
Census
tracts
Levine et al.
(1995)
Spatial lag
Crash
Census block
Total crashes
---
Yes
---
Bayesian Poisson
Lognormal
Crash
Traffic
analysis zone
Traffic safety
Total crashes,
Severe crashes
---
Yes
Yes
Lee et al.
(2014)
analysis zone
Li et al.
(2013)
Noland and
Quddus
(2004b)
Geographically
Weighted Poisson
Regression
Crash
County
Fatal crashes
Fatal crashes,
Serious injury
crashes, Slight
injury crashes
Fatal crashes,
Injury crashes,
PDO crashes
---
Yes
Yes
Yes
Yes
---
---
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
---
Yes
Yes
Yes
Yes
---
---
---
---
---
Yes
---
Yes
Yes
Yes
Yes
Yes
Yes
Negative binomial
Crash
Census wards
De Guevara et
al. (2004)
Simultaneous negative
binomial
Crash
Traffic
analysis zone
Hadayeghi et
al. (2003)
Negative binomial
Geographically
weighted regression
Crash
Traffic zone
Negative binomial
Crash
City
Negative binomial
Crash
Traffic
analysis zone
Fatalities per
million inhabitants
Total crashes,
Severe crashes
Crash
Traffic
analysis zone
Total crashes,
Severe crashes
Yes
Yes
Yes
Yes
Yes
Yes
Crash
Traffic
analysis zone
Total crashes,
Severe crashes
Yes
Yes
Yes
Yes
Yes
Yes
Moeinaddini
et al. (2014)
Hadayeghi et
al. (2007)
Hadayeghi et
al. (2010a)
Hadayeghi et
al. (2010b)
Geographically
Weighted Poisson
Regression, Full
Bayesian
Semiparametric
Additive
Geographically
Weighted Poisson
Regression
Total crashes,
Severe crashes
TABLE 2 Sample Statistics for Montreal
Columns: Variables; Definition; Zonal Minimum, Maximum and Average
.000
6.419
.804
.000
37.150
.478
.000
19.230
.074
.000
138.194
26.231
.000
25.000
4.816
.000
.869
.477
.000
.000
.000
3.840
2.377
2.784
1.835
.828
.077
.000
2.191
.957
.000
.321
.033
.000
.560
.103
.000
.923
.174
.000
.565
.116
.000
.640
.158
Accessibility measures
Density of STM bus line: Total length of Société de transport de Montréal (STM) bus lines in meters / total length of road network in meters
Metro station density: Total metro stations / total TAZ area in square kilometers
AMT station density: Total Agence métropolitaine de transport (AMT) stations / total TAZ area in square kilometers
STM bus stop density: Total Société de transport de Montréal (STM) bus stops / total TAZ area in square kilometers
Bus transit destination diversity: Total number of different bus routes in TAZ

Exposure measures
Proportion of driver commuters: Total driver commuters / total commuters in TAZ
Average person per HH: Average persons per household in TAZ
Average car per HH: Average cars per household in TAZ
Designated bike lane on road: Total length of designated on-road bike lanes in kilometers

Socio-demographic characteristics
Proportion of male to female: Total male population in TAZ / total female population in TAZ
Proportion of non-permanent resident: Total non-permanent residents in TAZ / total population of TAZ
Proportion of African population: Total African residents in TAZ / total population of TAZ
Proportion of Asian population: Total Asian residents in TAZ / total population of TAZ
Proportion of American population: Total American residents in TAZ / total population of TAZ
Proportion of European population: Total European residents in TAZ / total population of TAZ

Socio-economic characteristics
Median commute duration: Median commuting duration of TAZ in minutes
Average TAZ income: Natural log of average TAZ income
Proportion of full-time workers: Total full-time workers in TAZ / total workers of TAZ
Average vehicle age: Average age of all private vehicles in TAZ
Proportion of commuters commuting between 5 a.m. and 6:59 a.m.: Total commuters commuting between 5 a.m. and 6:59 a.m. in TAZ / total commuters of TAZ

Road network & traffic characteristics
Proportion of highway: Total length of highways in TAZ / total length of road network of TAZ
Proportion of major road: Total length of major roads in TAZ / total length of road network of TAZ
Connection ratio: Total number of intersections in TAZ / (total number of intersections + total number of cul-de-sacs) of TAZ
Density of signalized intersection: Total signalized intersections in TAZ / total number of intersections of TAZ
Density of intersections: Total number of intersections in TAZ / total length of road network of TAZ in kilometers
Number of cul-de-sacs: Total number of cul-de-sacs in TAZ
Number of one-way links: Natural log of total number of one-way links

Built environment
Number of bars: Total number of bars in TAZ
Lot coverage: Building footprint area of TAZ / total area of TAZ
Number of restaurants: Z-score of number of restaurants
Distance from CBD: Natural log of distance from CBD to the TAZ in kilometers
.000
.000
.000
7.038
35.100
12.188
.774
11.211
25.178
10.149
.552
8.982
.000
.401
.176
.000
1.000
.104
.000
.985
.177
.000
.947
.053
.000
2.000
.143
.000
15.886
4.498
.000
.000
29.000
4.920
1.987
2.810
.000
.000
-.704
.125
21.000
.583
10.027
33.661
.576
.170
.001
8.971
.000
.999
.495
Land use mix: $\text{Land use mix} = \left[\dfrac{-\sum_{k}\left(p_k \ln p_k\right)}{\ln N}\right]$, where $k$ is the land-use category, $p_k$ is the proportion of the developed land area devoted to land use $k$, and $N$ is the number of land-use categories in a TAZ
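The land use mix index above is a normalized entropy measure: it equals 0 when all developed land in a TAZ is devoted to a single use and 1 when developed land is spread evenly across all $N$ categories. A minimal Python sketch, assuming the proportions for one TAZ are supplied as a list that sums to 1 and that $N$ is the length of that list. The function name is hypothetical, terms with $p_k = 0$ follow the usual convention $0 \ln 0 = 0$, and returning 0 when $N = 1$ (where $\ln N = 0$ leaves the ratio undefined) is an assumption of this sketch, not something the table specifies.

```python
import math
from typing import Sequence


def land_use_mix(proportions: Sequence[float]) -> float:
    """Normalized entropy land-use mix index for one TAZ (hypothetical helper).

    proportions: share of developed land area in each land-use category k,
    assumed to sum to 1. N is taken as the number of categories supplied.
    """
    n = len(proportions)
    if n <= 1:
        # Assumption: ln(1) = 0 makes the index undefined; report 0 (no mix).
        return 0.0
    # -sum_k p_k ln p_k, skipping zero shares (0 * ln 0 treated as 0)
    entropy = -sum(p * math.log(p) for p in proportions if p > 0)
    # Divide by the maximum possible entropy, ln N, to rescale to [0, 1]
    return entropy / math.log(n)


# Hypothetical TAZ with four land-use categories of unequal size
print(land_use_mix([0.6, 0.2, 0.15, 0.05]))
```

Whether $N$ counts every land-use category or only those present in the TAZ changes the denominator, so the caller decides which categories to include in the list.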
TABLE 3 Measures of Fit in Estimation Sample

MONTREAL
Models | Number of parameters | Log-likelihood at convergence | BIC | AIC
Poisson | 30 | -5130.01 | 10506.87 | 10320.03
NB | 26 | -4670.20 | 9554.33 | 9392.40
HP | 45 | -4492.99 | 9356.25 | 9075.98
HNB | 45 | -4360.08 | 9090.43 | 8810.16
ZIP | 37 | -4511.56 | 9327.57 | 9097.13
ZINB | 39 | -4356.33 | 9033.57 | 8790.67
LP | 43 | -4240.31 | 8834.44 | 8566.63
LNB | 39 | -4150.66 | 8622.22 | 8379.32
LHP | 47 | -4228.44 | 8843.61 | 8550.88
LHNB | 42 | -4230.88 | 8807.34 | 8545.75
LZIP | 48 | -4175.23 | 8745.40 | 8446.45
LZINB | 47 | -4147.10 | 8680.93 | 8388.20

TORONTO
Models | Number of parameters | Log-likelihood at convergence | BIC | AIC
Poisson | 20 | -5180.01 | 10524.58 | 10400
NB | 21 | -5105.43 | 10383.66 | 10252.9
HP | 31 | -5113.50 | 10482.07 | 10289
HNB | 32 | -5102.71 | 10468.72 | 10269.4
ZIP | 28 | -5084.77 | 10399.92 | 10225.5
ZINB | 27 | -5064.24 | 10350.64 | 10182.5
LP | 30 | -4998.03 | 10242.91 | 10056.1
LNB | 28 | -4919.68 | 10069.75 | 9895.36
LHP | 34 | -4953.98 | 10187.73 | 9975.97
LZIP | 30 | -4907.88 | 10062.61 | 9875.77
NB = Negative Binomial, HP = Hurdle Poisson, HNB = Hurdle Negative Binomial, ZIP = Zero-inflated Poisson,
ZINB = Zero-inflated Negative Binomial, LP = Latent Segmentation based Poisson, LNB = Latent Segmentation
based Negative Binomial, LHP = Latent Segmentation based Hurdle Poisson, LHNB = Latent Segmentation based
Hurdle Negative Binomial, LZIP = Latent Segmentation based Zero-inflated Poisson and LZINB = Latent
Segmentation based Zero-inflated Negative Binomial model, BIC = Bayesian Information Criterion; AIC = Akaike
Information Criterion.
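Table 3 ranks the models by BIC and AIC, which penalize the log-likelihood for the number of parameters; lower values indicate the preferred model. The AIC column is reproduced by the standard formula AIC = 2k - 2 ln L, where k is the number of parameters and ln L the log-likelihood at convergence (for example, NB for Montreal: 2(26) + 2(4670.20) = 9392.40). The sketch below assumes the standard definitions, and its function names are hypothetical. This excerpt does not state the number of observations n entering BIC = k ln(n) - 2 ln L, so `bic()` takes n as an argument, and `implied_bic_penalty()` backs the per-parameter penalty out of a reported row instead of assuming n.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: BIC = k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def implied_bic_penalty(log_likelihood: float, n_params: int, reported_bic: float) -> float:
    """Per-parameter penalty implied by a reported BIC: (BIC + 2 ln L) / k.

    Under the standard BIC formula this quantity equals ln(n).
    """
    return (reported_bic + 2 * log_likelihood) / n_params


# Rows copied from Table 3 (Montreal): (k, log-likelihood, BIC, AIC)
montreal = {
    "NB": (26, -4670.20, 9554.33, 9392.40),
    "LNB": (39, -4150.66, 8622.22, 8379.32),
}

for model, (k, ll, reported_bic, reported_aic) in montreal.items():
    # Recomputed AIC next to the reported one, plus the implied BIC penalty
    print(model, round(aic(ll, k), 2), reported_aic,
          round(implied_bic_penalty(ll, k, reported_bic), 3))
```

Small discrepancies in the second decimal (for example, Poisson for Montreal recomputes to 10320.02 against a reported 10320.03) are consistent with rounding of the reported log-likelihoods.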
TABLE 4 Latent Segmentation based Negative Binomial Model with Two Segments (LNB)
Estimates for Montreal
Segments
Sample shares
Observed mean of crash events
Expected mean of crash events
Variables
Constant
Lot Coverage
Number of one-way links
Density of STM bus line
Number of restaurants
Distance from CBD
Land use mix
Segment 1
0.25
0.82
2.74
SEGMENT COMPONENTS
Segment 1
Estimate
t-stat
----------------------------CRASH COUNT COMPONENT
-1.411
-1.407
Constants
Accessibility measures
Metro station density
AMT station density
STM bus stop density
Bus transit destination diversity
Exposure measures
Proportion of driver commuters
Average person per HH
Average car per HH
Designated bike lane on road
Socio-demographic characteristics
Proportion of male to female
Proportion of non-permanent resident
Proportion of African population
Proportion of Asian population
Proportion of American population
Proportion of European population
Socio-economic characteristics
Median commute duration
Average TAZ income
Segment 2
0.75
0.44
Segment 2
Estimate
t-stat
1.471
1.855
2.646
2.045
-0.912
-5.767
1.773
6.558
-0.343
-2.838
0.920
4.660
-1.725
-3.408
-4.271
-8.506
-0.374
-0.741
--0.207
-2.280
-1.665
--7.154
----0.005
0.075
----1.652
4.442
-----1.864
---
-----5.292
---
-3.941
0.303
--0.678
-7.341
3.996
--5.118
-1.144
2.671
2.463
-------
-4.309
1.718
3.007
-------
-------0.653
-1.127
1.143
-------2.101
-2.002
2.147
-0.066
0.285
-4.346
4.552
0.049
---
4.434
---
Proportion of full-time workers
Average vehicle age
Proportion of commuters commuting
between 5 a.m. and 6:59 a.m.
Road network & traffic characteristics
Proportion of highway
Proportion of major road
Connection ratio
Density of signalized intersection
Density of intersections
Number of cul-de-sacs
Built environment
Number of bars
Dispersion parameter
--1.958
--5.736
2.066
---
3.228
---
1.520
1.815
4.547
4.988
-2.376
-1.263
1.804
1.146
-----
-5.392
-2.626
1.871
1.912
-----
---0.544
--0.763
0.142
0.051
---1.661
--3.038
3.686
4.557
--0.635
--7.351
0.047
0.587
1.671
7.710
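Table 4 reports a two-segment model with a segment component, which allocates each TAZ probabilistically to the segments (sample shares 0.25 and 0.75), and a crash count component with segment-specific estimates. As a hedged illustration of how a model of this kind assigns probability to an observed crash count, the sketch below uses a generic finite-mixture form: multinomial-logit segment shares driven by zonal variables z, and a negative binomial count in each segment with a log-linear mean. The NB2 parameterization (variance mu + alpha mu^2), the single dispersion parameter shared across segments, the function names and all parameter values are assumptions for illustration; they are not the estimated specification or coefficients of Table 4.

```python
import math
from typing import Sequence


def nb_pmf(y: int, mu: float, alpha: float) -> float:
    """NB2 probability of y crashes with mean mu and dispersion alpha (assumed form)."""
    r = 1.0 / alpha  # shape parameter; variance = mu + alpha * mu**2
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def segment_shares(z: Sequence[float], gammas: Sequence[Sequence[float]]) -> list[float]:
    """Multinomial-logit segment membership probabilities (gammas[0] = reference, zeros)."""
    utils = [sum(g * zi for g, zi in zip(gamma, z)) for gamma in gammas]
    top = max(utils)  # subtract the max for numerical stability
    expu = [math.exp(u - top) for u in utils]
    total = sum(expu)
    return [e / total for e in expu]


def latent_nb_pmf(y: int, x: Sequence[float], z: Sequence[float],
                  betas: Sequence[Sequence[float]], gammas: Sequence[Sequence[float]],
                  alpha: float) -> float:
    """Unconditional P(y): segment-share-weighted sum of segment-specific NB probabilities."""
    if len(betas) != len(gammas):
        raise ValueError("need one count-coefficient vector per segment")
    prob = 0.0
    for share, beta in zip(segment_shares(z, gammas), betas):
        mu = math.exp(sum(b * xi for b, xi in zip(beta, x)))  # log-linear segment mean
        prob += share * nb_pmf(y, mu, alpha)
    return prob


# Hypothetical TAZ and parameter values, for illustration only
x, z = [1.0, 0.5], [1.0, 0.3]
betas = [[0.2, -0.5], [-1.0, 0.8]]
gammas = [[0.0, 0.0], [0.4, -1.2]]
print([round(latent_nb_pmf(y, x, z, betas, gammas, 0.5), 4) for y in range(4)])
```

With a single segment the mixture collapses to an ordinary negative binomial model, which is one way to see why the latent segmentation models in Table 3 nest their single-segment counterparts.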