bicycle crash analysis

The Role of Urban Scale, Transportation, and Demographics in Cycling Risk
Greg Rybarczyk
Department of Geography
Urban Planning 771
Spring 2006
1. Introduction
In the United States, the ever-present automobile-driven culture has designed roadways to move
gasoline-powered vehicles from point A to point B as efficiently as possible. As a result, the
auto-dependent transportation network has contributed to environmental degradation and
fragmented neighborhoods, and it deters bicycling as a mode of transportation (Garder, 1994).
Moreover, safety concerns, undesirable bicycle routes, and unconnected routes have discouraged
bicycle transportation. Even so, the rapid increase in the use of bicycles in the past decade has
been well documented. As one would expect, this increased usage has led to a parallel rise in the
number of bicycle accidents. Furthermore, a common failing of most city planning is that bicycle
route planning is ad hoc and fails to allocate routes that do not conflict with cars (Hanson, 1995).
There are approximately 38 bicycle-related fatalities per billion passenger-kilometers. In
1994 alone, 722 bicyclists were killed in the United States (Black, 2003 and Wachtel et al.,
1994). In spite of federal legislation centered on increasing funding for alternative
modes of transportation, approximately 650,000 people nationwide are treated for bicycle-related
injuries each year. While federal acts such as the Intermodal Surface Transportation Efficiency
Act (ISTEA) and the Transportation Equity Act (TEA-21) have done much to reform U.S. policy and
implement bicycle infrastructure, unsafe bicycling environments continue to increase.
Unfortunately, most other federal policies still cater to the automobile and do little to improve
the safety of bicycling. At the same time, bicycling is becoming ever more popular among
commuters and recreational riders (Rodgers, 1997). The question becomes: how can policy makers
satisfy federal mandates to increase bicycling while minimizing bicycling risk? A sizable amount
of literature has looked at bicycle accident patterns, but few studies have investigated what
factors contribute to bicycle accident rates in zones while incorporating only neighborhood-level
data.
The structure of the paper is as follows. Section 2 gives a contextual description of
the objectives of the study and highlights the feasibility of this research with
respect to past attempts in this area. Section 3 describes the study area and data
preprocessing. Section 4 describes the analytical approach used to investigate bicycle crashes in
the study area, which includes exploratory crash data analysis, cluster analysis, and spatial
autoregressive regression analysis. Section 5 presents the comparisons and results of the
modeling procedure. Finally, we conclude in section 6.
2. Research Objective:
To produce an environment that encourages bicycling as a viable transportation option, bicycling
must be perceived to be safe. Many studies have utilized bicycle traffic accidents to model
roadways for planning and safety (Pawlovich et al., 1998 and Garder, 1994). One way to assess
bicycle travel safety is to understand what factors relate to the incidence of bicycle crashes.
While many studies relate bicycle crashes to roadway features, few have assessed them within and
among zones. Moreover, bicycle accident data analysis at the neighborhood level is a logical basis
for future policy action: it can help identify problems, measure progress, and develop potential
countermeasures.
The trip distribution step of the well-known Urban Transportation Planning System
(UTPS) predicts traffic based on zonal attractions and generators, delineates traffic
between zones, and then allocates it to travel modes (Levine et al., 1995). The same
approach can be utilized to predict crashes among zones based on neighborhood traffic, land-use,
and demographic variables. If the prediction is accurate, these findings can then be interpolated
to roadway features. This research attempts to quantify variables that are integral to
increasing bicycling safety while controlling for neighborhood area and autocorrelation effects.
The neighborhood was used as the unit of assessment because accidents are related to the local
characteristics of the area in which they occur and are not a road-specific phenomenon. The
neighborhood composition is therefore the optimum way to explain higher levels of intra- and
inter-modal traffic, land-use, and demographics.
The combination of neighborhood physical road attributes, socio-economic quality, and
demographic data may uncover causal factors associated with bicycle crash rates. By
incorporating non-roadway attributes, more informed decisions can be made by the growing number
of neighborhood-level planning initiatives. In addition, few attempts have been made to take
both roadway and demographic variables into account in bicycle safety analysis
(Pawlovich, 1998). Therefore, the goals of this study were to establish that bicycle crashes were
not randomly dispersed and then to develop a robust bicycle accident predictive model that
minimizes adjacent neighborhood influences. In other words, this study accounts for the spatial
variation in neighborhood land-use, socio-economic quality, and roadway attributes while
minimizing autocorrelation among units. The objectives of this paper are to explore the
problem of bicycle accidents in the City of Milwaukee and to develop a model that identifies
bicycle accident clusters and minimizes autocorrelation in order to reveal the spatial dependence
of bicycle accident rates on land-use, density, and roadway infrastructure. The hypothesis
was that elevated bicycle accident rates are not random but related to an underlying
comprehensive phenomenon that is particular to each neighborhood. In effect, the expected
outcome is that the pattern of aggregated accident rates is not completely random, and that causal
factors related to neighborhood demographics, land-use, and road characteristics can be used for
crash rate prediction.
3. Data and Study area:
The study area consisted of the City of Milwaukee, located in Milwaukee County,
Wisconsin (figure 1). The neighborhood boundaries were obtained and derived from the City of
Milwaukee and include 190 distinct polygons. The primary data source is bicycle
accident counts obtained from the Wisconsin Department of Transportation for the years
1999 through 2003. The City of Milwaukee traffic control data was also geocoded and included
controlled and uncontrolled intersection data. Bus route data was obtained from the Milwaukee
County Transit Service and includes the total length in miles of each route. The bicycle accident
database contains all types of accidents and records the location by address, manner of collision,
number injured, time/day/month/year of accident, light conditions, and roadway surface conditions.
A total of 979 crashes occurred within the City of Milwaukee from 1999 through 2003.
Population data was obtained from the U.S. Census Bureau SF-1 database. Land-use and urban
scale data was obtained from the City of Milwaukee’s Milwaukee Property file (MPROP).
Planimetric roadway data (WISLER and Fire DIME) was obtained from the Wisconsin
Department of Transportation (WIDOT) and the City of Milwaukee.
[Figure 1: map of the study area; labeled interstates: I-43, I-94, I-794, I-894]
Figure 1, Milwaukee and its neighborhoods
3.1 Data Preparation
The heterogeneous composition of the data used in this study required rigorous
geoprocessing. Using ArcGIS 9.1, U.S. SF1 census block data was first
extrapolated to the neighborhoods of Milwaukee and then aggregated. The SF1 data
retained for further analysis included demographic and income attributes. The land-use,
scale, and urban density attributes used in this study were derived from the MPROP database,
which contains information for each parcel in the study area. Vehicle miles traveled
(VMT) was computed as: VMT = total AADT * 365 days/year * roadway length (ft) / 5280 (ft per
mile), where AADT is the annual average daily traffic. In addition, a mean bicycle level of
service (BLOS) was computed for each neighborhood (appendix 1). The BLOS is an index developed
by Landis (1996b) that attempts to quantify the suitability of a roadway segment for bicycling
based on real-time human perceptions of safety. A bicycle crash rate was established by dividing
the number of crashes per neighborhood by the neighborhood area (sq miles), and this rate became
the dependent variable in this research. This normalization removes the effect of differing
neighborhood size. Further descriptions of the neighborhood-level data aggregation are displayed in
table 1.
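The VMT equation and the area-normalized crash rate described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example segment values are hypothetical, and summing per-segment VMT to a neighborhood total is an assumption about how "total AADT" was aggregated.

```python
FEET_PER_MILE = 5280
DAYS_PER_YEAR = 365


def vehicle_miles_traveled(aadt, length_ft):
    """Annual VMT for one roadway segment: AADT * 365 * length in miles."""
    return aadt * DAYS_PER_YEAR * length_ft / FEET_PER_MILE


def crash_rate(crash_count, area_sq_miles):
    """Crashes per square mile, the dependent variable described in the text."""
    if area_sq_miles <= 0:
        raise ValueError("neighborhood area must be positive")
    return crash_count / area_sq_miles


# Hypothetical neighborhood: two segments given as (AADT, length in feet).
segments = [(12000, 2640), (8000, 5280)]
neighborhood_vmt = sum(vehicle_miles_traveled(a, l) for a, l in segments)
print(neighborhood_vmt)     # 5110000.0
print(crash_rate(6, 1.5))   # 4.0
```

Dividing by area rather than by population or ridership mirrors the normalization stated in the text; it does not measure exposure.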
Demographic            Transportation                    Economic
---------------------  --------------------------------  -----------------------------
# Gas/Stores           Sum Bus Routes (miles)            Median Household Income
# Schools              Sum Vehicle Miles Traveled        Ave Building Stories (height)
% Proportion White     Ave Bicycle Level of Service      Ave Building Area
% Proportion Male      Ave Pavement Condition            Ave Lot Area
% Proportion Female    Sum Controlled Intersections      % Renter Occupied
% Pop under 5          Sum Uncontrolled Intersections    % Owner Occupied
% Pop 5-17             % Heavy Truck Traffic
% Pop 17-65            Sum Interstate Miles
Ave Household Size     Ave # Roadway Lanes
Total # Land-use
Table 1, Neighborhood Data
4.0 Methods
Many studies have incorporated either infrastructure or environmental conditions to
model bicycle accident relationships. The main goal of this study is to understand, by moving
from a classical linear regression to a spatial regression model, the spatial co-occurrence of
bicycle accident concentrations per neighborhood. In other words, why do
concentrations of accidents occur where they do? A local cluster analysis was first utilized to
verify deviation from complete spatial randomness of the accident points. Moran’s I was used to
determine the degree to which the accident points were autocorrelated, and a cluster analysis
comprising Hierarchical Nearest Neighbor Analysis, the Standard Deviational Ellipse, and the
Moran’s I cluster diagnostic test was utilized to present the accident black spots. Once
clustering was observed, a model was developed to attribute possible variables to the clustering
phenomenon. The neighborhood was chosen as the spatial unit of study because it
eliminates the randomness that may appear in a study of random events and accounts for traffic
spillover effects that would not be accounted for in the typical trip-distribution model (Flahaut,
2004 and Levine et al., 1997). The flow chart that guided this study is displayed in figure 2 and
was adapted from Anselin (2005).
[Figure 2 flowchart, summarized from its box labels:]
1. Exploratory Data Analysis
2. Cluster Analysis
3. Calculate Local Moran’s I
4. Run OLS Regression
5. LM-Error and LM-Lag: is either significant?
   - Neither significant: Stop, keep OLS results
   - One significant: LM-Error, run Spatial Error Model; LM-Lag, run Spatial Lag Model
   - Both significant: examine Robust LM-Error and Robust LM-Lag results
       Robust LM-Error significant: run Spatial Error Model
       Robust LM-Lag significant: run Spatial Lag Model
Figure 2, methodology flowchart
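The diagnostic sequence in figure 2 can be sketched as a small decision function. This is a hedged illustration rather than the procedure GeoDa runs: the function name, the 0.05 significance level, and the tie-breaking rule (prefer the robust test with the smaller p-value when the robust tests do not separate the two models) are assumptions.

```python
def choose_spatial_model(p_lm_lag, p_lm_error,
                         p_robust_lag, p_robust_error, alpha=0.05):
    """Return 'ols', 'lag', or 'error' from Lagrange multiplier p-values."""
    lag_sig = p_lm_lag < alpha
    error_sig = p_lm_error < alpha

    # Neither standard LM test is significant: keep the OLS results.
    if not lag_sig and not error_sig:
        return "ols"
    # Only one standard LM test is significant: run the matching model.
    if lag_sig and not error_sig:
        return "lag"
    if error_sig and not lag_sig:
        return "error"

    # Both significant: fall back to the robust forms of the tests.
    robust_lag_sig = p_robust_lag < alpha
    robust_error_sig = p_robust_error < alpha
    if robust_lag_sig and not robust_error_sig:
        return "lag"
    if robust_error_sig and not robust_lag_sig:
        return "error"
    # Assumed tie-break: pick the robust test with the smaller p-value.
    return "lag" if p_robust_lag <= p_robust_error else "error"


# p-values as reported in table 6 of this paper.
print(choose_spatial_model(0.0000, 0.0000, 0.1208, 0.2439))  # lag
```

With the table 6 values, neither robust test is significant at 0.05, so the choice rests entirely on the assumed tie-break.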
4.1 Exploratory Data Analysis
In conformance with numerous past studies of bicycle crash analysis, this study
includes various exploratory statistics on the space-time and physical patterns of bicycle crashes.
Given the complexity of the factors involved in bicycle accidents, a brief statistical picture is
needed as a baseline to determine frequencies. However, the statistics associated
with the accidents cannot be related to the actual bicycling rate to determine exposure, because
that data is not available. As a result, a typology of crash occurrences in relation to time, day,
and type of crash was developed.
4.2 Cluster Analysis
Prior to spatial modeling, a cluster analysis was conducted as a data mining technique in order
to verify any clustering of like values among the accident points and among the crash
rates per neighborhood. The two-dimensional clustering techniques used in this study provide
insight into potential underlying processes associated with environmental parameters,
identify bicycle accident zones, and guide model development. A first-order point pattern
analysis using the nearest neighbor function in Esri’s ArcGIS 9.1 and CrimeStat software was
also utilized as an exploratory tool. Nearest neighbor analysis assesses all points and calculates
the mean distance to the nearest neighbor; if this distance is less than expected under a random
pattern, then clustering is evident (Carpenter, 2001). More specifically, a nearest neighbor
hierarchical (NNH) analysis was conducted in CrimeStat to determine whether first-order clusters,
based on a minimum of 10 points, are evident. In NNH, bicycle accidents are grouped in a
hierarchical order: first-order clusters are formed first and then fused into higher-order groups
until all points are clustered. The procedure identifies first-order clusters as groups of points
that are closer together than the threshold distance and that contain at least the minimum number
of points specified by the user. In addition, the standard deviational ellipse method was
utilized to measure the direction and concentration of bicycle crash points. The standard
deviational ellipse describes both the orientation and the dispersion of the point-based clustering.
Lastly, a Local Moran’s I was computed in ArcGIS 9.1 and GeoDa with the neighborhood polygons
to determine second-order relationships. Moran’s I has historically been used in polygon-based
autocorrelation measures due to its common application to areal units, ease of use, and
stable results (Steenberghen and Dufays, 2004, O’Sullivan and Unwin, 2004). A spatial weights
file was created in GeoDa and utilized to derive a polygon-based Moran’s I. Because we are
concerned with contiguity between neighborhood units, a first-order rook contiguity
weight matrix was created. Under rook contiguity, neighborhoods that share a common boundary
segment are treated as neighbors; it was chosen because the goal was to account for all such
neighbors.
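The nearest neighbor statistic described above compares the observed mean nearest-neighbor distance with the distance expected under complete spatial randomness. A minimal sketch follows; it assumes planar coordinates, a known study-area size, and no edge correction, and it is not the ArcGIS or CrimeStat implementation. The function name and sample points are hypothetical.

```python
import math


def nearest_neighbor_index(points, area):
    """Ratio R of observed to expected mean nearest-neighbor distance.

    R < 1 suggests clustering, R near 1 randomness, R > 1 dispersion.
    """
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")

    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point.
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    # Expected mean distance for a random pattern with density n / area.
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Hypothetical crash points packed into one corner of a 10 x 10 study area.
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
print(nearest_neighbor_index(clustered, area=100.0) < 1.0)  # True
```

The brute-force distance search is quadratic in the number of points, which is acceptable for a data set of roughly a thousand crashes.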
4.3 Regression Analysis
An empirical analysis of bicycle accidents would be incomplete without an assessment of
correlation and potential causal factors. A correlation analysis was conducted in SPSS to test the
independent variables for significance and collinearity; the Pearson’s r for each proposed variable
is listed in table 4. As a result, only 19 explanatory variables were utilized in the SAR model
(table 5). After it was determined that clustering was significant in the crash points and the crash
rates per neighborhood, a model was developed to determine contributing explanatory factors.
Subsequent to the cluster analysis, empirical relationships were therefore identified
via a hierarchy of classic and spatial regression.
Zonal data may exhibit a spatial relationship with other zones. In other words,
a zone’s values may be inherently related to those of its neighbors. To account for this
spatial dependency among neighborhood zones, their covariance matrix should be included in the
prediction model. While a classical linear regression model may be able to determine factors
that are linearly related to the dependent variable, it does not account for spatial dependence.
A more appropriate model therefore accounts for autocorrelation. When spatially correlated
errors are present, OLS yields inefficient coefficient estimates. When a spatial lag process is
present, the OLS assumption of independent observations is violated.
There are two types of spatial autoregressive (SAR) approaches: spatial error and spatial
lag. Spatial error consists of correlation across space in the error term, whereas in a spatial lag
the dependent variable at location “i” is affected by the values observed at neighboring locations
“j” (Anselin and Bera, 1998). Autocorrelation can thus be modeled by
considering the correlation among error terms (spatial error) or by modeling the spatial trend in
the dependent variable (spatial lag) (Fotheringham et al., 2004). Furthermore, by utilizing a SAR
model in this study, the mean of the neighboring crash densities is used as another explanatory
variable. In this study, GeoDa was utilized to fit a spatial lag or spatial error regression
model.
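To make "the mean of the neighboring crash densities" concrete, the sketch below computes a row-standardized spatial lag and the corresponding global Moran's I from a neighbor list. It is an illustration under stated assumptions: the neighbor dictionary, function names, and the four-zone example are hypothetical, and GeoDa's own weights handling may differ in detail.

```python
def spatial_lag(values, neighbors):
    """Mean of neighboring values per unit (row-standardized W times y)."""
    lag = []
    for i in range(len(values)):
        nbrs = neighbors.get(i, [])
        lag.append(sum(values[j] for j in nbrs) / len(nbrs) if nbrs else 0.0)
    return lag


def morans_i(values, neighbors):
    """Global Moran's I using row-standardized contiguity weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denominator = sum(d * d for d in dev)
    if denominator == 0:
        raise ValueError("values have no variance")
    # Sum of all weights: each unit with neighbors has weights summing to 1.
    s0 = sum(1 for i in range(n) if neighbors.get(i))
    numerator = sum(d * l for d, l in zip(dev, spatial_lag(dev, neighbors)))
    return (n / s0) * numerator / denominator


# Hypothetical row of four zones, each adjacent to the next one.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
crash_rates = [1.0, 2.0, 3.0, 4.0]
print(spatial_lag(crash_rates, neighbors))   # [2.0, 2.0, 3.0, 3.0]
print(morans_i(crash_rates, neighbors))      # 0.4
```

A smoothly increasing series gives a positive statistic, while an alternating series such as [1, 4, 1, 4] on the same neighbor list gives -1.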
Variable                     Pearson’s Correlation
---------------------------  ---------------------
Av HH Size                    0.291
Uncontrolled Intersection    -0.003
Controlled Intersection       0.310
Stop Signs                    0.481
Age 17-64                     0.474
Age 5-17                      0.498
Total Bus Miles               0.254
Total Land uses               0.383
VMT                           0.277
Mean BLOS                    -0.041
Mean PVT Cond                 0.088
Mean PCT HV                  -0.049
Mean # Lanes                 -0.019
Bike Lane Miles              -0.028
Av Building Area             -0.008
# Building Units              0.448
Av # Stories                  0.412
Av Current Total Asses       -0.052
Owner Occ                     0.120
Renter Occ                    0.483
HH Size                       0.385
Males                         0.468
Females                       0.513
Total Pop                     0.492
White Pop                     0.146
Black Pop                     0.376
Children Under 5              0.534
Total # Schools               0.405
Table 4, Correlation Coefficients
Social               Transportation              Economic
-------------------  --------------------------  --------------------------
Gas/Stores           Bus Routes (miles)          Median Household Income
Schools              Vehicle Miles Traveled      Number of Building Stories
Proportion White     Ave Pavement Condition      Ave Lot Area
Proportion Male      Controlled Intersections    Renter Occupied
Proportion Female                                Owner Occupied
Pop under 5
Pop 5-17
Pop 17-65
Household Size
Land-use
Table 5, explanatory variables
4.3.1 Spatially Lagged Regression Analysis
Spatial lag dependence in regression is modeled by including a spatially autoregressive term for
the dependent variable, the Wy term, where W is the spatial weights matrix and y is the
neighborhood crash rate. This form takes into account the mean of the adjacent
spatial locations of the neighbors (Gamerman et al., 2004). Moreover, each neighbor has eight
neighbors, with spatially shifted variables. In GeoDa, maximum likelihood estimation is utilized
to account for the Wy term. The spatial lag model is especially useful when utilizing
administrative areal data. When data is aggregated per zone, as in this study, loss of information
will occur, and an autoregressive model may be able to account for this spatial mismatch,
producing a better fitting model (Anselin and Bera, 1998).
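For reference, the spatial lag specification described above is conventionally written as shown here, with the spatial error form added only for contrast. The symbols are the standard textbook notation and are an assumption, since the paper's own equation is not reproduced in the text: y is the vector of neighborhood crash rates, W the spatial weights matrix, X the matrix of explanatory variables, and rho and lambda the autoregressive coefficients.

```latex
\begin{align}
  y &= \rho W y + X\beta + \varepsilon,
      \qquad \varepsilon \sim N(0, \sigma^{2} I)
      && \text{(spatial lag)} \\
  y &= X\beta + u,
      \qquad u = \lambda W u + \varepsilon
      && \text{(spatial error)}
\end{align}
```

In the lag form, with a row-standardized W, the product Wy is the mean crash rate of each neighborhood's neighbors.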
By inputting the spatial weights matrix, the OLS output from GeoDa includes spatial error and
spatial lag diagnostics to assist with further spatial regression model development (table 6).
The Lagrange multiplier test statistics include LM-Lag, Robust LM-Lag, LM-Error, Robust
LM-Error, and LM-SARMA. Table 6 indicates that the Lagrange multiplier statistic for the spatial
lag is significant and slightly larger than that for the spatial error. The results of the linear
OLS regression and the spatial diagnostic output thus point to the appropriate form of the
prediction model. While the LM-Error and LM-Lag are both highly significant, the Robust LM-Lag
is slightly more significant than the Robust LM-Error. The robust LM statistics and their
p-values should be assessed when the standard LM outputs are both significant, which is the case
here. Table 6 indicates that the p-value of the Robust LM-Lag (0.1208) is smaller than that of
the Robust LM-Error (0.2439); therefore, the spatial lag model was utilized to increase the
predictive capability of the model.
Diagnostics For Spatial Dependence
Test                           MI/DF     VALUE      PROB
-----------------------------  --------  ---------  ------
Moran's I (error)              0.3997     8.2394    0.0000
Lagrange Multiplier (lag)      1         60.8275    0.0000
Robust LM (lag)                1          2.4057    0.1208
Lagrange Multiplier (error)    1         59.7792    0.0000
Robust LM (error)              1          1.3574    0.2439
Lagrange Multiplier (SARMA)    2         62.1850    0.0000
Table 6, Spatial Lag Diagnostics
5. Results and Discussion
The exploratory analysis of the bicycle accidents reveals that more than 50% of the
bicycle crashes occur in the period starting at 12:00 p.m. and diminishing around
10:00 p.m. (figure 3). In addition, the greatest number of bicycle crashes occurs on Tuesdays and,
to a lesser degree, on Wednesdays and Fridays. On the days when most of the accidents happen,
most involve an angle collision or no collision. Only 9.5% of all crashes take place on Saturdays.
In addition, 61.33% of the accidents take place at intersections and 38.66% at non-intersections.
We can infer from these results that cyclist crashes occur most frequently during the
afternoon, at or near intersections, and on weekdays. This pattern could be attributable
to the increase in bicycle commuting and in automobile exposure during those times
and days.
[Figure 3 panels: "Total Injured and Time of Day" (injuries by hour, 12-1 AM through 11-12 PM);
"Weekly Crash Type" (total injured by type of accident and day of week; types: Angle Collision,
No Collision, Head On, Off Road, Rear End, Sideswipe, Unknown)]
Figure 3 Exploratory Analyses of Bicycle Crashes
[Figure 3 cont. panel: "Daily and Hourly Bicycle Incidents" (incidents by hour and day of week)]
Figure 3 cont., Daily and Hourly Crash Rate
The Nearest Neighbor Analysis indicates that spatial clustering is evident among the
bicycle accidents, with a derived R value of 0.68. Furthermore, the NNH analysis indicates that
the bicycle accidents are grouped into two zones of clusters (figure 5). The hierarchical clusters
group accidents in areas of high residential and commercial land-use. We can
infer that the bicycle accident clusters are spatially grouped according to certain environmental
and/or demographic characteristics. The standard deviational ellipse (figure 4) depicts a
northwest to southwest cluster orientation. The ellipse covers an area of the city that contains
a high-density residential and lower-income population. Utilizing GeoDa, a weighted Moran’s I
spatial autocorrelation measure was computed on the neighborhood units to determine whether there
was local autocorrelation among units in bicycle crash rates (figure 6). A local Moran’s I of
crash rates per neighborhood revealed a value of 0.485, which indicates that the density of
accidents per neighborhood is clustered at the .01 significance level. We can infer that underlying
environmental effects may be producing this second-order clustering, which warranted
further analysis. More importantly, the null hypothesis that the crash density is random is
rejected. Interestingly, the zonal autocorrelation measure mimics the NNH results:
neighborhood crash rates are clustered over high-density residential and commercial areas of the
city. Figure 6 indicates that low-low clusters are apparent in the far northwest and southwest
portions of the city. These low-low zones indicate neighborhoods that are clustered by having
very low crash rates. Following the workflow in figure 2, the evident autocorrelation of bicycle
crash density among neighborhood units justifies further diagnostic
measures.
Figure 4, Standard Deviational Ellipse Cluster
Figure 5, Hierarchical Nearest Neighbor Clusters
Figure 6, Local Moran’s I of Neighborhood Crash Rate
The OLS results are presented in tables 7 and 8 and figure 7. Table 7 indicates that the adjusted
R squared (0.463) is relatively strong. On the other hand, figure 7 indicates that spatial
autocorrelation of the residuals may be influencing the result. The residual map is a smoothed map
from which all outside variables have been removed. The residual maps point to a systematic
over- or under-prediction in certain areas. The quantile and standard deviation maps
indicate that like-valued crash rates per neighborhood fall in similar regions. Moreover, the
standard deviation map in figure 7 shows extreme values, which may indicate over- and
under-prediction of crash rates and/or outliers. Figure 8 indicates that the residuals are
autocorrelated, as evidenced by the number of neighborhoods with high-high and low-low values.
The Moran’s I of 0.213 also points to clustering within this model.
The resulting R squared may be due to spatial dependence among neighborhood units or to the
selection of explanatory variables. The multicollinearity condition number output from GeoDa is
71.78, which is indicative of a problem with highly correlated variables. The coefficients and tests
of significance in the OLS model indicate that average household size and total owner
occupied housing are positively related to crash rates per neighborhood (table 8). The proportion
of females in each neighborhood is negatively related to crash rates. We can then postulate that
the proportion of females is an inhibiting factor for bicycle crashes. Furthermore, we can
infer that density is positively related to the crash rates per neighborhood, but the
limited robustness of the OLS model precludes any further estimation.
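A multicollinearity condition number of the kind reported above can be approximated as the ratio of the largest to the smallest singular value of the column-scaled design matrix. The sketch below uses NumPy and is an assumption about the general technique, not a reproduction of GeoDa's calculation; the function name and example matrices are hypothetical. Values above roughly 30 are commonly read as a warning sign.

```python
import numpy as np


def condition_number(X):
    """Condition number of a design matrix after scaling columns to unit length."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=0)
    if np.any(norms == 0):
        raise ValueError("design matrix contains an all-zero column")
    scaled = X / norms
    # Singular values of the scaled matrix; their ratio measures near-collinearity.
    singular_values = np.linalg.svd(scaled, compute_uv=False)
    return singular_values.max() / singular_values.min()


# Two nearly identical hypothetical predictors produce a large condition number.
nearly_collinear = np.array([[1.0, 1.01], [2.0, 2.0], [3.0, 3.0], [4.0, 4.01]])
print(condition_number(nearly_collinear) > 30)  # True

# Orthogonal predictors give the minimum value of 1.
print(condition_number(np.eye(2)))
```

Dropping or combining the most strongly correlated predictors is one common response when the number is high.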
Ordinary Least Squares Regression
R-squared:                           0.514346
Adjusted R-squared:                  0.463225
F-statistic:                         10.0613
Sum squared residual:                16771.6
Log likelihood:                      -695.238
Akaike info criterion:               1428.48
S.E. of regression:                  9.90351
Schwarz criterion:                   1490.17
Multicollinearity Condition Number:  71.78509
Table 7, OLS Results
[Figure 7 panels: Quantile Map of OLS Residuals; Standard Deviation Map of OLS Residuals]
Figure 7, OLS Regression results
Figure 8, Moran’s I plot of the residuals resulting from the OLS model
Variable             Coefficient     Std.Error      t-Statistic    Probability
-------------------  --------------  -------------  -------------  -----------
Med HH Income        -6.47678        9.653116       -0.6709523     0.503156
Total # Gas/Stores   2.07462         0.9726252      2.133011       0.0343505
School Sum           0.1920465       0.4703181      0.4083331      0.6835393
Mean Pvt Cond.       3.791276        4.063609       0.9329824      0.3521446
Total VMT            -1.15E-05       4.47E-05       -0.2576938     0.796949
Land-use Sum         0.0882798       0.1277889      0.6908255      0.4906112
Total Bus Rts        -0.02899316     0.7638854      -0.03795486    0.969804
Controlled Inter     0.05419747      0.2352698      0.2303631      0.8180838
Ave HH Size          5.124003        1.691862       3.028617       0.0028377
Prop of White        -0.1133666      0.04703466     -2.410279      0.0170007
Prop of Male         0.1593905       0.1601151      0.9954749      0.3209125
Prop of Female       -0.3538739      0.1311         -2.699268      0.007647
Age under 5          -0.03281472     0.4073672      -0.08055317    0.9358719
Age 5-17             0.02986987      0.2058375      0.1451138      0.8847863
Age 17-65            -0.005905032    0.001903957    -3.101453      0.0022531
Sum OO               0.004316509     0.001278005    3.377536       0.0009056
Sum RO               0.9734275       0.3315593      2.935907       0.0037835
Ave # Stories        -1.19E-05       9.94E-06       -1.199044      0.2321692
Ave Lot Area         0.006845071     0.1143437      0.05986403     0.9523066
Table 8, OLS Predictors and Coefficients
The spatial lag model was estimated using the rook contiguity weight matrix. As
table 9 indicates, the pseudo R squared has increased relative to the OLS result. However, the R
squared is not a true test of spatial regression robustness (Anselin, 2005). The log likelihood,
which is a better way to judge the fit of a SAR model, has also increased relative to the OLS
model. The log likelihood increased from -695.238 (OLS) to -677.222, the Akaike criterion
decreased from 1428.48 to 1394.44, and the Schwarz criterion decreased from 1490.17 to 1459.38.
As a result, the spatial lag model is a substantial improvement over the OLS model.
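The information criteria can be recomputed from the log likelihoods in the OLS and spatial lag output tables. The sketch below uses the standard definitions AIC = -2 LL + 2k and SC = -2 LL + k ln(n). The parameter counts k = 19 (OLS) and k = 20 (spatial lag, one extra autoregressive coefficient) are assumptions inferred from the reported values, and n = 190 is the number of neighborhoods stated in the data section.

```python
import math


def akaike(log_likelihood, k):
    """Akaike info criterion for a model with k estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k


def schwarz(log_likelihood, k, n):
    """Schwarz (Bayesian) criterion for k parameters and n observations."""
    return -2.0 * log_likelihood + k * math.log(n)


N_NEIGHBORHOODS = 190  # from the study area description

# Log likelihoods as reported in the OLS and spatial lag output tables.
ols_aic = akaike(-695.238, k=19)
ols_sc = schwarz(-695.238, k=19, n=N_NEIGHBORHOODS)
lag_aic = akaike(-677.222, k=20)
lag_sc = schwarz(-677.222, k=20, n=N_NEIGHBORHOODS)

print(round(ols_aic, 2), round(ols_sc, 2))  # 1428.48 1490.17
print(round(lag_aic, 2), round(lag_sc, 2))  # 1394.44 1459.38
```

Lower values of both criteria indicate a better trade-off between fit and the number of parameters, which is why a decrease counts as an improvement.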
Figure 9 maps the standard deviation of residual values from both the SAR
and OLS models. The comparison indicates that under- and over-prediction has been reduced,
as shown by the decrease in extreme negative and positive values in the SAR standard deviation map.
Moreover, the residuals were tested for autocorrelation because they represent the spatially
filtered model error term. The Moran’s I test statistic for the residuals is -0.030, which indicates
no clustering, but outliers or model misspecification may still be a problem due to remaining
high-high and low-low values. This Moran’s I value is expected because variables
outside of the model have been eliminated and the autoregressive term has removed spatial
autocorrelation among neighborhoods. With the autocorrelation minimized, erroneous variables can
be removed, which further solidifies the coefficients and predicted values relating the crash rate
and predictor variables in the spatial regression model.
The explanatory variables in the table of SAR model predictors reveal which variables are
significant predictors of bicycle crash rates per neighborhood. As evidenced in that table, average
household size is the strongest positive predictor of crash rates at the neighborhood level. On the
other hand, total owner occupied housing negatively influences crash rates. In other words, owner
occupancy is associated with decreasing crash rates. We can infer from the significance of
the average household size and owner occupancy sums that household density influences the
increase in crash rate and that the occupancy of housing has less to do with crash rates. The
positive coefficient (z-value of 4.51) of average household size reveals that this variable
contributes most to the expected crash rate per neighborhood. The sum of total renter
occupied housing is also significant and positively related to crash rates. This can be an indirect
indicator of demographic and lifestyle choices, in that renters may be more inclined to cycle and
thereby positively influence the crash rate per neighborhood. The next significant factor in crash
rates is the total number of gasoline/stores, which is positively related to
bicycle crash rates. An increase in gasoline/stores can be an indicator of increased
automobile needs and commercial activity in the neighborhood. The average number of stories
of all buildings in each neighborhood also has a positive, significant effect on crash rate.
This independent variable is an indicator of housing type and
residential density. As a result, we can deduce that as the density of commercial and residential
activity increases, so does the crash rate. Surprisingly, roadway and traffic characteristics were
not significant contributors to crash rates.
Spatial Lag Regression Output
Mean dependent var:     10.1696
S.D. dependent var:     13.4818
R-squared:              0.620526
Log likelihood:         -677.222
Akaike info criterion:  1394.44
Schwarz criterion:      1459.38
S.E. of regression:     8.30496
Table 8, SAR results
                         Spatial Lag Model Results    OLS Model Results
-----------------------  ---------------------------  -----------------
R-squared                0.620526                     0.463 (adjusted)
Log likelihood           -677.222                     -695.238
Schwarz criterion        1459.38                      1490.17
Akaike info criterion    1394.44                      1428.48
Table 9, Spatial Lag vs. OLS model results
[Figure 9 panels: LAG Model Residuals; OLS Residuals. Shared legend classes (Std. Dev.):
< -2.50; -2.50 to -1.50; -1.50 to -0.50; -0.50 to 0.50; 0.50 to 1.50; > 1.50]
Figure 9, Residual comparison between SAR and OLS models
Figure 10, plot of SAR Lag model residuals
Variable              Coefficient    Std. Error     z-value       Probability
Med HH Income         -9.408525      8.094988       -1.162265     0.2451278
Total # Gas/Stores    2.314405       0.8168476      2.833338      0.0046066
School Sum            0.0318974      0.3944083      0.08087406    0.935542
Mean Pvt Cond.        2.725468       3.427343       0.7952132     0.4264894
Total VMT             1.60E-05       3.75E-05       0.4252056     0.6706869
Land-use Sum          0.0378078      0.1071675      0.3527916     0.7242448
Total Bus Rts         0.05487931     0.6413104      0.0855737     0.9318052
Controlled Inter      0.2182366      0.1975819      1.104538      0.2693601
Ave HH Size           6.401072       1.419053       4.510804      0.0000065
Prop of White         -0.05174428    0.04130239     -1.252816     0.2102729
Prop of Male          0.1010094      0.1342808      0.752225      0.4519156
Prop of Female        -0.2549107     0.110433       -2.308284     0.0209833
Age under 5           0.0954602      0.3427642      0.2785011     0.7806278
Age 5-17              -0.08995352    0.172637       -0.5210558    0.6023279
Age 17-65             -0.02389913    0.09669299     -0.247165     0.8047806
Sum OO                -0.0059584     0.001597544    -3.729725     0.0001917
Sum RO                0.002606335    0.001081849    2.409149      0.0159898
Ave # Stories         0.5958942      0.2806523      2.123247      0.033733
Ave Lot Area          -6.18E-06      8.34E-06       -0.740882     0.4587649
Table 8, General Predictors of Neighborhood Crash Rates in the SAR model
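The z-value and probability columns follow from the coefficient and standard error columns. The sketch below recomputes them for three of the reported coefficient and standard error pairs, assuming (as is usual for maximum likelihood spatial lag output) that the probability is a two-sided p-value under a standard normal distribution. The function names are illustrative only.

```python
import math


def z_value(coef, std_err):
    """Asymptotic z statistic: coefficient divided by its standard error."""
    return coef / std_err


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# (coefficient, standard error) pairs copied from the table above.
ROWS = [
    (6.401072, 1.419053),
    (2.314405, 0.8168476),
    (-9.408525, 8.094988),
]

for coef, se in ROWS:
    z = z_value(coef, se)
    print(f"coef = {coef}, z = {z:.4f}, p = {two_sided_p(z):.7f}")
```

A coefficient is conventionally called significant at the 5% level when this probability falls below 0.05, which is the reading applied to the table in the discussion above.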
6.0 Conclusion
Past researchers have acknowledged the relationship between roadway, demographic, and
land-use variables in crash rate research. Along those same lines, I have incorporated
roadway characteristics, population, and land-use data to assess the spatial relationship between
accident densities within neighborhoods. In this spatial analysis of bicycle accident
density, the null hypothesis of complete spatial randomness is rejected. The first- and second-order
pattern analyses indicate that bicycle accident points and densities are
autocorrelated. The predictive model chosen in this paper corroborates this
finding.
This paper has shown that substantial temporal and spatial dependence exists among
accident locations and densities. Cluster analysis indicated substantial autocorrelation that
needed to be accounted for in order to formulate a robust prediction model. Furthermore, in Geoda,
the incorporation of a spatially lagged dependent variable improved the predictive model. The
increase in log likelihood, together with the decrease in the Akaike info criterion and Schwarz
criterion relative to the OLS model, indicates this improvement, although the model still has flaws.
The results indicate that a spatial relationship exists between bicycle accident density per
neighborhood and population density. Interestingly enough, roadway characteristics were
neither correlated with crash rates nor contributors to their prediction. Further analysis at the network level
may supersede this result and reveal, as in past studies, which roadway geometries and traffic
conditions contribute to bicycle crash rates. Multicollinearity among the explanatory variables may have
led to model misspecification and hindered the results. Future research should incorporate
non-redundant variables. The aggregation of variables also opens the study to the
modifiable areal unit problem. Rather than aggregating the data per neighborhood,
a prediction and autocorrelation measure using the crash points per road segment might have
been a more appropriate model. In addition, crash density may not be a true indicator of a
bicycle accident problem. Without knowing the exposure of bicyclists to roadways, we do not
know what risk the roads actually pose per bicyclist. In other words, a common
denominator is needed that is directly linked to rider exposure.
Overall, this study has shown that Geoda is a useful tool for addressing the spatial
dependence of zonal data on bicycle crash density. The utility of this model lies in the fact that it
replicates the trip generation step of the UTPS. In this case we utilized commonly known bicycle
crash rate generators to predict density per zone, much like the trip generation step of the 4-step
travel demand model. Therefore, this study has indicated a methodology that could easily be
assimilated into the common 4-step travel demand model. The zonal approach utilized
neighborhoods and thus provided useful information for neighborhood planning or for
incorporation in indicator studies. As evidenced in this study, an improvement over the OLS model
is apparent in the residual values plot and coefficients resulting from the spatial lag model of
police-reported bicycle accidents. Although the foundation of the model can be improved upon,
accounting for spatial dependence and predicting bicycle crash density using an
autoregressive model may prove its utility in further research.
Appendix 1
$$ \text{BLOS} = 0.507 \ln(\text{Vol15}/\text{Ln}) + 0.199\,\text{SPt}\,(1 + 10.38\,\text{HV})^2 + 7.066\,(1/\text{PR5})^2 - 0.005\,\text{We}^2 + 0.760 $$
where:
Vol15 = volume of directional traffic in 15 minutes = (ADT * D * Kd) / (4 * PHF)
ADT = Average Daily Traffic on the segment
D = Directional Factor
Kd = Peak to Daily Factor
PHF = Peak Hour Factor
Ln = number of directional through lanes
SPt = effective speed limit = 1.1199 ln(SPp-20) + 0.8103, where SPp is the posted speed limit
HV = percentage of heavy vehicles (as defined in the 1994 Highway Capacity Manual)
PR5 = FHWA’s 5-point pavement surface condition rating (5=best)
We = average effective width of outside through lane:
We = Wv – (10’ * OSPA) where Wl = 0
We = Wv + Wl (1 – 2 * OSPA) where Wl > 0 & Wps = 0
We = Wv + Wl – 2 (10’ * OSPA) where Wl > 0, Wps > 0, and a bike lane exists.
OSPA = fraction of segment with occupied on-street parking
Wt = total width of outside lane (and shoulder) pavement
Wl = width of paving between outside lane stripe and edge of pavement
Wps = width of pavement striped for on-street parking
Wv = effective width as a function of traffic volume
Wv = Wt if ADT>4000 veh/day
Wv = Wt (2 – (ADT/4000)) if ADT<4000 and road is undivided and unstriped.
Bicycle Level of Service score ranges associated with level of service (LOS) designations:
BLOS Score Range: ≤1.50 = A, 1.51-2.50 = B, 2.51-3.50 = C, 3.51-4.50 = D, 4.51-5.50 = E, >5.50 = F
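The BLOS equation and its supporting definitions can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the segment values are hypothetical, HV is assumed to be entered as a decimal fraction, the effective width We is passed in directly rather than derived from Wt, Wl, Wps, and OSPA, and a score of 1.50 or less is assumed to map to LOS A.

```python
import math


def vol15(adt, d, kd, phf):
    """Directional traffic volume in 15 minutes: (ADT * D * Kd) / (4 * PHF)."""
    return (adt * d * kd) / (4.0 * phf)


def effective_speed(posted_speed):
    """SPt from the posted speed limit SPp (SPp must exceed 20)."""
    return 1.1199 * math.log(posted_speed - 20.0) + 0.8103


def blos(adt, d, kd, phf, lanes, posted_speed, hv, pr5, we):
    """Bicycle Level of Service score from the Appendix 1 equation."""
    v15 = vol15(adt, d, kd, phf)
    return (
        0.507 * math.log(v15 / lanes)
        + 0.199 * effective_speed(posted_speed) * (1.0 + 10.38 * hv) ** 2
        + 7.066 * (1.0 / pr5) ** 2
        - 0.005 * we ** 2
        + 0.760
    )


def blos_grade(score):
    """Map a BLOS score to a letter grade (A assumed for scores <= 1.50)."""
    for upper, grade in ((1.50, "A"), (2.50, "B"), (3.50, "C"), (4.50, "D"), (5.50, "E")):
        if score <= upper:
            return grade
    return "F"


# Hypothetical road segment, for illustration only.
segment = dict(adt=12000, d=0.55, kd=0.10, phf=0.90, lanes=2,
               posted_speed=35, hv=0.02, pr5=4.0, we=12.0)
score = blos(**segment)
print(f"BLOS = {score:.2f}, grade {blos_grade(score)}")
```

Because the width term enters with a negative sign, a wider effective outside lane lowers the score and so improves the grade, all else equal.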
Appendix 2
$$ y = \rho W y + X \beta + \varepsilon $$
where:
$Wy$ is the spatially lagged dependent variable for weights matrix $W$
$y$ is an N by 1 vector of observations on the dependent variable
$X$ is an N by K matrix of observations on the explanatory variables
$\varepsilon$ is an N by 1 vector of error terms
$\rho$ is the spatial autoregressive parameter
$\beta$ is a K by 1 vector of regression coefficients.
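To make the structure of the spatial lag model concrete, the sketch below builds a toy data set that satisfies the standard form y = rho * W y + X beta + error. Everything in it is hypothetical: a four-zone chain with a row-standardized weights matrix, made-up explanatory values, coefficients, and errors. Because y appears on both sides, it is solved here by simple fixed-point iteration, which converges when the absolute value of rho is below 1 and the rows of W sum to 1. This only generates data from the model; it does not estimate it, which Geoda does by maximum likelihood.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical row-standardized weights for four zones arranged in a chain.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
X = [[1.0, 2.0], [1.0, 0.5], [1.0, 3.0], [1.0, 1.5]]  # constant + one predictor
BETA = [1.0, 2.0]            # hypothetical regression coefficients
RHO = 0.4                    # hypothetical spatial autoregressive parameter
EPS = [0.1, -0.2, 0.05, 0.0]  # hypothetical error terms


def solve_spatial_lag(w, x, beta, rho, eps, iterations=200):
    """Iterate y <- rho * W y + X beta + eps until it settles."""
    base = [xb + e for xb, e in zip(matvec(x, beta), eps)]
    y = list(base)
    for _ in range(iterations):
        lagged = matvec(w, y)
        y = [rho * l + b for l, b in zip(lagged, base)]
    return y


y = solve_spatial_lag(W, X, BETA, RHO, EPS)
lagged_y = matvec(W, y)
for i, (yi, wyi) in enumerate(zip(y, lagged_y)):
    print(f"zone {i}: y = {yi:.4f}, spatial lag Wy = {wyi:.4f}")
```

Each zone's value depends on its own predictors and on the average of its neighbors' values, which is the dependence that an ordinary least squares model leaves in its residuals.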
Bibliography
Anselin, L. (2005). Exploring Spatial Data with Geoda : A Workbook. Spatial Analysis
Laboratory (SAL). Department of Agricultural and Consumer Economics, University of
Illinois, Urbana-Champaign, IL.
Anselin, L. and Bera, A. (1998). Spatial Dependence in linear regression models with an
introduction to spatial econometrics. In Ullah, A. and Giles, D.E., editors, Handbook of
Applied Economic Statistics, pages 237-289. Marcel Dekker, New York.
Beimborn, E., Kennedy, R. (2000?). “ Inside the Black Box: Making Transportation Models
Work for Livable Communities”.
Black, W.R. (2003). Transportation: A Geographical Analysis, 1st edn. New York:
Guilford.
Burt, J.E., Barber, G.M., (1996). Elementary Statistics for Geographers. Guilford Press, New
York, New York.
Carpenter, T.E., (2001). “Methods to Investigate Spatial and Temporal Clustering in
Veterinary Epidemiology.” Preventive Veterinary Medicine, 48, 303-320.
Feske, D., (1994). “Life in the Bike Lane.” American City and County., 109, 64-77.
Flahaut, B., (2004). “Impact of Infrastructure and Local Environment on Road Unsafety:
Logistic Modeling with Spatial Autocorrelation.” Accident Analysis & Prevention,
36, 1055-1066.
Fotheringham, A.S., Brunsdon, S., Charlton, M., (2004) Quantitative Geography, 3rd edn.
London: Sage Publications.
Garder, P., (1994). “Bicycle Accidents in Maine: An Analysis.” Transp. Res. Rec., 1438,
Transportation Research Board, Washington, D.C., 34-41.
Gamerman, D., and Moreira, A.R.B., (2004). “Multivariate Spatial Regression Models.”
Journal of Multivariate Analysis, 91, 262-281.
Landis, B.W. (1996a). “Bicycle System Performance Measures.” ITE, 66, 18-23.
Landis, B.W., Vattikuti, R., Brannick, M., (1996b). “Real-Time Human Perceptions
Toward a Bicycle Level of Service.” Transportation Research Record, 1578, 119-126.
Levine, N., Kim, K.E., & Nitz, L.H., (1995). “Spatial Analysis of Honolulu Motor Vehicle
Crashes: II. Zonal Generators.” Accident Analysis and Prevention, 27(5), 675-685.
O’Sullivan, D., and Unwin, D.J. (2003). Geographic Information Analysis. John Wiley and
Sons, Inc. New Jersey.
Pawlovich, M.D., Souleyrette, R.R., Strauss, T., (1998). “A Methodology for Studying Crash
Dependence on Demographic and Socioeconomic Data.” Transportation Conference
Proceedings., Center for Transportation Research and Education.
Rogerson, P.A. (2001). Statistical Methods for Geography, Sage Publications: London.
Rodgers, G.B., (1997). “Factors Associated with the Crash Risk of Adult Bicyclists.”
Journal of Safety Research, 28, 233-41.
Steenberghen, T., Dufays, T., Thomas, I., Flahaut, B. (2004). “Intra-Urban Location and
Clustering of Road Accidents using GIS: A Belgian Example.” International
Journal of Geographical Information Science. 18(2), 169-181.
Wachtel, A., Lewiston, D., (1994). “Risk Factors for Bicycle-Motor Vehicle Collisions at
Intersections”, ITE., 64,(9) 30-35.