The Role of Urban Scale, Transportation, and Demographics in Cycling Risk

Greg Rybarczyk
Department of Geography
Urban Planning 771
Spring 2006

1. Introduction

In the United States, an automobile-centered culture has designed roadways to move gasoline-powered vehicles from point A to point B as efficiently as possible. As a result, the auto-dependent transportation network has contributed to environmental degradation and fragmented neighborhoods, and has deterred bicycling as a mode of transportation (Garder, 1994). Moreover, safety concerns, undesirable bicycle routes, and unconnected routes have contributed to reduced bicycle transportation. The rapid increase in the use of bicycles in the past decade has been well documented. As one would expect, this increased usage has led to a parallel rise in the number of bicycle accidents. Furthermore, a common failing of city planning is that bicycle route planning is ad hoc and fails to allocate routes that do not conflict with cars (Hanson, 1995).

There are approximately 38 bicycle-related fatalities per billion passenger-kilometers. In 1994 alone, 722 bicyclists were killed in the United States (Black, 2003 and Wachtel et al., 1994). In spite of federal legislation centered on increasing funding for alternative modes of transportation, approximately 650,000 people nationwide are treated for bicycle-related injuries each year. While federal acts such as the Intermodal Surface Transportation Efficiency Act (ISTEA) and the Transportation Equity Act for the 21st Century (TEA-21) have done much to reform U.S. policy and implement bicycle infrastructure, unsafe bicycling environments continue to increase. Unfortunately, most other federal policies still cater to the automobile and do little to improve the safety of bicycling. At the same time, bicycling is becoming ever more popular among commuters and recreational riders (Rodgers, 1997).
The question becomes: how can policy makers satisfy federal mandates to increase bicycling while minimizing bicycling risk? A sizable amount of literature has examined bicycle accident patterns, but few studies have investigated what factors contribute to bicycle accident rates in zones while incorporating only neighborhood-level data.

The structure of the paper is as follows. Section 2 describes the objectives of the study and places the feasibility of this research in the context of past attempts in this area. Section 3 describes the study area and data preprocessing. Section 4 describes the analytical approach used to investigate bicycle crashes in the study area, which includes exploratory crash data analysis, cluster analysis, and spatial autoregressive regression analysis. Section 5 presents the comparisons and results of the modeling procedure. Finally, we conclude in section 6.

2. Research Objective

In order to produce an environment that encourages bicycling as a viable transportation option, bicycling must be perceived to be safe. Many studies have utilized bicycle traffic accidents to model roadways for planning and safety (Pawlovich et al., 1998 and Garder, 1994). One way to assess bicycle travel safety is to understand what factors relate to the incidence of bicycle crashes. While many studies relate bicycle crashes to roadway features, few have assessed them within and among zones. Moreover, bicycle accident data analysis at the neighborhood level is a logical input to future policy action: it can help identify problems, measure progress, and develop potential countermeasures.
The trip distribution step of the well-known Urban Transportation Planning System (UTPS) predicts traffic based on zonal attractions and generators, delineates traffic between zones, and then allocates it to travel modes (Levine et al., 1995). This same approach can be utilized to predict crashes among zones based on neighborhood traffic, land-use, and demographic variables. If the prediction is accurate, these findings can then be interpolated to roadway features.

This research will attempt to quantify variables that are integral to increasing bicycling safety by controlling for neighborhood area and autocorrelation effects. The neighborhood unit of assessment was used because accidents are related to the local characteristics of the area in which they occur and are not a road-specific phenomenon. The neighborhood composition is therefore the optimum way to explain higher levels of intra- and inter-modal traffic, land-use, and demographics. The combination of neighborhood physical road attributes, socio-economic quality, and demographic data may uncover causal factors associated with bicycle crash rates. By incorporating non-roadway attributes, more informed decisions can be made by the growing number of neighborhood-level planning initiatives. In addition, few attempts have been made to take into account both roadway and demographic variables in bicycle safety analysis (Pawlovich, 1998).

Therefore, the goals of this study were to establish that bicycle crashes were not randomly dispersed and then to develop a robust bicycle accident predictive model that minimizes adjacent neighborhood influences. In other words, this study accounts for the spatial variation between neighborhood land-use, socio-economic quality, and roadway attributes while minimizing autocorrelation among units.
The objectives of this paper are then to explore the problem of bicycle accidents in the City of Milwaukee and to develop a model that identifies bicycle accident clusters and minimizes autocorrelation, in order to reveal the spatial dependence of bicycle accident rates on land-use, density, and roadway infrastructure. The hypothesis was that elevated bicycle accident rates are not random but are related to an underlying comprehensive phenomenon that is particular to each neighborhood. In effect, the expected outcome is that the pattern of aggregated accident rates is not completely random, and that causal factors related to neighborhood demographics, land-use, and road characteristics can be used for crash rate prediction.

3. Data and Study Area

The study area consisted of the City of Milwaukee, located in Milwaukee County, Wisconsin (figure 1). The neighborhood boundaries were obtained and derived from the City of Milwaukee and include 190 distinct polygons. The primary data source consists of bicycle accident counts obtained from the Wisconsin Department of Transportation for the years 1999 through 2003. City of Milwaukee traffic control data were also geocoded and included controlled and uncontrolled intersection data. Bus route data were obtained from the Milwaukee County Transit Service and include the total length in miles of each route. The bicycle accident database covers all types of accidents and records location by address, manner of collision, number injured, time/day/month/year of accident, light conditions, and roadway surface conditions. A total of 979 crashes occurred within the City of Milwaukee from 1999 through 2003. Population data were obtained from the U.S. Census Bureau SF-1 database. Land-use and urban scale data were obtained from the City of Milwaukee's Master Property file (MPROP). Planimetric roadway data (WISLER and Fire DIME) were obtained from the Wisconsin Department of Transportation (WIDOT) and the City of Milwaukee.
Figure 1, Milwaukee and its neighborhoods

3.1 Data Preparation

The heterogeneous composition of the data used in this study required a rigorous geoprocessing component. Using ArcGIS 9.1, U.S. Census SF1 block data were first extrapolated to the neighborhoods of Milwaukee and then aggregated. The SF1 data retained for further analysis included demographic and income attributes. The land-use, scale, and urban density attributes used in this study were derived from the MPROP database, which contains information for each parcel in the study area. Vehicle miles traveled (VMT) was computed with this equation:

VMT = total AADT * 365 days/year * roadway length (ft) / 5280 ft per mile

In addition, a mean bicycle level of service (BLOS) was computed for each neighborhood (appendix 1). The BLOS is an index developed by Landis (1996b) that attempts to quantify the suitability of a roadway segment for bicycling based on real-time human perceptions of safety. A bicycle crash rate, the dependent variable in this research, was established by dividing the number of crashes per neighborhood by the neighborhood area (sq miles). This normalization was used to remove the effect of differing neighborhood sizes. Further descriptions of the neighborhood-level data aggregation are displayed in table 1.
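The two computations described above, annual VMT and the area-normalized crash rate, can be sketched as follows. This is a minimal illustration, not the paper's ArcGIS workflow; the function names are hypothetical, and the sketch assumes the VMT formula is applied per road segment and then summed over a neighborhood, which the text does not state explicitly.

```python
FEET_PER_MILE = 5280
DAYS_PER_YEAR = 365


def vehicle_miles_traveled(aadt, length_ft):
    """Annual VMT for one segment: AADT * 365 * length (ft) / 5280."""
    return aadt * DAYS_PER_YEAR * length_ft / FEET_PER_MILE


def neighborhood_vmt(segments):
    """Sum annual VMT over (aadt, length_ft) pairs (assumed aggregation)."""
    return sum(vehicle_miles_traveled(aadt, length) for aadt, length in segments)


def crash_rate(crash_count, area_sq_miles):
    """Crashes per square mile, the dependent variable described in the text."""
    if area_sq_miles <= 0:
        raise ValueError("area must be positive")
    return crash_count / area_sq_miles


# Hypothetical neighborhood: two segments, 12 crashes, 1.5 square miles.
segments = [(1000, 5280), (2000, 2640)]
print(neighborhood_vmt(segments))
print(crash_rate(12, 1.5))
```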
Table 1, Neighborhood Data

Demographic: # Gas/Stores; # Schools; % Proportion White; % Proportion Male; % Proportion Female; % Pop under 5; % Pop 5-17; % Pop 17-65; Ave Household Size; Total # Land-use
Transportation: Sum Bus Routes (miles); Sum Vehicle Miles Traveled; Ave Bicycle Level of Service; Ave Pavement Condition; Sum Controlled Intersections; Sum Uncontrolled Intersections; % Heavy Truck Traffic; Sum Interstate Miles; Ave # Roadway Lanes
Economic: Median Household Income; Ave Building Stories (height); Ave Building Area; Ave Lot Area; % Renter Occupied; % Owner Occupied

4.0 Methods

Many studies have incorporated either infrastructure or environmental conditions to model bicycle accident relationships. The main goal of this study is to understand, by moving from a classical linear regression to a spatial regression model, the spatial co-occurrence of bicycle accident concentrations per neighborhood. In other words, why do concentrations of accidents occur where they do? Figure 2 depicts the flow chart that was used to derive a statistically significant model. A local cluster analysis was first used to verify deviation from complete spatial randomness of the accident points. Moran's I was used to determine the degree to which the accident points were autocorrelated. A cluster analysis, consisting of Hierarchical Nearest Neighbor Analysis, the Standard Deviational Ellipse, and a Moran's I cluster diagnostic test, was then used to present the accident black spots. Once clustering was observed, a model was developed to attribute possible variables to the clustering phenomenon. The neighborhood was chosen as the spatial unit of study because it eliminates the randomness that may appear in a study of random events and accounts for traffic spillover effects that would not be accounted for in the typical trip-distribution model (Flahaut, 2004 and Levine et al., 1997).
The flow chart that guided this study is displayed in figure 2 and was adapted from Anselin (2005).

Figure 2, methodology flowchart (steps: Exploratory Data Analysis; Cluster Analysis; Calculate Local Moran's I; Run OLS Regression; LM-Error and LM-Lag diagnostics; if neither is significant, stop and keep OLS results; if one is significant, run the corresponding Spatial Error or Spatial Lag Model; if both are significant, consult the Robust LM-Error and Robust LM-Lag results and run the corresponding model)

4.1 Exploratory Data Analysis

In keeping with the numerous past studies of bicycle crashes, this study includes exploratory statistics on the space-time and physical patterns of bicycle crashes. Given the complexity of the factors involved in bicycle accidents, a brief statistical picture is needed as a baseline to determine frequencies. The accident statistics cannot, however, be related to the actual rate of bicycling to determine exposure, because those data were not available. As a result, a typology of crash occurrences in relation to time, day, and type of crash was developed.

4.2 Cluster Analysis

Prior to spatial modeling, a cluster analysis was conducted to verify whether there was any clustering of like values among the accident points and among the crash rates per neighborhood. The two-dimensional clustering techniques used in this study allow for insight into potential underlying processes associated with environmental parameters, identify bicycle accident zones, and guide model development. A first-order point pattern analysis using the nearest neighbor function in Esri's ArcGIS 9.1 and CrimeStat software was also used as an exploratory tool. Nearest neighbor analysis assesses all points and calculates the mean distance to the nearest neighbor; if this distance is less than expected, then clustering is evident (Carpenter, 2001).
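The nearest neighbor test described above can be illustrated with a small sketch. It computes the ratio R of the observed mean nearest-neighbor distance to the distance expected under complete spatial randomness, here taken as 0.5 / sqrt(n / area), the standard Clark and Evans form. This is an assumption about the formulation; ArcGIS and CrimeStat may apply edge corrections that are not reproduced here, and the function name is hypothetical.

```python
import math


def nearest_neighbor_index(points, area):
    """Ratio of observed to expected mean nearest-neighbor distance.

    R < 1 suggests clustering, R near 1 randomness, R > 1 dispersion.
    """
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point.
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)  # mean distance under randomness
    return observed / expected


# Hypothetical crash locations inside a study area of 100 square units.
clustered = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1)]
dispersed = [(0.0, 0.0), (0.0, 5.0), (5.0, 0.0), (5.0, 5.0)]
print(nearest_neighbor_index(clustered, 100.0))
print(nearest_neighbor_index(dispersed, 100.0))
```

A brute-force search over all pairs is adequate for a few hundred to a few thousand points; a spatial index would be the usual choice for larger data sets.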
More specifically, a nearest neighbor hierarchical (NNH) analysis was conducted in CrimeStat to determine whether first-order clusters, based on a minimum of 10 points, are evident. In NNH, bicycle accidents are grouped in a hierarchical order: first-order clusters are formed first and are then fused into subsequent-order groups until all points are clustered. The procedure identifies first-order clusters as groups of points that are closer together than the threshold distance and that contain at least the minimum number of points specified by the user. In addition, a standard deviational ellipse was used to measure the direction and concentration of bicycle crash points. The standard deviational ellipse defines both the orientation and the dispersion of the point-based clustering.

Lastly, a Local Moran's I was computed in ArcGIS 9.1 and GeoDa with the neighborhood polygons to determine second-order relationships. Moran's I has historically been used in polygon-based autocorrelation measures due to its common application to areal units, ease of use, and stable results (Steenberghen and Dufays, 2004, O'Sullivan and Unwin, 2004). A spatial weights file was created in GeoDa and used to derive a polygon-based Moran's I. Because we are concerned with contiguity between neighborhood units, a first-order rook contiguity weight matrix was created. Rook contiguity was chosen because the goal was to account for all neighbors that share a boundary.

4.3 Regression Analysis

An empirical analysis of bicycle accidents would be incomplete without a correlation and causal determination. A correlation analysis was conducted in SPSS to test the independent variables for significance and collinearity. The Pearson's R for each proposed variable is given in table 4. As a result, only 19 explanatory variables were utilized in the SAR model (table 5).
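To make the weights and autocorrelation steps concrete, the sketch below builds first-order rook contiguity on a regular grid and computes a global Moran's I with row-standardized weights. It is an illustration only: the study used irregular neighborhood polygons and GeoDa, whereas the grid, the function names, and the example values here are hypothetical.

```python
def rook_neighbors(rows, cols):
    """Map each cell index of a rows x cols grid to the cells sharing an edge."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            neighbors[r * cols + c] = [
                rr * cols + cc
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return neighbors


def morans_i(values, neighbors):
    """Global Moran's I using row-standardized contiguity weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = 0.0  # sum of w_ij * z_i * z_j
    s0 = 0.0     # sum of all weights
    for i, js in neighbors.items():
        if not js:
            continue  # a unit with no neighbors contributes nothing
        w = 1.0 / len(js)
        for j in js:
            cross += w * z[i] * z[j]
            s0 += w
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Hypothetical 4 x 4 grid: high crash rates on the left half, low on the right.
grid = rook_neighbors(4, 4)
clustered = [10.0 if c < 2 else 0.0 for r in range(4) for c in range(4)]
print(morans_i(clustered, grid))  # positive: like values are adjacent
```

A checkerboard pattern on the same grid gives a value of -1, the opposite extreme, because every rook neighbor has the unlike value.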
After it was determined that clustering was significant in the crash points and in the crash rates per neighborhood, a model was developed to determine contributing explanatory factors. Subsequent to the cluster analysis, empirical relationships were therefore identified via a hierarchy of classic and spatial regression. Zonal data may exhibit a spatial relationship with other zones. In other words, zonal data may have an inherent relationship with their neighbors. In order to account for this spatial dependency of neighborhood zones, their covariance matrix should be included in the prediction model. While a classical linear regression model may be able to determine factors that are linearly related to its variables, it does not account for spatial dependence. A more appropriate model therefore accounts for autocorrelation. Unfortunately, if spatial error dependence is present in an OLS model, the estimated coefficients are inefficient. When a spatial lag term is introduced into an OLS regression, the assumptions regarding independence and multicollinearity are violated.

There are two types of spatial autoregressive (SAR) approaches: spatial error and spatial lag. Spatial error consists of correlations across space in the error term, and spatial lag is where the dependent variable is affected by the independent variables at location "i" and at location "j" (Anselin and Bera, 1998). Autocorrelation can be modeled by considering the correlation among error terms (spatial error) or by modeling the spatial trend in the error terms (spatial lag) (Fotheringham et al., 2004). Furthermore, by utilizing a SAR model in this study, the mean of the neighboring crash densities is used as another explanatory variable. In this study, GeoDa was used to proceed with a spatially lagged or spatial error regression model.
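For reference, the two SAR specifications named above are conventionally written as follows. The notation is the standard textbook form and is not taken from this paper: y is the vector of neighborhood crash rates, X the matrix of explanatory variables, W the row-standardized spatial weights matrix, and rho and lambda the spatial autoregressive parameters.

```latex
% Spatial lag model: the weighted mean of neighboring crash rates (Wy)
% enters as an additional explanatory variable.
y = \rho W y + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I)

% Spatial error model: the spatial dependence sits in the disturbance term.
y = X\beta + u, \qquad u = \lambda W u + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I)
```

With rho = 0 or lambda = 0, each model reduces to the classical linear regression, which is why the OLS fit serves as the baseline for the diagnostics.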
Table 4, Correlation Coefficients

Variable                     Pearson's Correlation
Av HH Size                   0.291
Uncontrolled Intersection    -0.003
Controlled Intersection      0.310
Stop Signs                   0.481
Age 17-64                    0.474
Age 5-17                     0.498
Total Bus Miles              0.254
Total Land uses              0.383
VMT                          0.277
Mean BLOS                    -0.041
Mean PVT Cond                0.088
Mean PCT HV                  -0.049
Mean # Lanes                 -0.019
Bike Lane Miles              -0.028
Av Building Area             -0.008
# Building Units             0.448
Av # Stories                 0.412
Av Current Total Asses       -0.052
Owner Occ                    0.120
Renter Occ                   0.483
HH Size                      0.385
Males                        0.468
Females                      0.513
Total Pop                    0.492
White Pop                    0.146
Black Pop                    0.376
Children Under 5             0.534
Total # Schools              0.405

Table 5, explanatory variables

Social: Gas/Stores; Schools; Proportion White; Proportion Male; Proportion Female; Pop under 5; Pop 5-17; Pop 17-65; Household Size; Land-use
Transportation: Bus Routes (miles); Vehicle Miles Traveled; Ave Pavement Condition; Controlled Intersections
Economic: Median Household Income; Number of Building Stories; Ave Lot Area; Renter Occupied; Owner Occupied

4.3.1 Spatially Lagged Regression Analysis

Spatial lag dependence in regression is modeled by including a spatially autoregressive term, Wy, for the dependent variable. This term takes into account the mean of the adjacent spatial locations of the neighbors (Gamerman et al., 2004). Moreover, each neighbor has eight neighbors, with spatially shifted variables. In GeoDa, maximum likelihood estimation is used to account for the Wy term. The spatial lag model is especially useful when utilizing administrative areal data. When data are aggregated per zone, as in this study, loss of information will occur, and an autoregressive model may be able to account for this spatial mismatch, producing a better fitting model (Anselin and Bera, 1998). When a spatial weights matrix is supplied, the output from GeoDa includes spatial error and spatial lag diagnostics to assist with further spatial regression model development (table 6).
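The Wy term can be illustrated with a few lines of code. The sketch below computes a row-standardized spatial lag, that is, the mean of the neighboring values for each unit, from a neighbor list. The neighbor dictionary and the crash rates are hypothetical, and this shows only the construction of the lagged variable, not GeoDa's maximum likelihood estimation.

```python
def spatial_lag(values, neighbors):
    """Row-standardized spatial lag: mean of each unit's neighboring values.

    `neighbors` maps a unit index to the indices of its contiguous units.
    Units without neighbors receive a lag of 0.0 (an assumption made here).
    """
    lag = []
    for i in range(len(values)):
        js = neighbors.get(i, [])
        lag.append(sum(values[j] for j in js) / len(js) if js else 0.0)
    return lag


# Three hypothetical neighborhoods in a row: 0 - 1 - 2.
crash_rates = [2.0, 4.0, 6.0]
contiguity = {0: [1], 1: [0, 2], 2: [1]}
print(spatial_lag(crash_rates, contiguity))  # [4.0, 4.0, 4.0]
```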
Included in the Lagrange multiplier test statistics are: LM-Lag, Robust LM-Lag, LM-Error, Robust LM-Error, and LM-SARMA. Table 6 indicates that the Lagrange multiplier statistic for spatial lag is significant and slightly larger than that for spatial error. The results of the linear OLS regression and the spatial diagnostic output point to the appropriate prediction model (table 6). While the LM-Error and LM-Lag are both highly significant, the Robust LM-Lag has a lower p-value than the Robust LM-Error. The robust LM statistics and their p-values should be assessed when the standard LM outputs are equally significant, which is the case here. As table 6 indicates, the p-value of the Robust LM-Lag (0.1208) is lower than that of the Robust LM-Error (0.2439), and therefore the spatial lag model was used to increase the predictive capability of the model.

Table 6, Spatial Lag Diagnostics (Diagnostics For Spatial Dependence)

Test                           MI/DF     VALUE      PROB
Moran's I (error)              0.3997    8.2394     0.0000
Lagrange Multiplier (lag)      1         60.8275    0.0000
Robust LM (lag)                1         2.4057     0.1208
Lagrange Multiplier (error)    1         59.7792    0.0000
Robust LM (error)              1         1.3574     0.2439
Lagrange Multiplier (SARMA)    2         62.1850    0.0000

5. Results and Discussion

The exploratory analysis of the bicycle accidents reveals that more than 50% of the bicycle crashes occur in the period starting at 12:00 p.m. and diminishing around 10:00 p.m. (figure 3). In addition, the greatest number of bicycle crashes occurs on Tuesdays and, to a lesser degree, on Wednesdays and Fridays. On the days when most of the accidents happen, most involve an angle collision or no collision. Only 9.5% of all crashes take place on Saturdays. In addition, 61.33% of the accidents take place at intersections and 38.66% at non-intersections. We can infer from these results that cyclist crashes occur most frequently during the mid-afternoon, near or at intersections, and on weekdays.
This pattern could be attributable to the increase in bicycle commuting and the increase in automobile exposure during those times and days.

Figure 3, Exploratory Analyses of Bicycle Crashes (panels: Total Injured and Time of Day, injuries by hour; Weekly Crash Type, total injured by type of accident and day of week)

Figure 3 cont., Daily and Hourly Crash Rate (Daily and Hourly Bicycle Incidents, incidents by hour and day of week)

The Nearest Neighbor Analysis indicates that spatial clustering is evident among the bicycle accidents, with a derived R value of .68. Furthermore, the NNH analysis indicates that the bicycle accidents are grouped into 2 zones of clusters (figure 5). The hierarchical clusters group accidents in areas of high residential and commercial land-use. We can infer that the bicycle accident clusters are spatially grouped according to certain environmental and/or demographic characteristics. The standard deviational ellipse depicts a northwest to southwest cluster orientation. The ellipse covers an area of the city that contains a high-density residential and lower-income population. Using GeoDa, a weighted Moran's I spatial autocorrelation measure was computed on the neighborhood units to determine whether there was local autocorrelation among units and bicycle crash rates (figure 6).
The local Moran's I of crash rates per neighborhood was 0.485, which indicates that the density of accidents per neighborhood is clustered at the .01 significance level (figure 6). We can infer that underlying environmental effects may be producing this second-order clustering, which warranted further analysis. More importantly, the null hypothesis that the crash density is random is rejected. Interestingly enough, the zonal autocorrelation measure mimics the NNH results. Neighborhood crash rates are clustered over high-density residential and commercial areas of the city. Figure 6 indicates that low-low clusters are apparent in the far northwest and southwest portions of the city. These low-low zones indicate neighborhoods that are clustered by having very low crash rates. Following the flow chart in figure 2, the evident autocorrelation of bicycle crash density among neighborhood units justifies further diagnostic measures.

Figure 4, Standard Deviational Ellipse Cluster

Figure 5, Hierarchical Nearest Neighbor Clusters

Figure 6, Local Moran's I of Neighborhood Crash Rate

The OLS results are depicted in tables 7 and 8 and figure 7. Table 7 indicates that the adjusted R squared is relatively strong. On the other hand, figure 7 indicates that spatial autocorrelation of the residuals may be influencing the result. The residual map is a smoothed map in which all outside variables have been removed from the analysis. The residual maps point to system-wide over- or under-prediction in certain areas. The quantile and standard deviation maps indicate that like-valued crash rates per neighborhood are found in similar regions. Moreover, the standard deviation map in figure 7 shows extreme values, which may indicate over- and under-prediction of crash rates and/or outliers. Figure 8 indicates that the residuals are autocorrelated, as evidenced by the number of neighborhoods with high-high and low-low values.
The Moran's I of .213 also points to clustering within this model. The resulting R squared may be due to spatial dependence of the neighborhood units or to the selection of explanatory variables. The multicollinearity condition number output from GeoDa is 71.78, which is indicative of a problem with highly correlated variables. The coefficients and tests for significance in the OLS model indicate that average household size and the total owner-occupied housing are positively related to crash rates per neighborhood (table 8). The proportion of females in each neighborhood is negatively related to crash rates. We can then postulate that sex, or the proportion of females, is an inhibiting factor for bicycle crashes. Furthermore, we can infer that density is positively related to the crash rates per neighborhood, but the limited robustness of the OLS model precludes any further estimation.

Table 7, OLS Results (Ordinary Least Squares Regression)

R-squared:                    0.514346
Adjusted R-squared:           0.463225
F-statistic:                  10.0613
Sum squared residual:         16771.6
Log likelihood:               -695.238
Akaike info criterion:        1428.48
S.E. of regression:           9.90351
Schwarz criterion:            1490.17
Multicollinearity Condition:  71.78509

Figure 7, OLS Regression results (panels: Quantile Map of OLS Residuals; Standard Deviation Map of OLS Residuals)

Figure 8, Moran's I plot of the residuals resulting from the OLS model

Table 8, OLS Predictors and Coefficients

Variable              Coefficient     Std.Error      t-Statistic     Probability
Med HH Income         -6.47678        9.653116       -0.6709523      0.503156
Total # Gas/Stores    2.07462         0.9726252      2.133011        0.0343505
School Sum            0.1920465       0.4703181      0.4083331       0.6835393
Mean Pvt Cond.        3.791276        4.063609       0.9329824       0.3521446
Total VMT             -1.15E-05       4.47E-05       -0.2576938      0.796949
Land-use Sum          0.0882798       0.1277889      0.6908255       0.4906112
Total Bus Rts         -0.02899316     0.7638854      -0.03795486     0.969804
Controlled Inter      0.05419747      0.2352698      0.2303631       0.8180838
Ave HH Size           5.124003        1.691862       3.028617        0.0028377
Prop of White         -0.1133666      0.04703466     -2.410279       0.0170007
Prop of Male          0.1593905       0.1601151      0.9954749       0.3209125
Prop of Female        -0.3538739      0.1311         -2.699268       0.007647
Age under 5           -0.03281472     0.4073672      -0.08055317     0.9358719
Age 5-17              0.02986987      0.2058375      0.1451138       0.8847863
Age 17-65             -0.005905032    0.001903957    -3.101453       0.0022531
Sum OO                0.004316509     0.001278005    3.377536        0.0009056
Sum RO                0.9734275       0.3315593      2.935907        0.0037835
Ave # Stories         -1.19E-05       9.94E-06       -1.199044       0.2321692
Ave Lot Area          0.006845071     0.1143437      0.05986403      0.9523066

The spatial lag model was run using the rook contiguity weight matrix. As table 9 indicates, the pseudo R squared has increased from the OLS result. However, the R squared is not a true test of spatial regression robustness (Anselin, 2005). The log likelihood, which is a better way to judge the robustness of a SAR model, has increased from the OLS model. The log likelihood increased from -751.629 (OLS) to -677.222, the Akaike criterion decreased from 1509.26 to 1394.44, and the Schwarz criterion decreased from 1519 to 1459. As a result, the spatial lag model is a substantial improvement over the OLS model. Figure 9 presents maps of the standard deviation of residual values from both the SAR and OLS models. The comparison indicates that under- and over-prediction has been reduced, as seen in the decrease in extreme negative and positive values in the SAR standard deviation map.
Moreover, the residuals were tested for autocorrelation because they represent the spatially filtered model error term. The Moran's I test statistic for the residuals is -.030, which indicates no clustering, but outliers or model misspecification may still be a problem due to the remaining high-high and low-low values. This Moran's I value is expected, given the elimination of variables outside of the model and given that the autoregressive term has removed spatial autocorrelation among neighborhoods. With the autocorrelation minimized, erroneous variables can be removed, which further solidifies the coefficients and predicted values relating the crash rate and the predictor variables in the spatial regression model.

The explanatory variables in table 9 reveal the significant predictors of bicycle crash rates per neighborhood. As evidenced in table 9, average household size is the strongest positive predictor of crash rates at the neighborhood level. On the other hand, the total owner-occupied housing negatively influences crash rates. In other words, this housing ownership type is associated with decreasing crash rates. We can infer from the significance of the average household size and owner occupancy sums that household density influences the increase in crash rate and that the occupancy of housing has little to do with crash rates. The positive coefficient (z-value) of average household size reveals that this variable contributes most to the expected crash rate per neighborhood. On the other hand, the sum of renter-occupied housing is also significant and positively related to crash rates. This can be an indirect indicator of demographics and lifestyle choice, in that renters may have more drive to cycle and thereby positively influence the crash rate per neighborhood. The next significant factor in crash rates is the total number of gasoline/stores. This factor is significant and is positively related to increases in bicycle crash rates.
The increase in gasoline/stores can be an indicator of increased automobile needs and commercial activity in the neighborhood. The average number of stories of all buildings in each neighborhood also has a positive and significant effect on the observed crash rate. This independent variable is an indicator of housing type and residential density. As a result, we can deduce that as the density of commercial and residential activity increases, so does the crash rate. Surprisingly, roadway and traffic characteristics were not significant contributors to crash rate dependence.

Table 8, SAR results (Spatial Lag Regression Output)

Mean dependent var:       10.1696
S.D. dependent var:       13.4818
R-squared:                0.620526
Log likelihood:           -677.222
Akaike info criterion:    1394.44
Schwarz criterion:        1459.38
S.E. of regression:       8.30496

Table 9, Spatial Lag vs. OLS model results

                          Spatial Lag Model        OLS Model
R-squared                 0.620526                 0.463 (adjusted)
Log likelihood            -677.222                 -695.238
Schwarz criterion         1459.38                  1490.17
Akaike info criterion     1394.44                  1428.48

Figure 9, Residual comparison between SAR and OLS models (panels: LAG Model Residuals; OLS Residuals; classes in Std. Dev.: < -2.50, -2.50 to -1.50, -1.50 to -0.50, -0.50 to 0.50, 0.50 to 1.50, > 1.50)

Figure 10, plot of SAR Lag model residuals

Table 8, General Predictors of Neighborhood Crash Rates in the SAR model

Variable              Coefficient     Std.Error      z-value        Probability
Med HH Income         -9.408525       8.094988       -1.162265      0.2451278
Total # Gas/Stores    2.314405        0.8168476      2.833338       0.0046066
School Sum            0.0318974       0.3944083      0.08087406     0.935542
Mean Pvt Cond.        2.725468        3.427343       0.7952132      0.4264894
Total VMT             1.60E-05        3.75E-05       0.4252056      0.6706869
Land-use Sum          0.0378078       0.1071675      0.3527916      0.7242448
Total Bus Rts         0.05487931      0.6413104      0.0855737      0.9318052
Controlled Inter      0.2182366       0.1975819      1.104538       0.2693601
Ave HH Size           6.401072        1.419053       4.510804       0.0000065
Prop of White         -0.05174428     0.04130239     -1.252816      0.2102729
Prop of Male          0.1010094       0.1342808      0.752225       0.4519156
Prop of Female        -0.2549107      0.110433       -2.308284      0.0209833
Age under 5           0.0954602       0.3427642      0.2785011      0.7806278
Age 5-17              -0.08995352     0.172637       -0.5210558     0.6023279
Age 17-65             -0.02389913     0.09669299     -0.247165      0.8047806
Sum OO                -0.0059584      0.001597544    -3.729725      0.0001917
Sum RO                0.002606335     0.001081849    2.409149       0.0159898
Ave # Stories         0.5958942       0.2806523      2.123247       0.033733
Ave Lot Area          -6.18E-06       8.34E-06       -0.740882      0.4587649

6.0 Conclusion

Past researchers have acknowledged the relationship between roadway, demographic, and land-use variables in crash rate research. Along those same lines, I have incorporated roadway characteristics, population, and land-use data to assess the spatial relationship between accident densities within neighborhoods. In this spatial analysis of bicycle accident density, the hypothesis of complete spatial randomness is rejected. The first- and second-order pattern analyses indicate that bicycle accident points and densities are autocorrelated, and the predictive model chosen in this paper has corroborated this finding. This paper has shown that a substantial temporal and spatial pattern exists among accident locations and densities. Cluster analysis indicated substantial autocorrelation that needed to be relieved in order to formulate a robust prediction model.
Furthermore, in GeoDa, the incorporation of a spatially lagged dependent variable as an explanatory term improved the predictive model. The increase in log likelihood (from -695.238 to -677.222) and the decreases in the Akaike info criterion (from 1428.48 to 1394.44) and Schwarz criterion (from 1490.17 to 1459.38) relative to the OLS model indicate the improvement, although the model still has flaws, as discussed below. The results indicate that a spatial relationship exists between bicycle accident density per neighborhood and population density. Interestingly enough, roadway characteristics were neither correlated with crash rate nor contributed to its prediction. Further analysis at the network level may supersede this result and reveal, as in past studies, which roadway geometries and traffic conditions contribute to bicycle crash rates.

Multicollinearity among explanatory variables may have led to model misspecification and hindered the model results. Future research should incorporate non-redundant variables. The aggregation of variables also opens the study to the modifiable areal unit problem. Rather than aggregating the data per neighborhood, a prediction and autocorrelation measure using the crash points per road segment might have been a more appropriate model. In addition, crash density may not be a true indicator of a bicycle accident problem. Without knowing the exposure of bicyclists to roadways, we do not know what risk the roads actually play in accident rates per bicyclist. In other words, a common denominator is needed that is directly linked to rider exposure.

Overall, this study has shown that GeoDa is a useful tool for addressing the spatial dependence of zonal data on bicycle crash density. The utility of this model lies in the fact that it replicates the trip generation step of the Urban Transportation Planning System (UTPS). In this case we utilized commonly known bicycle crash rate generators to predict density per zone, much like the trip generation step of the 4-step travel demand model.
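The comparison between the OLS and spatial lag fits reported in this paper can be checked with a few lines of arithmetic. The sketch below assumes the standard definition AIC = -2 lnL + 2k, and it assumes the two models are nested and share the same regressors, which the paper does not state explicitly. The function names are hypothetical; the fit statistics are the ones reported in the Spatial Lag vs. OLS comparison.

```python
import math

# Fit statistics as reported in the paper's Spatial Lag vs. OLS comparison.
LAG = {"loglik": -677.222, "aic": 1394.44, "sc": 1459.38}
OLS = {"loglik": -695.238, "aic": 1428.48, "sc": 1490.17}


def implied_param_count(loglik, aic):
    """Solve AIC = -2*lnL + 2k for k (assumes the standard AIC definition)."""
    return (aic + 2.0 * loglik) / 2.0


def likelihood_ratio(loglik_restricted, loglik_full):
    """LR statistic for nested models: 2 * (lnL_full - lnL_restricted)."""
    return 2.0 * (loglik_full - loglik_restricted)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


delta_aic = LAG["aic"] - OLS["aic"]  # negative: the lag model is preferred
delta_sc = LAG["sc"] - OLS["sc"]     # negative: the lag model is preferred
lr = likelihood_ratio(OLS["loglik"], LAG["loglik"])
p_value = chi2_sf_1df(lr)            # one extra parameter (the spatial lag term)

k_lag = implied_param_count(LAG["loglik"], LAG["aic"])
k_ols = implied_param_count(OLS["loglik"], OLS["aic"])
```

Under these assumptions, both information criteria fall when the lag term is added, the implied parameter counts differ by one, and the likelihood ratio statistic is far above the 5 percent chi-square critical value of 3.84 for one degree of freedom.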
Therefore, this study has indicated a methodology that could easily be assimilated into the common 4-step travel demand model. The zonal approach utilized neighborhoods and thus provided useful information for neighborhood planning or for incorporation into indicator studies. As evidenced in this study, an improvement over the OLS model is apparent in the residual plots and coefficients resulting from the spatial lag model of police-reported bicycle accidents. Although the foundation of the model can be improved upon, accounting for spatial dependence and predicting bicycle crash density using an autoregressive model may prove its utility in further research.

Appendix 1

$$\mathrm{BLOS} = 0.507\,\ln\!\left(\frac{Vol_{15}}{L_n}\right) + 0.199\,SP_t\,(1 + 10.38\,HV)^2 + 7.066\left(\frac{1}{PR_5}\right)^2 - 0.005\,W_e^2 + 0.760$$

where:

Vol15 = volume of directional traffic in 15 minutes = (ADT * D * Kd) / (4 * PHF)
ADT = Average Daily Traffic on the segment
D = Directional Factor
Kd = Peak to Daily Factor
PHF = Peak Hour Factor
Ln = number of directional through lanes
SPt = effective speed limit = 1.1199 ln(SPp - 20) + 0.8103, where SPp is the posted speed limit
HV = percentage of heavy vehicles (as defined in the 1994 Highway Capacity Manual)
PR5 = FHWA's 5-point pavement surface condition rating (5 = best)
We = average effective width of outside through lane:
    We = Wv - (10' * OSPA) where Wl = 0
    We = Wv + Wl (1 - 2 * OSPA) where Wl > 0 and Wps = 0
    We = Wv + Wl - 2 (10' * OSPA) where Wl > 0, Wps > 0, and a bike lane exists
OSPA = fraction of segment with occupied on-street parking
Wt = total width of outside lane (and shoulder) pavement
Wl = width of paving between outside lane stripe and edge of pavement
Wps = width of pavement striped for on-street parking
Wv = effective width as a function of traffic volume:
    Wv = Wt if ADT > 4000 veh/day
    Wv = Wt (2 - (ADT/4000)) if ADT < 4000 and the road is undivided and unstriped
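As an illustration of how the Appendix 1 terms combine, here is a minimal sketch that evaluates the BLOS expression for a single road segment. Several choices are assumptions rather than part of the appendix: HV is entered as a fraction (0.02 for 2 percent), widths are in feet, the posted speed is in miles per hour and must exceed 20, Wv falls back to Wt in cases the appendix does not cover, and all function names and input values are hypothetical.

```python
import math


def effective_width(wt, wl, wps, ospa, adt, undivided_unstriped=False, bike_lane=False):
    """Average effective width of the outside through lane (We), in feet."""
    # Wv: width adjusted for traffic volume. Cases not covered by the
    # appendix use Wt unchanged (assumption).
    if adt < 4000 and undivided_unstriped:
        wv = wt * (2.0 - adt / 4000.0)
    else:
        wv = wt
    if wl == 0:
        return wv - 10.0 * ospa
    if wps == 0:
        return wv + wl * (1.0 - 2.0 * ospa)
    if bike_lane:
        return wv + wl - 2.0 * (10.0 * ospa)
    raise ValueError("No We formula is given for Wl > 0, Wps > 0 without a bike lane")


def blos(adt, d, kd, phf, lanes, posted_speed, hv, pr5, we):
    """Bicycle Level of Service score for one segment (lower is better)."""
    vol15 = (adt * d * kd) / (4.0 * phf)                  # directional 15-minute volume
    sp_t = 1.1199 * math.log(posted_speed - 20) + 0.8103  # effective speed limit
    return (0.507 * math.log(vol15 / lanes)
            + 0.199 * sp_t * (1.0 + 10.38 * hv) ** 2
            + 7.066 * (1.0 / pr5) ** 2
            - 0.005 * we ** 2
            + 0.760)


# Hypothetical segment: 12,000 ADT, two through lanes, 35 mph, 2% heavy vehicles,
# pavement rating 4, a 12 ft outside lane with no shoulder and no parking.
we_example = effective_width(wt=12.0, wl=0.0, wps=0.0, ospa=0.0, adt=12000)
example_score = blos(adt=12000, d=0.55, kd=0.10, phf=1.0, lanes=2,
                     posted_speed=35, hv=0.02, pr5=4, we=we_example)
```

For these hypothetical inputs the score comes out at about 3.83. Widening the outside lane lowers the score through the -0.005 We^2 term, which is the only term in the expression that rewards the cyclist's operating space.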
Bicycle Level of Service ranges associated with level of service (LOS) designations:

BLOS score range: ≤ 1.50 = A, 1.51-2.50 = B, 2.51-3.50 = C, 3.51-4.50 = D, 4.51-5.50 = E, > 5.50 = F

Appendix 2

$$y = \rho W y + X\beta + \varepsilon$$

where:

Wy is the spatially lagged variable for weights matrix W
y is an N by 1 vector of observations on the dependent variable
X is an N by K matrix of observations on the explanatory variables
ε is an N by 1 vector of error terms
ρ is the spatial autoregressive parameter
β is a K by 1 vector of regression coefficients

Bibliography

Anselin, L. (2005). Exploring Spatial Data with GeoDa: A Workbook. Spatial Analysis Laboratory (SAL), Department of Agricultural and Consumer Economics, University of Illinois, Urbana-Champaign, IL.

Anselin, L. and Bera, A. (1998). Spatial Dependence in linear regression models with an introduction to spatial econometrics. In Ullah, A. and Giles, D.E., editors, Handbook of Applied Economic Statistics, pages 237-289. Marcel Dekker, New York.

Beimborn, E., Kennedy, R. (2000?). “Inside the Black Box: Making Transportation Models Work for Livable Communities”.

Black, W.R. (2003). Transportation: A Geographical Analysis, 1st edn. New York: Guilford.

Burt, J.E., Barber, G.M. (1996). Elementary Statistics for Geographers. Guilford Press, New York, New York.

Carpenter, T.E. (2001). “Methods to Investigate Spatial and Temporal Clustering in Veterinary Epidemiology.” Preventive Veterinary Medicine, 48, 303-320.

Feske, D. (1994). “Life in the Bike Lane.” American City and County, 109, 64-77.

Flahaut, B. (2004). “Impact of Infrastructure and Local Environment on Road Unsafety Logistic Modeling with Spatial Autocorrelation.” Accident Analysis & Prevention, 36, 1055-1066.

Fotheringham, A.S., Brunsdon, C., Charlton, M. (2004). Quantitative Geography, 3rd edn. London: Sage Publications.

Garder, P. (1994). “Bicycle Accidents in Maine: An Analysis.” Transp. Res. Rec., 1438, Transportation Research Board, Washington, D.C., 34-41.
Gamerman, D., and Moreira, A.R.B. (2004). “Multivariate Spatial Regression Models.” Journal of Multivariate Analysis, 91, 262-281.

Landis, B.W. (1996a). “Bicycle System Performance Measures.” ITE, 66, 18-23.

Landis, B.W., Vattikuti, R., Brannick, M. (1996b). “Real-Time Human Perceptions Toward a Bicycle Level of Service.” Transportation Research Record, 1578, 119-126.

Levine, N., Kim, K.E., and Nitz, L.H. (1995). “Spatial Analysis of Honolulu Motor Vehicle Crashes: II. Zonal Generators.” Accident Analysis and Prevention, 27(5), 675-685.

O’Sullivan, D., and Unwin, D.J. (2003). Geographic Information Analysis. John Wiley and Sons, Inc., New Jersey.

Pawlovich, M.D., Souleyrette, R.R., Strauss, T. (1998). “A Methodology for Studying Crash Dependence on Demographic and Socioeconomic Data.” Transportation Conference Proceedings, Center for Transportation Research and Education.

Rogerson, P.A. (2001). Statistical Methods for Geography. Sage Publications: London.

Rodgers, G.B. (1997). “Factors Associated with the Crash Risk of Adult Bicyclists.” Journal of Safety Research, 28, 233-41.

Steenberghen, T., Dufays, T., Thomas, I., Flahaut, B. (2004). “Intra-Urban Location and Clustering of Road Accidents using GIS: A Belgian Example.” International Journal of Geographical Information Science, 18(2), 169-181.

Wachtel, A., Lewiston, D. (1994). “Risk Factors for Bicycle-Motor Vehicle Collisions at Intersections.” ITE, 64(9), 30-35.