Title: Advances in spatial analysis of traffic crashes: the identification of hazardous road locations
Advisor(s): Loo, BPY
Author(s): Yao, Shenjun; 姚申君
Issued Date: 2013
URL: http://hdl.handle.net/10722/184257
Rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.

ADVANCES IN SPATIAL ANALYSIS OF TRAFFIC CRASHES: THE IDENTIFICATION OF HAZARDOUS ROAD LOCATIONS

YAO SHENJUN

PhD Thesis

THE UNIVERSITY OF HONG KONG

2013

Abstract of thesis entitled
Advances in Spatial Analysis of Traffic Crashes: The Identification of Hazardous Road Locations
Submitted by Yao Shenjun
For the degree of Doctor of Philosophy at The University of Hong Kong in April 2013

The identification of hazardous road locations is important to the improvement of road safety. However, there is still no consensus on the best method of identifying hazardous road locations. While traditional methods, such as the hot spot methodology, focus only on the physical distances separating road crashes, the hot zone methodology takes network contiguity into consideration and treats contiguous road segments as hazardous road locations. Compared with the hot spot method, the hot zone methodology is a relatively new direction, and a number of methodological issues remain in applying it to the identification of hazardous road locations. Hence, this study aims to provide a GIS-based analysis of the identification of crash hot zones as hazardous road locations with both link-attribute and event-based approaches. It first explores the general procedures of the two approaches in identifying traffic crash hot zones, and then investigates the characteristics of the two approaches by conducting a range of sensitivity analyses on the definition of threshold values and crash intensity with both simulated and empirical data. The results suggest that it is better to use a dissolved road network instead of a raw-link-node road network.
The segmentation length and the interval of reference points strongly affect the identification of hot zones, and both are best set at 100 meters for stable performance. While employing a numerical definition to identify hot zones is a simple and effort-saving approach, using the Monte Carlo method can avoid selection bias in choosing an appropriate number as the threshold value. When the two approaches are compared, the link-attribute approach is more likely to produce false negatives, whereas the event-based approach is prone to false positives around road junctions. Regardless of the method used, the link-attribute approach requires less computer time in identifying crash hot zones. When a range of environmental variables has to be taken into consideration, the link-attribute approach is superior to the event-based approach because it can more easily incorporate environmental variables through statistical models. By investigating the hot zone methodology, this research is expected to enrich the theoretical knowledge of the identification of hazardous road locations and to provide policy-makers with more practical information on identifying road hazards. Further research efforts should be dedicated to the ranking of hot zones and the investigation of false positive and false negative problems. (399 words)

Advances in Spatial Analysis of Traffic Crashes: The Identification of Hazardous Road Locations

by

YAO Shenjun (姚申君)

A thesis submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy at the University of Hong Kong

April, 2013

Declaration

I declare that this thesis represents my own work, except where due acknowledgement is made, and that it has not been previously included in a thesis, dissertation or report submitted to this University or to any other institution for a degree, diploma or other qualifications.
Signed ……………………
YAO Shenjun

Acknowledgements

I owe my sincere gratitude to my supervisor, Professor Becky P.Y. Loo, for her inspiring suggestions on research arenas and her continuous support in my personal life. She is a role model for me, not only as an outstanding teacher and an innovative scholar, but also as a caring mother who has three lovely children. I would like to say thank you to my friends, Mr. Aijun Yang, Ms. Cheng Wang, Ms. Daisy Ho, Ms. Fangxin Yi, Ms. Huijuan Deng, Ms. Lihua Peng, Mr. James H. Lenzer, Ms. Qian Liu, Mr. Qing Pei, Ms. Shuwen Liu, Mr. Tao Liu, Ms. Xu Xu, Ms. Yujing Xie and Mr. Yu Wang. Without their smiles and support, the past four years would have been more difficult. Special thanks go to my dear fellows, Ms. Alice Chow, Ms. Linna Li, Ms. Winnie Lam and Mr. Yuhao Wu. Great gratitude goes to the staff of the Geography Department for their support and patience. I would also like to say thank you to Dr. P. C. Lai, Professor S. C. Wong and Professor Guohua Li for their review of this thesis. I especially thank my parents and my husband for their full support and love throughout the years.

Shenjun Yao
April, 2013

Publications

Loo, B. P. Y., & Yao, S. (2010). The impact of area deprivation on traffic casualties in Hong Kong. Paper presented at the 15th HKSTS International Conference.

Loo, B. P. Y., Yao, S., Wu, J., Yu, B., & Zhong, H. (2011). Identification method of road hot zone based on GIS. Journal of Traffic and Transportation Engineering (in Chinese), 11(4), 97-103.

Loo, B. P. Y., Yao, S., & Wu, J. (2011). Spatial point analysis of road crashes in Shanghai: a GIS-based network kernel density method. Paper presented at the International Conference on GeoInformatics, 2011.

Loo, B. P. Y., & Yao, S. (2012). Geographic information systems. In G. Li & S. Baker (Eds.), Injury Research: Theories, Methods, and Approaches (pp. 447-463). New York: Springer.

Yao, S., & Loo, B. P. Y. (2012).
Identification of hazardous road locations for pedestrians. Paper presented at the 2012 International Symposium on Safety Science and Technology.

Table of Contents

DECLARATION
ACKNOWLEDGEMENTS
PUBLICATIONS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS

CHAPTER 1 INTRODUCTION
1.1 Research Background
1.1.1 Identification of Hazardous Road Locations
1.1.2 Detection of Spatial Concentration of Road Crashes
1.1.3 Road Safety in Hong Kong
1.2 Aim and Objectives
1.3 Research Questions
1.4 Significance of Study
1.5 Definition of Terms
1.6 Organization of Thesis

CHAPTER 2 LITERATURE REVIEW
2.1 Link-Attribute Methods in Identifying Hazardous Road Locations
2.1.1 Hot Spot Methodology
2.1.2 Hot Zone Methodology
2.2 Event-Based Approaches for the Identification of Road Crash Clusters
2.3 Environmental Factors Contributing to Road Crashes
2.3.1 Road Environment
2.3.2 Spatial Environment
2.3.3 Demographic and Socio-economic Environment
2.3.4 Natural Environment
2.4 Summary

CHAPTER 3 METHODOLOGY
3.1 Methodological Framework
3.2 Network-constrained Methods for Hot Zone Identification
3.2.1 Link-attribute Method
3.2.2 Event-based Method
3.3 Data Analysis with Simulated Data
3.3.1 Data Description
3.3.2 Data Analysis
3.3.3 Results
3.4 Data Analysis with Empirical Data
3.4.1 Data Collection and Editing
3.4.2 Data Analysis
3.5 Summary

CHAPTER 4 LINK-ATTRIBUTE ANALYSIS FOR ROAD CRASHES
4.1 Territory-Wide Identification of Hazardous Road Locations
4.1.1 Data Description
4.1.2 Data Analysis
4.2 Results
4.2.1 Numerical Definition
4.2.2 Monte Carlo Simulation
4.2.3 Incorporation of Road Environmental Variables
4.3 Summary

CHAPTER 5 LINK-ATTRIBUTE ANALYSIS FOR ROAD CASUALTIES
5.1 Importance of Analysis based on Casualties
5.1.1 Importance of Casualty-Weighted Analysis
5.1.2 Targeted Casualties: Pedestrians
5.2 Casualty-Weighted Analysis
5.2.1 Unweighted
5.2.2 Cost-Weighted
5.3 District-Wide Identification of Hazardous Road Locations for Pedestrians
5.3.1 Data Description
5.3.2 Data Analysis
5.3.3 Empirical Bayes
5.4 Results
5.4.1 Simple Ranking
5.4.2 Incorporation of the Surrounding Environment of Crashes Involving Pedestrians
5.4.3 Empirical Bayes
5.5 Summary

CHAPTER 6 EVENT-BASED ANALYSIS FOR ROAD CRASHES
6.1 Territory-Wide Identification of Hazardous Road Locations
6.1.1 Data Description
6.1.2 Data Analysis
6.1.3 Results
6.2 Comparison with Link-attribute Approach
6.2.1 Numerical Definition-An Arbitrary Number
6.2.2 Monte Carlo Simulation on Crash Frequency
6.2.3 Incorporation of Road Environmental Variables
6.3 Summary

CHAPTER 7 EVENT-BASED ANALYSIS FOR ROAD CASUALTIES
7.1 District-Wide Identification of Hazardous Road Locations for Pedestrians
7.1.1 Data Description
7.1.2 Data Analysis
7.1.3 Results
7.2 Comparison with Link-attribute Approach
7.2.1 Simple Ranking
7.2.2 Incorporation of Surrounding Environmental Variables
7.3 Summary

CHAPTER 8 CONCLUSION
8.1 Summary of Findings
8.1.1 Link-Attribute Approach
8.1.2 Event-Based Approach
8.1.3 Advantages and Drawbacks of the Two Approaches
8.2 Importance of the Study
8.3 Limitations of the Study
8.4 Further Research Directions

REFERENCES
List of Figures

Figure 1.1 Illustration on the primary focus of the thesis
Figure 1.2 Illustration of hot spots and hot zones on a hypothetical road network
Figure 1.3 Illustration of measuring spatial proximity in planar and network 2D spaces
Figure 1.4 Number of road traffic deaths (2001-2010)
Figure 1.5 Number of serious injuries (2001-2010)
Figure 1.6 Annual average number of deaths per million population (2006-2010)
Figure 1.7 Annual average number of deaths per million vehicle-km traveled (2006-2010)
Figure 1.8 Number of road crashes and casualties (2001-2010)
Figure 3.1 Schematic diagram of the methodological framework
Figure 3.2 Illustration of methodological framework
Figure 3.3 Road crashes plotted onto a map (Loo & Yao, 2012)
Figure 3.4 Illustration of raw-link-node and dissolved-road systems
Figure 3.5 Flow chart of the dissolving algorithm
Figure 3.6 A hypothetical road network structure (Loo & Yao, 2012)
Figure 3.7 A hypothetical structure of seven BSUs (Loo & Yao, 2012)
Figure 3.8 Link-attribute hot zones
Figure 3.9 Work flow for generating reference points
Figure 3.10 Reference points on a hypothetical road network
Figure 3.11 Mapping of hot zones
Figure 3.12 Three hypothetical road networks
Figure 3.13 Different crash patterns on three hypothetical roads
Figure 3.14 ATC Network
Figure 3.15 Land Use in 2006
Figure 3.16 Socio-economic deprivation index by TPU in 2006
Figure 4.1 Histogram for road crashes in 2002-2004
Figure 4.2 Hot zones (HZ 18+) identified with (a) raw-link-node road system only and (b) dissolved road system only in the period from 2002 to 2004
Figure 4.3 Part of hot zones identified with dissolved road system only
Figure 5.1 Numbers of road crashes and casualties during 2001 to 2010
Figure 5.2 Percentages of fatalities by road user type, 2001-2010
Figure 5.3 Percentages of fatal and seriously injured casualties by road user type, 2001-2010
Figure 5.4 Population Density in 2006
Figure 5.5 Road Network in Kwun Tong District
Figure 5.6 Hot zones identified only by (a) equal-weighing and (b) cost-weighted method
Figure 5.7 FM-only and BM-only cost-weighted hot zones in (a) 2002-2004 and (b) 2005-2007
Figure 6.1 Illustration of locations of undesirable hot zones
Figure 6.2 Hot zones identified by 99% significance level for (a) crash frequency and (b) crash risk in the period from 2002 to 2004
Figure 6.3 Hot zones identified by 99% significance level for (a) crash frequency and (b) crash risk in the period from 2005 to 2007
Figure 6.4 E-only hot zones
Figure 6.5 Illustration of determining an attribute value for a reference point
Figure 7.1 Hot zones of three types using 99% percentile in (a) 2002-2004 and (b) 2005-2007
Figure 7.2 Locations of unweighted pedestrian hot zones of (a) L-only and (b) E-only
Figure 7.3 Locations of cost-weighted E-only pedestrian hot zones
Figure 7.4 Site Review using Google Street View (From the crossing of Fu Yan Street and Wut Wah Street)
List of Tables

Table 3.1 Statistics on length of BSUs based on the raw-link-node road network
Table 3.2 Statistics on length of BSUs based on the dissolved road network
Table 3.3 Summary on hot zones for random and concentrated crash patterns on the three hypothetical road networks by using 100 m as the BSU length and RP interval
Table 3.4 Summary on hot zones for random and concentrated crash patterns on the three hypothetical road networks by length
Table 3.5 Shares of road crashes on road centerlines from 2002 to 2007
Table 4.1 Comparison of negative binomial models by predictor and segmentation length
Table 4.2 Coefficients of predictors in negative binomial models
Table 4.3 Negative binomial models for crash rate
Table 4.4 Hot zones by type of road system (2002-2004 and 2005-2007)
Table 4.5 Hot zones identified by both raw-link-node and dissolved road systems
Table 4.6 Variation of hot zones between two periods
Table 4.7 Hot zones identified in both periods
Table 4.8 Statistics on hot zones based on Monte Carlo Simulations
Table 4.9 Hot zones belonged to both HZ18+ and HZ99.9%
Table 4.10 Variation of hot zones between two periods
Table 4.11 Hot zones identified by both periods
Table 4.12 Hot zones by predictor and segmentation length in 2002-2004 and 2005-2007
Table 4.13 Variation of hot zones by segmentation length
Table 4.14 Summary of 100-meter hot zones by model
Table 4.15 Hot zones identified by both periods
Table 4.16 Comparison of Model-L and Model-LATJ hot zones by district
Table 5.1 Road Traffic Casualty Statistics in Hong Kong by Road User Type, 2001-2010
Table 5.2 Casualty cost in developed countries using willingness-to-pay approach
Table 5.3 Statistics on length of BSU before and after dissolving performance
Table 5.4 Numbers and Percentages of fatalities, serious and slight pedestrian injuries
Table 5.5 Descriptive statistics on pedestrian casualties
Table 5.6 Percentiles of crash intensity for the unweighted analysis
Table 5.7 Percentiles of crash intensity for the cost-weighted analysis
Table 5.8 Link-attribute results on base negative binomial models for pedestrian casualties with length as independent variable
Table 5.9 Link-attribute results on full negative binomial models for pedestrian casualties with six independent variables
Table 5.10 Link-attribute results on full negative binomial models for pedestrian casualties with four independent variables
Table 5.11 Link-attribute results on base negative binomial models for pedestrian casualties by injury type
Table 5.12 Link-attribute results on full negative binomial models for pedestrian casualties by injury type
Table 5.13 Percentiles of EB estimates for unweighted analysis
Table 5.14 Statistics on link-attribute pedestrian hot zones with threshold value defined by the simple ranking method
Table 5.15 Link-attribute hot zones identified only by the unweighted or cost-weighted method
Table 5.16 Variation of hot zones between two periods
Table 5.17 Characteristics of hot zones identified by incorporating surrounding environment
Table 5.18 Link-attribute hot zones identified only by the unweighted or the cost-weighted method
Table 5.19 Link-attribute hot zones identified only by the base-model or the full-model approach
Table 5.20 Variation of BM and FM hot zones between two periods
Table 5.21 Numbers and percentages of BM and FM hot zones identified in both periods
Table 5.22 Statistics on link-attribute pedestrian hot zones with crash intensity defined by the EB estimate and observed counts (simple ranking)
Table 5.23 Link-attribute hot zones identified only by the EB or the OC approach (simple ranking)
Table 5.24 Variation of EB and OC hot zones between two periods (simple ranking)
Table 5.25 Statistics on link-attribute pedestrian hot zones with threshold value defined by EB estimate and observed pedestrian casualty count (safety potential)
Table 5.26 Link-attribute hot zones identified only by the EB or the OC approach (safety potential)
Table 6.1 Illustration of reference points and AADT information
Table 6.2 Illustration of “Interval” variable
Table 6.3 Event-based hot zones with threshold values defined by an arbitrary number
Table 6.4 Statistics on event-based hot zones with crash intensity less than the critical value
Table 6.5 Statistics on hot zones based on statistical definition
Table 6.6 Hot zones by district and type based on statistical definition
Table 6.7 Variation of hot zones (CF and CR) between two periods
Table 6.8 Characteristics of link-attribute and event-based arbitrary-number hot zones (2002-2004)
Table 6.9 Comparison of link-attribute and event-based hot zones (HZ18+)
Table 6.10 Statistics of link-attribute and event-based hot zones (HZ18+) for two periods
Table 6.11 Statistics of link-attribute and event-based hot zones for two periods by significance level
Table 6.12 Statistics of link-attribute and event-based hot zones (incorporation of traffic exposure)
Table 6.13 Statistics of link-attribute and event-based hot zones for two periods
Table 7.1 Statistics on event-based pedestrian hot zones with threshold value defined by the simple ranking method
Table 7.2 Event-based hot zones identified only by unweighted or cost-weighted method (2002-2004)
Table 7.3 Variation of hot zones between two periods
Table 7.4 Numbers and percentages of hot zones identified in both periods
Table 7.5 Statistics of link-attribute and event-based hot zones for two periods (unweighted and cost-weighted)
Table 7.6 Summary on 99% L-only and E-only hot zones in 2002-2004

List of Abbreviations

2D: Two Dimensional
AADT: Annual Average Daily Traffic
ATC: Annual Traffic Census
BM: Base Model
BSU: Basic Spatial Unit
CF: Crash Frequency
CR: Crash Risk
CV: Coefficient of Variation
CW: Cost-weighted
EB: Empirical Bayes
FM: Full Model
GIS: Geographic Information Science
GPS: Global Positioning System
HS: Hot Spot
HSID: Hot Spot Identification
HZ: Hot Zone
HZID: Hot Zone Identification
IDHRL: Identification of Hazardous Road Locations
ISS: Injury Severity Scale
KDE: Kernel Density Estimation
KLINCS: K-Function Local Indicators of Network-Constrained Cluster
NKDE: Network Kernel Density Estimation
NNA: Nearest Neighbor Analysis
OC: Observed Counts
RP: Reference Point
SDI: Socio-Economic Deprivation Index
TPM: True Poisson Mean
TPU: Tertiary Planning Unit
TRADS: Traffic Road Accident Database
UW: Unweighted
VSL: Value of a Statistical Life
WHO: World Health Organization
WTP: Willingness to Pay
YTM: Yau Tsim Mong

CHAPTER 1 INTRODUCTION

The world’s first road traffic injury dates back to 1896 in New York, and the first death caused by a car occurred just a few months later in London (Peden et al., 2004). Since then, the threat of road traffic crashes has spread at a tremendous speed and traffic injuries have become a global health problem (World Health Organization, 2004). According to the World Health Organization (WHO) Global Status Report on Road Safety (2009), the number of road traffic deaths was estimated at about 1.2 million each year and the number of people injured reached as high as 50 million. While many high-income countries have shown a stable or declining trend in traffic fatality rates, the number of traffic injuries in most regions of the world has continued to rise in recent decades (WHO, 2008a, 2008b, 2009). It has been estimated that traffic injuries will become the fifth leading cause of death (WHO, 2008b, 2009).
It has long been acknowledged that traffic deaths or injuries do not happen by chance. The traffic events are not “accidents” but “crashes”, which means that the risk can be understood and safety can thus be improved (WHO, 2004). Although a number of road safety measures have been carried out by many administrations in recent years, the statistics above demonstrate that traffic injuries are still a serious threat to people’s health. Hence, more joint efforts from multiple disciplines are necessary to enhance safety on the roads.

In dealing with road safety problems, one may ask questions such as “where are hazardous road locations?”, “which road locations need treatments?” or “which places are more likely to be improved?”. The answers to these questions rely heavily on one important research field in road safety, the identification of hazardous road locations (IDHRL). Hot spot identification (HSID) and hot zone identification (HZID) are two major types of IDHRL. HSID, which is also known as blacksite or blackspot identification, takes junctions or individual road segments as spatially independent units. The road hazards identified by HSID approaches are a set of independent junctions or road segments (hot spots). HZID, also named black zone identification, takes the network contiguity of spatial units into consideration when detecting spatial concentration. The hazardous road locations identified by HZID approaches are a set of contiguous road spatial units (hot zones). To geographers, both hot spots and hot zones are spatial clusters of road crashes.

Conceptually, there are two ways to represent road crashes. One refers to the attribute-based type (“area-attribute approach” or “link-attribute approach”).
The road crashes are represented as an attribute value assigned to an area (Aguero-Valverde & Jovanis, 2006; Blazquez & Celis, 2012; Chen, Lin & Loo, 2012; Erdogan, 2009; LaScala, Gerber & Gruenewald, 2000) or road links (Black, 1991; Loo, 2009; Moons, Brijs & Wets, 2009b; Persaud et al., 2010; Yamada & Thill, 2010). The other approach, often termed “event-based approach”, is to consider the location of individual road crashes (events) directly (Anderson, 2009; Eckley & Curtin, 2012; Plug, Xia & Caulfield, 2011; Steenberghen, Aerts & Thomas, 2010; Yamada & Thill, 2007). Both link-attribute and event-based approaches can be employed to identify hazardous road locations. The primary focus of this thesis, as shown in Figure 1.1, is on the methodological challenges and issues related to these two approaches for the identification of hot zones as hazardous road locations.

Figure 1.1 Illustration on the primary focus of the thesis

The first section of this chapter further explains the contextual background with prior knowledge on the identification of hazardous road locations and the detection of spatial concentration of road crashes. The section also presents the reason why Hong Kong is chosen as a study area. The research aim and objectives are introduced in Section 1.2. Main research questions are presented in Section 1.3 and the significance of this research is introduced in Section 1.4.

1.1 Research Background

1.1.1 Identification of Hazardous Road Locations

Both traffic crash hot spots and hot zones can be regarded as hazardous road locations. Figure 1.2 delineates hot spots and hot zones on a hypothetical road network. Hot spots are individual road junctions, such as Hot Spot A, or segments, such as Hot Spot B, in Figure 1.2(a). A hot zone is characterized by at least two contiguous road spatial units, as shown in Figure 1.2(b), where Hot Zone B consists of two spatial units and Hot Zone A consists of five.
While both hot spot and hot zone methods have been employed by researchers to identify hazardous road locations, the research focuses of these two methods are quite different. This subsection presents a brief introduction to the methodological focuses of these two types of approaches.

Figure 1.2 Illustration of hot spots and hot zones on a hypothetical road network

1.1.1.1 Hot spot identification

A road junction or a road segment is identified as a “hot” or “hazardous” or “dangerous” road location when its actual crash intensity is greater than or equal to the critical crash intensity. Most previous studies on hot spot identification focus on the ways to define critical crash intensity. According to Cheng and Washington (2005) and Elvik (2006), there are three definitions commonly used for the identification of crash hot spots, that is, numerical, statistical and model-based definitions. Numerical definitions are based on the number of road crashes, or the level of crash risks, or the score of casualty severity. Although this kind of definition lacks appeal to the scientific community, it is preferred by most road safety administrations in the world (Elvik, 2007; Loo, 2009), mainly because it is practical for administrations to execute and monitor. Statistical definitions rely on the deviation of the observations from a “normal” or “expected” number for comparison (similar) locations. Different crash intensity measures, notably simple crash counts, weighted crash counts and crash rates (i.e. crashes per population, road length, vehicles and traffic volume), are commonly used. A road segment having an actual crash intensity measure (statistically) significantly higher than the “expected” rate is considered dangerous. Finally, model-based definitions are based on more sophisticated crash prediction models which take into account more “confounding” factors such as traffic volume.
Compared with simple numerical definitions, model-based definitions have much appeal to the scientific world, but they rely heavily on data and there is no consensus on the specific variables that are most suitable to be treated as “confounding” factors. Nevertheless, researchers have identified several variables which are commonly included in the model as “prior information”. Such factors will be further reviewed in the second chapter.

1.1.1.2 Hot zone identification

Compared with the hot spot method, hot zone methodology is a relatively new direction for the identification of hazardous road locations. Unlike hot spot methodology, which directly uses a critical value to define hazardous road locations, hot zone methodology identifies road hazards by comparing the actual crash intensity with the threshold value in a spatial unit (such as a very small road segment) as well as examining the spatial relationships among these units. If there are at least two contiguous spatial units and each unit has an actual crash intensity greater than the threshold value, a hot zone is identified as a hazardous road location. As this method stems from the idea of spatial autocorrelation on a network (Black, 1991, 1992; Black & Thomas, 1998; Flahaut, 2004; Flahaut et al., 2003; Loo, 2009; Peeters & Thomas, 2009), most previous research on hot zone identification concentrates on network autocorrelation and the representation of spatial relationships among spatial units, for instance, using different weighting matrices such as a 0-1 contiguity matrix or a distance-decay matrix to quantify spatial proximity. However, in the identification of hazardous road locations, the definition of the threshold value and the determination of spatial units also play a very important role, yet they have received much less attention than the “spatial” dimension of the approach.
For instance, most studies only utilized the average of actual crash intensity or an arbitrary numerical crash count as the threshold value. The length of a spatial unit was generally defined as 1 hectometer, that is, 100 meters. Hence, there remain a number of methodological issues to be investigated for the identification of crash hot zones, which will be the major research focus of this thesis.

1.1.2 Detection of Spatial Concentration of Road Crashes

As mentioned before, both hot spots and hot zones are spatial concentrations of road crashes. In a geographical context, there are two major types of representations for road crashes, namely the attribute-based and the event-based types. The former can be further classified into two sub-types, that is, area-attribute and link-attribute approaches. For the area-attribute approaches, traffic crashes are analyzed based on areas (polygons) such as traffic zones, census tracts, districts, provinces and regions (Erdogan, 2009; Huang, Abdel-Aty & Darwiche, 2010; Levine, Kim & Nitz, 1995; Quddus, 2008). Generally, the focus is to visualize and explain the variability of crash intensities. While area-attribute approaches are rarely used for the identification of specific road hazards due to their coarse spatial resolution, link-attribute and event-based approaches are commonly employed to identify hazardous road locations at a local scale. Hence, this subsection mainly introduces link-attribute and event-based approaches for detecting spatial concentrations of road crashes.

1.1.2.1 Link-attribute method

For link-attribute approaches, road crashes are represented as an attribute value, often a crash intensity such as crash count, weighted crash count or crash rate, assigned to a link or road segment. In the detection of spatial concentrations of road crashes, the link-attribute approach regards links or road segments with high attribute values as clusters.
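The hot zone rule stated in Section 1.1.1.2 (at least two contiguous spatial units, each with crash intensity greater than the threshold) can be illustrated with a minimal sketch. The function name `find_hot_zones`, the toy adjacency list and the crash counts below are hypothetical and chosen only for illustration; this is not the procedure developed later in the thesis, merely one plausible reading of the rule using a 0-1 contiguity relation.

```python
from collections import deque


def find_hot_zones(intensity, adjacency, threshold, min_units=2):
    """Group contiguous spatial units whose intensity exceeds the threshold.

    intensity: dict mapping unit id -> crash intensity (hypothetical values)
    adjacency: dict mapping unit id -> list of contiguous unit ids
    Returns a list of hot zones, each a sorted list of unit ids.
    """
    # Units whose crash intensity is strictly greater than the threshold.
    hot = {u for u, value in intensity.items() if value > threshold}
    seen = set()
    zones = []
    for start in sorted(hot):
        if start in seen:
            continue
        # Breadth-first search that only walks through "hot" neighbours.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            unit = queue.popleft()
            component.append(unit)
            for neighbour in adjacency.get(unit, ()):
                if neighbour in hot and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        # A single hot unit is not a hot zone; contiguity is required.
        if len(component) >= min_units:
            zones.append(sorted(component))
    return zones


# Hypothetical network: a road of six units (0..5) with a side street (6)
# joining at unit 2. Crash counts are invented for the example.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3, 6], 3: [2, 4], 4: [3, 5], 5: [4], 6: [2]}
intensity = {0: 0, 1: 3, 2: 4, 3: 1, 4: 0, 5: 5, 6: 2}
zones = find_hot_zones(intensity, adjacency, threshold=1)
print(zones)
```

In this toy case unit 5 has the highest count but no contiguous unit above the threshold, so it is not reported as a hot zone, whereas units 1, 2 and 6 form one zone across the junction.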
This method has been widely adopted by researchers to identify both crash hot spots and crash hot zones. In the identification of crash hot zones, the entire road network is segmented into smaller road segments with equal length, which are usually termed “basic spatial units (BSUs)”. A hot zone is identified when there are at least two contiguous BSUs, each of which has an attribute value greater than the threshold value (Flahaut, 2004; Flahaut et al., 2003; Loo, 2009; Steenberghen et al., 2004). However, most previous studies on the link-attribute approaches, as mentioned in Section 1.1.1.2, focus on the investigation of spatial contiguity among BSUs. The determination of the attribute and the definition of the threshold, as well as the segmentation of the road network, have not been carefully investigated in the academic community.

1.1.2.2 Event-based method

The event-based method identifies crash clusters by measuring the physical concentration among events (road crashes) through the examination of spatial proximity (Bailey, 2004; Fotheringham, Brunsdon & Charlton, 2000; O'Sullivan & Unwin, 2003; Okabe, Okunuki & Shiode, 2006a; Waller & Gotway, 2004). There are two ways to measure the proximity of road crashes in a two-dimensional (2D) space, namely planar 2D and network 2D measures. The former treats road crashes as a planar 2D phenomenon which allows events to be located at any place, and the latter regards crashes as a network-constrained phenomenon which restricts events to the network. In a planar 2D space, the spatial separation of events is calculated by the Euclidean distance, whereas in a network 2D space, the spatial separation is measured by network distance. Figure 1.3 illustrates the way in which spatial separation is measured in planar and network 2D spaces. In order to detect clustering tendency around Crash A, one may need to look for neighboring road crashes around it within a certain distance, h.
In a planar 2D space, the search space of Crash A is a circle with radius equal to h (see Figure 1.3a). As Crash B is located within the search space, in other words, the distance between Crash A and B is less than h, Crash B can be regarded as a neighbor of Crash A. In a network 2D space, however, the search space of Crash A is like a tree, as colored in grey in Figure 1.3b. The network distance between Crash A and B is greater than h. As Crash B is located beyond the search space, Crash B cannot be regarded as a neighboring crash. This thesis treats road crashes as a network 2D phenomenon as they are primarily located on the road network, which is a network 2D space.

Figure 1.3 Illustration of measuring spatial proximity in planar and network 2D spaces

While most point pattern analytical tools such as kernel density estimation and the K function were initially designed for planar 2D space, recent years have witnessed a growing awareness of the need to transform these methods from planar 2D space to network 2D space (Okabe, Okunuki & Shiode, 2006b; Okabe, Yomono & Kitamura, 1995; Shiode, 2011; Xie & Yan, 2008; Yamada & Thill, 2004; Yamada & Thill, 2007). These studies argued convincingly that spatial statistics for network-constrained phenomena need to be different from spatial analysis methods designed for the planar space. While the applications of these methods to traffic crashes generally aimed to describe the global spatial pattern using “point process” tools at the initial stage (Okabe, Okunuki & Shiode, 2006b; Yamada & Thill, 2004), more research effort has been devoted to the identification of local crash hot spots in recent years. For instance, Yamada and Thill (2007) applied the network-constrained local K-function method to the detection of traffic crash hot spots with statistical crash intensity as the cut-off value.
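The contrast between planar and network search spaces described around Figure 1.3 can be reproduced with a small sketch. The coordinates, link lengths and node names (A, B, J1, J2) are invented for illustration, and crashes are simplified to sit exactly on network nodes; a real analysis would first snap crashes onto links. Network distance is computed here with Dijkstra's algorithm, which is one standard choice rather than a method prescribed by the text.

```python
import heapq
import math


def network_distances(graph, source):
    """Shortest-path lengths from source over a weighted, undirected graph.

    graph: dict mapping node -> list of (neighbour, link_length) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for neighbour, length in graph[node]:
            nd = d + length
            if nd < dist.get(neighbour, math.inf):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist


# Hypothetical layout: crashes A and B lie on two parallel roads 80 m apart,
# which are only connected through junctions J1 and J2, 100 m down the road.
coords = {"A": (0.0, 0.0), "B": (0.0, 80.0)}
graph = {
    "A": [("J1", 100.0)],
    "J1": [("A", 100.0), ("J2", 80.0)],
    "J2": [("J1", 80.0), ("B", 100.0)],
    "B": [("J2", 100.0)],
}

h = 100.0  # search distance in meters (assumed value)
planar = math.dist(coords["A"], coords["B"])
network = network_distances(graph, "A")["B"]

planar_neighbour = planar <= h
network_neighbour = network <= h
print(planar, network, planar_neighbour, network_neighbour)
```

With these assumed values, Crash B falls inside the planar circle around Crash A but outside the network search space, which mirrors the situation drawn in Figure 1.3.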
While some attempts have been made to apply the network-constrained event-based method to the detection of crash hot spots, little research has been conducted on its application to the identification of traffic crash hot zones. Taking the network-constrained local K-function approach as an example, this thesis makes a novel attempt to employ the network-constrained event-based method for the identification of crash hot zones as hazardous road locations.

1.1.3 Road Safety in Hong Kong

This research chooses Hong Kong as a study area, mainly because its road safety is still not satisfactory although it has improved during the recent ten years in terms of the numbers of traffic fatalities and serious injuries. In the period from 2001 to 2010, Hong Kong witnessed a 32.4% reduction in the number of road traffic deaths (see Figure 1.4), from 173 in 2001 to 117 in 2010, and a significant 38.6% drop in the number of road traffic serious injuries (see Figure 1.5), from 3,517 in 2001 to 2,160 in 2010.

Figure 1.4 Number of road traffic deaths (2001-2010)

Figure 1.5 Number of serious injuries (2001-2010)

Loo et al. (2007) reviewed the road safety strategy of Hong Kong by comparing Hong Kong with some developed countries such as Sweden, which has consistently had a favorable road safety record. Figure 1.6 and Figure 1.7 are the updates of the records of those countries. In comparison with these countries, Hong Kong has a relatively low fatality rate. As shown in Figure 1.6, the annual average number of deaths per one million people was 22.53 during the period from 2006 to 2010, much smaller than that of Great Britain (41.80) and Sweden (42.23), due partly to the fact that people in Hong Kong rely heavily on off-road public transport, particularly metro.
However, focusing on the risk in Figure 1.7, it can be observed that Hong Kong has a poor record in the number of fatalities per million vehicle-km traveled. During the period from 2006 to 2010, the annual average number of deaths per million vehicle-km traveled reached as high as 0.014. It was about three times that of Sweden and Great Britain. In addition, the overall numbers of reported road crashes and traffic injuries have stabilized at about 15,000 and 20,000 respectively (see Figure 1.8). These figures show that Hong Kong still has a long way to go in road safety improvement, although the government departments, the Road Safety Council and other stakeholders have already introduced some effective countermeasures to improve traffic safety in the recent decade.

Figure 1.6 Annual average number of deaths per million population (2006-2010)
Hong Kong 22.53; Australia 70.05; California 94.51; Japan 47.92; New Zealand 90.89; Sweden 42.23; GB 41.80

Figure 1.7 Annual average number of deaths per million vehicle-km traveled (2006-2010)
Hong Kong 0.0138445; Australia 0.0064664; California 0.0088948; Japan 0.0074545; New Zealand 0.0090993; Sweden 0.0049713; GB 0.0048807

Figure 1.8 Number of road crashes and casualties (2001-2010)

In addition, an empirical road structure may reflect some characteristics of road structure which might be neglected on a hypothetical road network. The structure of the road network in Hong Kong is complex, with 4.4% of expressways, 13.6% of main roads, 76.9% of secondary roads and 5.1% of other lower-order roads. In this light, Hong Kong is suitable to be chosen as a case study of the identification of hazardous road locations.
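The percentages and the "about three times" comparison quoted in this subsection follow from simple arithmetic, reproduced below as a check. The input numbers are those given in the text and in the data labels of Figure 1.7; attributing the two smallest per-vehicle-km values to Sweden and Great Britain is an assumption based on the order of the labels. The helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(before, after):
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Road traffic deaths and serious injuries in Hong Kong, 2001 versus 2010.
deaths_drop = percent_reduction(173, 117)
injuries_drop = percent_reduction(3517, 2160)

# Annual average deaths per million vehicle-km traveled, 2006-2010
# (Sweden and GB values assumed from the Figure 1.7 data labels).
hong_kong = 0.0138445
sweden = 0.0049713
great_britain = 0.0048807
ratio_sweden = hong_kong / sweden
ratio_gb = hong_kong / great_britain

print(round(deaths_drop, 1), round(injuries_drop, 1))
print(round(ratio_sweden, 2), round(ratio_gb, 2))
```

Both ratios come out slightly below three, which is consistent with the rounded statement in the text.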
1.2 Aim and Objectives

This study aims at providing a GIS-based study on the identification of crash hot zones as hazardous road locations by using two network-constrained methods derived from network analysis and local spatial analysis. The main objectives of the study are: 1) to explore a methodological framework for hot zone identification; 2) to investigate the characteristics of the link-attribute approach for detecting road hazards; 3) to examine the characteristics of the event-based approach for identifying hazardous locations; and 4) to compare the link-attribute approach with the event-based approach so as to get a better understanding of their advantages and disadvantages for identifying crash hot zones as hazardous road locations.

1.3 Research Questions

Based on the objectives, this research centers upon the following five major questions:
● How can we use the link-attribute method to identify hazardous road locations?
● How sensitive are the results of hazardous road locations to changes in the methods and threshold values when the link-attribute approach is applied?
● How can we apply the event-based approach to the identification of hazardous road locations?
● How sensitive are the results of hazardous road locations to changes in the methods and threshold values when the event-based approach is used?
● What are the strengths and weaknesses of the link-attribute and event-based methods for the identification of crash hot zones as hazardous road locations?

Although research efforts have been dedicated to the methodological framework of identification of hazardous road locations with the link-attribute approach (Loo, 2009), there still remain some issues, such as the segmentation method of the road network, which are worthy of closer investigation. Hence, the general steps for the link-attribute crash hot zone identification will be presented first.
As crash intensity can be measured in different manners and threshold values can be determined by various methods, varying the crash intensity measure and the threshold values makes it possible to examine the sensitivity and robustness of the approach.

As little research has focused on the event-based identification of crash hot zones, the way in which the event-based approach is used to identify crash hot zones should be proposed before any further investigation. After the key procedures for the event-based identification have been established, sensitivity analysis can then be conducted to examine the validity of the event-based approach.

Once the two approaches to identifying crash hot zones are understood, the last research question addresses the relative strengths and weaknesses of the link-attribute and event-based approaches, which could shed light on the identification of crash hot zones.

1.4 Significance of Study

Road safety is a public health problem. The costs of traffic fatalities and injuries can be huge not only in terms of money, but also in physical and mental suffering. The crash victims have to suffer a lot and their families may also have to bear great burdens such as grief, medical and legal costs, and loss of earnings. Road crashes also add extra burden to the whole society, such as medical emergency services, hospitalization, traffic congestion, and even tax, social welfare, and insurance systems in the long term, which may negatively influence the productivity and competitiveness of a society (Tsui, 2006). These costs can be significantly reduced if crash risk can be better understood and countermeasures can be properly carried out and strictly enforced. As the identification of hazardous road locations plays a key role in addressing road safety problems, this study is worthwhile in reducing the costs brought by road traffic crashes.
Road safety research on the identification of hazardous road locations has been performed for many decades, yet there is still no consensus on the best method of identifying hazardous road locations. As mentioned earlier, hot zone identification is a relatively new IDHRL method. There still remain a number of methodological issues in applying this method to the identification of hazardous road locations. By further investigating the hot zone method, this study can enrich the theoretical knowledge of the identification of hazardous road locations and practically provide policy-makers with more information on identifying road hazards.

Most previous research on the identification of hazardous road locations was conducted by engineers and geographers. While engineering studies mainly focused on risk factors such as traffic volume and road junctions, geographers were more specialized in investigating the spatial pattern of road crashes with spatial analytical tools such as the K-function and the Moran I indicator. Fewer researchers have concentrated on both areas in the identification of crash hot zones. As advocated by the WHO (2004), the improvement of road safety requires more concerted efforts from multiple disciplines. In this light, this thesis is academically desirable since it enriches the multi-disciplinary research on road safety by integrating the knowledge of identification of hazardous road locations in road safety and spatial analysis in geography.

1.5 Definition of Terms

Road traffic crash
“Road safety is no accident!” (WHO, 2004). As the word “accident” has a meaning of “happening by chance”, which may imply that the traffic events cannot be avoided, this study chooses to use the word “crash” instead of “accident”. According to Transport Department of HKSAR (2001-2010), a road traffic crash refers to an incident reported to the Police, involving personal injury occurring on roads in the Territory, in which one or more vehicles are involved.
It can be a vehicle-vehicle collision, a vehicle-pedestrian collision (pedestrian crash) or a vehicle-object/other collision. In this study, “road crash”, “traffic crash” and “road traffic crash” are regarded as the same thing and used interchangeably.

Casualty
A casualty is a person killed or injured in a road crash. There may be more than one casualty in a road traffic crash (Transport Department, HKSAR).

Fatality
A fatality is a sustained injury causing death within 30 days of the road crash (Transport Department, HKSAR).

Serious injury
Whether an injury is regarded as serious or slight differs by administration. This research adopts the definition of Transport Department of HKSAR that defines “serious injury” as an injury for which a person is detained in hospital as an 'in-patient' for more than twelve hours. Injuries causing death 30 or more days after the crash are also included in this category (Transport Department, HKSAR).

Slight injury
An injury is regarded as slight if it has a minor character such as a sprain, bruise or cut not judged to be severe, or slight shock requiring roadside attention, and detention in hospital is less than 12 hours or not required (Transport Department, HKSAR).

1.6 Organization of Thesis

This thesis consists of eight chapters. The introductory chapter first gives a brief description of the research background. It also presents the aim and objectives, research questions and the significance of the study. Chapter Two covers the literature review. Chapter Three presents the methodological framework. Chapters Four and Five focus on the link-attribute approach in identifying hazardous road locations, and Chapters Six and Seven concentrate on the event-based approach and comparisons of the link-attribute and event-based results. The last chapter rounds up the findings of the entire study. It also points out the limitations of the present study and provides insights for further research.
CHAPTER 2 LITERATURE REVIEW

This chapter reviews two network-constrained methods in identifying hazardous road locations and environmental factors contributing to road crashes.

2.1 Link-Attribute Methods in Identifying Hazardous Road Locations

In the literature, link-attribute methods in identifying road hazards can be categorized into two types. One type refers to the hot spot methodology, which has a long history of study. Focusing on individual road segments, researchers have developed various hotspot identification (HSID) methods, ranging from simple ranking of crash counts to complex model-based definitions such as Empirical Bayes (EB) and Full Bayes (FB) models. Most administrations in the world employ the hot spot methodology to identify hazardous road locations. Compared with the hot spot family, the other category, known as the hot zone methodology, is rather new. It was first applied to the identification of road hazards in Black (1991) by examining both the crash rate and spatial relationship of road segments. During the recent two decades, it has attracted growing attention due to its flexibility in many aspects (Loo, 2009; Moons, Brijs & Wets, 2009a). This section reviews the two link-attribute methods for the identification of dangerous road places.

2.1.1 Hot Spot Methodology

A crash hot spot can be defined as any site (road segments, intersections, interchanges, etc.) that has a higher number of road crashes. While a large number of studies have focused on the development of various hotspot identification methods (Aguero-Valverde & Jovanis, 2009; Hauer et al., 2002; Hauer, Ng & Lovell, 1988; Li, Zhu & Sui, 2007; Miaou & Lord, 2003; Persaud, Lyon & Nguyen, 1999; Song et al., 2006), recent efforts have been dedicated to the evaluation of these hot spot identification approaches (Cheng & Washington, 2008; Cheng & Washington, 2005; Montella, 2010).
This sub-section briefly reviews a set of commonly applied HSID methods and quantitative evaluation criteria.

2.1.1.1 HSID methods

Elvik (2007) made a distinction between three definitions of road crash hot spots: 1) numerical definitions; 2) statistical definitions; and 3) model-based definitions.

Numerical definitions are preferred by many road safety administrations in the world (Elvik, 2008; Elvik et al., 2009; Geurts, 2006; Guo & Sheng, 2009). For instance, Norway defines a hazardous spot as any location with a length of not more than 100 m where at least 4 injury crashes have been recorded in the last 5 years (Elvik, 2008). By taking into consideration the traffic volume, Australian criteria are based on both a crash count (3 or more similar injury crashes within 3 years) and a crash risk (at least 0.8) (Elvik, 2008). In Flanders, a site is defined as a hot spot when its score, calculated with severity of casualty (fatal, seriously or slightly injured) considered, is no less than 15 (Geurts, 2006).

Statistical definitions rely on the deviation of the observed crash count from a normal number for comparison (similar) locations. For instance, “a location is identified as unsafe if the observed crash count exceeds the observed average of counts of similar locations with a certain confidence level (typically 0.90, 0.95, or 0.99 in practice)” (Cheng & Washington, 2005; Laughlin et al., 1975). Statistical definitions may come close to model-based definitions (Elvik, 2007) if the normal crash count of comparison sites is estimated by a statistical model, such as a Poisson or negative binomial model, with a set of variables which may impact the incidence of road crashes.

Model-based definitions are based on crash prediction models.
A typical example is the Empirical Bayes (EB) method, the essence of which is to smooth out the random fluctuation by combining the crash history of a specific location with an estimate of the expected number of crashes based on the crash history of similar locations (Elvik, 2007). Since its first application by Abbess, Jarrett and Wright (1981), the EB approach has been well developed and widely applied to road safety (Hauer, 1997; Hauer et al., 2002; Hauer, Ng & Lovell, 1988; Persaud, 1991; Persaud, Lyon & Nguyen, 1999). It has been widely acknowledged that the EB model can perform well in differentiating the “true positive” (high crash intensity due to real risk factors) and the “false positive” (high crash intensity due to randomness) locations. More recently, full Bayes (FB) methods have been employed for hot spot identification due to their explicit use of probability for quantifying uncertainty (Huang, Chin & Haque, 2009; Lan & Persaud, 2011; Miranda-Moreno & Fu, 2007).

2.1.1.2 HSID methods evaluation criteria

The performance of HSID methods has been assessed quantitatively with various criteria. Persaud, Lyon and Nguyen (1999) measured the effectiveness of EB methods by calculating the difference between the observed and estimated crashes in a subsequent time period. By categorizing the sites into correct positives, false positives, correct negatives and false negatives, Elvik (2007, 2008) compared five HSID techniques in terms of two epidemiological criteria, namely sensitivity, measured by the percentage of correct positives, and specificity, measured by the percentage of correct negatives. Higle and Hecht (1989) evaluated HSID methods with the false identification statistic, which relies on the percentages of false positives and false negatives.
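The two epidemiological criteria mentioned above can be made concrete with a short sketch. It assumes the usual epidemiological reading, which the text does not spell out: sensitivity is the share of truly hazardous sites that a method flags, and specificity is the share of truly safe sites that it leaves unflagged. The function name and the ten example sites are hypothetical.

```python
def sensitivity_specificity(flagged, truly_hazardous, all_sites):
    """Return (sensitivity, specificity) for one HSID result.

    flagged: sites the method identifies as hot spots
    truly_hazardous: sites that are hazardous in truth (known in a simulation)
    all_sites: every site that was screened
    """
    flagged = set(flagged)
    truly_hazardous = set(truly_hazardous)
    truly_safe = set(all_sites) - truly_hazardous

    correct_positives = flagged & truly_hazardous
    correct_negatives = truly_safe - flagged

    sensitivity = len(correct_positives) / len(truly_hazardous)
    specificity = len(correct_negatives) / len(truly_safe)
    return sensitivity, specificity


# Hypothetical screening of ten sites, four of which are truly hazardous.
all_sites = range(10)
truly_hazardous = {0, 1, 2, 3}
flagged = {0, 1, 2, 7}  # one hazardous site missed, one safe site flagged

sens, spec = sensitivity_specificity(flagged, truly_hazardous, all_sites)
print(sens, spec)
```

Here site 3 is a false negative and site 7 is a false positive, so neither criterion reaches 1.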
Apart from the false identification test, Cheng and Washington (2008) introduced another four evaluation tests: the site consistency test, which measures the ability of an HSID method to consistently identify high-risk sites over repeated observational periods; the method consistency test, which evaluates the performance of an HSID method by calculating the number of the same hot spots identified in both periods; the total rank difference test, which quantifies the reliability of a method by summing the rank differences of the hazardous road sections identified across the two periods; and the Poisson mean difference test, which assesses the performance by calculating the sum of the absolute difference of true Poisson means (TPMs) associated with the falsely identified sites and critical TPMs. Montella (2010) gave an effectiveness measure by means of a synthetic index which combines the site consistency test, the method consistency test, and the total rank difference test with the same weight.

2.1.2 Hot Zone Methodology

The hot zone methodology stems from the idea of spatial autocorrelation on a network. A well-known spatial statistic for measuring spatial autocorrelation is Moran I (Moran, 1948), which refers to the degree to which the value of a variable at a location covaries with values of that variable at nearby locations. While the Moran I statistic was initially designed for planar 2D space, Black (1991) first applied it to attributes on network segments. Black (1992) further developed this approach and presented the concept of network autocorrelation, which examines the extent to which a variable value on each network segment correlates with those on other network segments that connect to each segment. Black and Thomas (1998) illustrated the utility of network autocorrelation by using the network Moran I statistic to assess the highway crash hotspot clustering tendency.
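As a rough illustration of network autocorrelation, the sketch below computes the standard global Moran I, I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, with 0-1 contiguity weights between connected segments. The six-segment chain and the crash counts are invented, and the code follows the textbook formula rather than the specific network formulations of Black (1991, 1992).

```python
def morans_i(values, neighbours):
    """Global Moran I with 0-1 contiguity weights.

    values: list of attribute values, one per network segment
    neighbours: list where neighbours[i] holds the indices of the segments
                that connect to segment i (w_ij = 1 for these, 0 otherwise)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    # Sum of w_ij * dev_i * dev_j over all connected (i, j) pairs.
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbours[i])
    # S0 is the sum of all weights, i.e. the number of directed connections.
    s0 = sum(len(neighbours[i]) for i in range(n))
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


# Hypothetical chain of six road segments: 0-1-2-3-4-5.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

clustered = morans_i([3, 3, 3, 0, 0, 0], chain)    # high counts side by side
alternating = morans_i([3, 0, 3, 0, 3, 0], chain)  # high next to low

print(clustered, alternating)
```

The clustered pattern yields a positive value and the alternating pattern a negative one, which is the behaviour a clustering test on network segments relies on.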
While these pioneering studies were concerned with network autocorrelation in a global spatial context, Flahaut et al. (2003) introduced local network autocorrelation indices, which are a decomposition of the global Moran I index. The approach analyzes road networks by small segments known as basic spatial units (BSUs). Based upon the crash rates and spatial proximity of BSUs, crash hotspot clusters (that is, crash hot zones) are detected by analyzing the spatial association between BSUs. The method was applied to the detection of crash hot zones in Belgium. However, the study area of their case contained only a single highway, which could hardly portray the characteristics of a real-world road network. This issue was later addressed by Flahaut (2004), who analyzed the crash pattern of a province of Wallonia in Belgium (Walloon Brabant). A recent study by Loo (2009) targeted the entire city of Hong Kong and performed the hot zone analysis using a modified local Moran I statistic, named the Loo statistic hereafter. The study also solved the double-counting problem by following a preset rule. In comparison with the blacksite methodology used by the Hong Kong government, Loo (2009) found that the hot zone methodology is superior and more flexible in many aspects, especially in the identification of road hazards on expressways and in rural areas. Based on the Loo statistic, Moons, Brijs and Wets (2009a) analyzed road crashes on highways in a province of Belgium and in an urban environment. The results indicated that incorporating the hot zone methodology in crash analysis could provide more information on the underlying hazardous road locations. Yamada and Thill (2010) identified crash hot zones by means of the local Moran I statistic and the local Getis and Ord G statistic. They also analyzed the spatial pattern of crashes by taking traffic exposure into consideration.
Their research could avoid the detection of hot zones that merely reflect traffic exposure.

Link-attribute hot zone methodology is based on BSUs, which are derived by segmenting the road network at equal intervals. Aggregating road crashes by BSU raises one of the most important issues in geography: the scale of the spatial unit (the length of a BSU) may influence the results (the hot zones). However, in the literature, researchers have generally chosen a single constant value to define the length of a BSU, such as 100 m (Black, 1991; Black & Thomas, 1998; Flahaut, 2004; Flahaut et al., 2003; Loo, 2009; Moons, Brijs & Wets, 2009a) or 0.1 mile (Yamada & Thill, 2010). Little research has been directed towards the impact of BSU length on hot zone identification. In addition, most previous studies were based on crash counts of road links and focused on the spatial statistical methods used to assess the spatial association of links, giving little consideration to the choice of attributes attached to links. However, given the same hot zone identification index, such as the Loo statistic, the hot zones identified on the basis of crash counts may differ from those based on the number of casualties involved. The selection of link variables hence also plays an important role in identifying hazardous road locations. Moreover, most researchers concentrated only on the observed crash pattern. From the perspective of crash potential (observed minus expected), however, the factors that can explain the spatial distribution of crashes need to be closely examined.

2.2 Event-Based Approaches for the Identification of Road Crash Clusters

Event-based methods that are widely employed in crash pattern analysis were initially developed based upon the assumption of a planar 2D space.
These measures can be classified into distance-based methods, which examine distances between events, and density-based methods, which examine the crude density or overall intensity of a point pattern (O'Sullivan & Unwin, 2003). Frequently used distance-based methods include the nearest-neighbor distance and distance functions such as the G, F, and K functions (O'Sullivan & Unwin, 2003). Of these, Ripley's K-function has been utilized for crash analysis in many studies (Jones, Langford & Bentham, 1996; Schneider, Ryznar & Khattak, 2004). Loo and Tsui (2005) used nearest neighbor analysis (NNA) to explore the crash clustering tendency, with Pythagoras's theorem used to calculate the distance from each crash to its nearest neighbour. The alternative to distance-based methods is density-based measures such as quadrat count methods and density estimation. Among these, kernel density estimation (KDE) is the most widely used approach in analyzing crash patterns (Anderson, 2009; Delmelle & Thill, 2008; Erdogan et al., 2008; Pulugurtha, Krishnakumar & Nambisan, 2007). Although conventional event-based methods were originally developed for a planar 2D space, recent attempts have been made to extend these methods from a planar 2D space to a network 2D space (Loo, Yao & Wu, 2011b; Okabe, Okunuki & Shiode, 2006b; Okabe, Satoh & Sugihara, 2009; Xie & Yan, 2008; Yamada & Thill, 2004; Yamada & Thill, 2007). Taking the K-function as an example of distance-based approaches, Okabe and Yamada (2001) developed a K-function statistic on a network for measuring clustering tendency. Yamada and Thill (2004) further compared the network K-function and the traditional planar K-function methods to illustrate the risk of false positive detection associated with the use of a statistic designed for a planar space to analyze a network-constrained phenomenon.
Their results clearly demonstrated that the planar K-function analysis is problematic since it entails a significant chance of over-detection (Yamada & Thill, 2004). To identify crash clusters, Yamada and Thill (2007) subsequently designed a method named K-function local indicators of network-constrained clusters (KLINCS) and identified highway vehicle crash hot spots in Buffalo, USA. For density-based measures, Xie and Yan (2008) presented a novel network KDE approach for estimating the density of network-constrained events and tested this method with traffic crashes on a road network. The results indicated that the network KDE is more appropriate than the standard planar KDE for traffic crash “hot spot” analysis.

Previous network-constrained event-based methods for spatial analysis of road crashes focused on either the measurement of clustering tendency at a global level (Okabe & Yamada, 2001; Yamada & Thill, 2004) or the detection of crash clusters (that is, crash hot spots) (Loo, Yao & Wu, 2011; Xie & Yan, 2008; Yamada & Thill, 2007). However, little research has concentrated upon the identification of clusters of road crash clusters (that is, crash hot zones). In addition, as with link-attribute methods, environmental factors need to be taken into consideration when identifying hot zones with event-based approaches. This study will attempt to bridge these gaps.

2.3 Environmental Factors Contributing to Road Crashes

Contributory factors to road crashes can be classified into three categories, namely vehicular, human and environmental factors. This subsection briefly reviews previous studies which examined the relationship between road crashes and environmental factors. Environmental factors are usually regarded as exogenous forces on road traffic crashes, and include the road environment, the spatial environment, the demographic and socio-economic environment and the natural environment.
2.3.1 Road Environment

Road environment refers to the characteristics of roadways and sidewalks. Many road safety engineers have paid attention to exposure, such as vehicle exposure (Abbas, 2004; Miaou, 1994; Pei, Wong & Sze, 2012; Qin et al., 2004, 2006; Van den Bossche, 2005) and pedestrian volume (Brüde & Larsson, 1993; Gårder, 2004; Harwood, 2008; Leden, 2002; Lyon & Persaud, 2002). Some researchers were also concerned with road types or road functions such as expressways, arterial routes, collector roads or access roads (Hadayeghi, Shalaby & Persaud, 2007; Ladron de Guevara, Washington & Oh, 2004), while others were more interested in cross-section elements, such as the number of lanes, lane width, shoulder type, presence of a median and median width (Aguero-Valverde & Jovanils, 2009; Lee & Abdel-Aty, 2005; Pande & Abdel-Aty, 2009; Pei, Wong & Sze, 2011). A great number of studies have also examined the relationship between traffic controls (e.g. speed limit, access control, type of traffic control at intersections) and road crashes (Abdel-Aty et al., 2008; Afukaar & Damsere-Derry, 2010; Elvik, 2012; Pei, Wong & Sze, 2011). Moreover, as the majority of road crashes occur at or within a short distance of road junctions, many scholars have focused on collisions around road intersections (Lee & Abdel-Aty, 2005; Ljung Aust, Fagerlind & Sagber, 2011; Wong, Sze & Li, 2007).

2.3.2 Spatial Environment

Spatial variations in road crashes have been pinpointed by many researchers in the literature. Examples of such analyses include developing vs. developed countries (Peden et al., 2004; Rasouli et al., 2008), national comparison (Yang & Otte, 2007), provincial comparison (Hu et al., 2008), and rural vs. urban areas (Kmet & Macarthur, 2006; Loo, Cheung & Yao, 2011).
Moreover, other indicators such as signal density, population density, junction density, road density, street network structure and the size of built-up areas are also regarded as spatial variables (Lovegrove & Sayed, 2006; Marshall & Garrick, 2011; Spoerri, Egger & Elm, 2011). The impacts of land use variables on road crashes have long been examined (Dissanayake, Aryaija & Wedagama, 2009; Graham & Glaister, 2003; Jegede, 1988; Kim, Brunner & Yamashita, 2006; Wier et al., 2009). For instance, Jegede (1988) found a positive association between road collisions and economic interaction measured by the number of industrial establishments. Wier et al. (2009) reported that neighborhood commercial and residential-neighborhood commercial areas positively affected the occurrence of vehicle-pedestrian injury collisions. The reason, as discussed by Ben-Akiva and Bowman (1995), is that land use is one of the major factors in the generation or attraction of traffic. Certain types of land use are associated with particular human activities that might increase the likelihood of road crashes (Tsui, 2006).

2.3.3 Demographic and Socio-economic Environment

The relationship between demographic characteristics and road crashes (or casualties) has been investigated by many scholars such as LaScala, Gerber and Gruenewald (2000), Law, Noland and Evans (2009), Pulugurtha and Sambhara (2011), and Spoerri, Egger and Elm (2011). A great number of studies have also examined the relationship between socio-economic factors and the incidence of crashes. Most of these studies focused on vehicle-pedestrian collisions rather than vehicle-vehicle crashes. They pointed out that the socio-economic characteristics of neighborhoods were significant predictors of pedestrian crashes (Cottrill & Thakuriah, 2010; LaScala, Gerber & Gruenewald, 2000; LaScala, Gruenewald & Johnson, 2004; McMahon et al., 1999).
In general, crashes involving pedestrians are strongly associated with communities with a lower level of employment, more low-income households, more old houses and fewer highly educated residents. It was also found that area deprivation had a significant impact on the number of pedestrian casualties and the severity of pedestrian injuries (Graham, Glaister & Anderson, 2005; Graham & Stephens, 2008; Green, Muir & Mahe, 2011).

2.3.4 Natural Environment

Road collisions are more likely to occur in bad weather (e.g. rain, snow and low visibility) (Brijs, Karlis & Wets, 2008; Keay & Simmonds, 2005; Wang, Quddus & Ison, 2011; Yau, 2004), probably due to an increase in careless road users and vehicle disorder.

This study will examine the effects of the road, spatial, and demographic and socio-economic environments on the spatial distribution of road crashes and identify hazardous road locations by incorporating these factors.

2.4 Summary

This chapter reviews the link-attribute and event-based literature relating to the identification of hazardous road locations. For link-attribute approaches, although the focus of previous studies was on hot spot identification, some researchers have devoted effort to hot zone detection in recent years. However, some methodological issues remain for link-attribute hot zone methods, such as environmental exposure. For event-based crash pattern analyses, none of the existing studies is concerned with hot zone identification. Event-based hot zone detection is hence worth examining. Environmental variables that were studied in the literature to explain the spatial distribution of road crashes are also reviewed in this chapter. These environmental factors will be incorporated in the sensitivity analyses of both link-attribute and event-based approaches.

CHAPTER 3 METHODOLOGY

This chapter provides a broad framework showing the methodology for the identification of crash hot zones.
The methodological framework is presented in the first section. The two network-constrained methods for analyzing road crashes for hot zone identification are introduced in the following section. The ways in which simulated and empirical data are analyzed are discussed in Section 3.3 and Section 3.4 respectively.

3.1 Methodological Framework

Figure 3.1 is a schematic diagram summarizing the methodological framework. With the overriding goal of improving road safety, this research intends to explore efficient and flexible approaches to the detection of hazardous road locations. The two major datasets are simulated and empirical data. The former are a set of hypothetical road networks with simulated road crashes located on them. The latter describe the road crashes, road network system and environmental characteristics of Hong Kong. The environmental database includes road environmental data such as road type, number of road junctions and traffic volume; socio-economic environmental data such as household income, employment rate, education level and owner occupancy; and other environmental data such as population density, junction density, road density and land use characteristics in Hong Kong.

The hot zone methodology is used as the basis for both link-attribute and event-based methods. Loo’s statistic will be employed to perform the link-attribute hot zone analysis, and a newly developed index based on the Local K-function will be used for the event-based detection of hot zones. The two methods are applied to the analysis of both simulated and empirical data. A series of sensitivity analyses will be conducted in order to investigate the characteristics of the two approaches in identifying crash hot zones, such as the influence of the segmentation method, the type of crash pattern and the definition of the threshold value on the hot zones.
In particular, by incorporating different environmental factors in the empirical data analysis, crash-based and casualty-based hot zones will be identified in a crash-potential manner. The former will focus on the road environment, such as traffic exposure, road type and number of road junctions, and the latter will concentrate more on the surrounding environment, such as land use type and socio-economic status. The outputs of the link-attribute and event-based hot zone analyses, including maps and statistics, will be compared in order to get a better understanding of their strengths and weaknesses for detecting hazardous road locations.

Figure 3.1 Schematic diagram of the methodological framework

Figure 3.2 is a diagram showing the detailed implementation of the methodological framework for the identification of crash hot zones. The following section will introduce the general steps of link-attribute analysis based on Loo’s statistic and of event-based analysis based on a newly developed index derived from the Local K-function. Section 3.3 presents the way in which the two methods are applied to the analysis of hypothetical road network structures with simulated road crashes. The analysis of the simulated data focuses on the impacts of the segmentation method (definition of reference points), crash pattern and road network structure. Section 3.4 presents the application of the two methods to the empirical data of Hong Kong. Both crashes and casualties are analyzed by the two approaches. The sensitivity analysis focuses on the segmentation method (definition of reference points), the definition of the threshold value and the casualty-weighted method. Details of the data collection and the approaches to analyzing the empirical data can be found in Sections 3.4.1 and 3.4.2 respectively.
Figure 3.2 Illustration of methodological framework

3.2 Network-constrained Methods for Hot Zone Identification

This section introduces the general steps of link-attribute and event-based methods in analyzing road crashes for the identification of crash hot zones.

3.2.1 Link-attribute Method

As reviewed in Section 2.1.2, spatial models such as the Local Moran I statistic, the Local Getis and Ord G statistic, and Loo’s statistic have been used for analyzing road crashes. Compared with other models, Loo’s statistic provides a simple and quick way of quantitatively measuring the degree of clustering. Hence, this research chooses Loo’s statistic as the link-attribute approach for the identification of crash hot zones. The general steps involve geo-validating road crashes, segmenting the road network into BSUs, calculating actual crash intensity, defining the threshold value, modeling the spatial pattern and mapping hot zones. The major steps were also reported in Loo (2009), Loo et al. (2011), and Loo and Yao (2012).

3.2.1.1 Geo-validation of road crashes

High precision in the locations of road crashes is vital to this research. As a network 2D phenomenon, crashes should be constrained to the road network. However, for both technical and non-technical reasons, they are unlikely to intersect with the centerlines of the road network (Loo, 2006). Figure 3.3 delineates a tiny part of the crash map of Hong Kong in the year 2007 (Loo & Yao, 2012). The figure shows that most crashes recorded by the Hong Kong Police were not located on the road network. Hence, these crashes need to be snapped to the appropriate junctions or centerlines of the road network. For instance, one may use the geo-validation procedures introduced in Loo (2006) to move the crashes to appropriate locations.
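The geometric core of snapping can be sketched as a nearest-point projection onto a road centerline. This is only an illustration under simplifying assumptions (planar coordinates, a single candidate centerline); it is not the geo-validation procedure of Loo (2006), which may also draw on other recorded attributes. The function name and coordinates are hypothetical.

```python
import math


def snap_to_centerline(point, centerline):
    """Return (snapped_point, distance) for the closest point on a polyline.

    point: (x, y) of the recorded crash.
    centerline: list of (x, y) vertices describing one road centerline.
    """
    px, py = point
    best = None
    for (x1, y1), (x2, y2) in zip(centerline, centerline[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            t = 0.0  # degenerate segment: both vertices coincide
        else:
            # Projection parameter, clamped so the result stays on the segment.
            t = ((px - x1) * dx + (py - y1) * dy) / seg_len_sq
            t = max(0.0, min(1.0, t))
        sx, sy = x1 + t * dx, y1 + t * dy
        dist = math.hypot(px - sx, py - sy)
        if best is None or dist < best[1]:
            best = ((sx, sy), dist)
    return best


road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]   # hypothetical centerline
snapped, offset = snap_to_centerline((4.0, 3.0), road)
```

Here the crash recorded at (4, 3) is moved to (4, 0), 3 units away. In practice the candidate centerline would first have to be selected, for example by road name, which this sketch leaves out.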
Figure 3.3 Road crashes plotted onto a map (Loo & Yao, 2012)

3.2.1.2 Segmentation of the road network

Although there has been no clear indication of an optimal length for defining a hazardous BSU, researchers have suggested using a fixed value which should be long enough to allow the identification of crash clusters but short enough to reflect the variation in road environment (Loo, 2009). As mentioned in Section 2.1.2, previous studies usually preferred 100 m or 0.1 mile as the segmentation interval. They chose one constant value only, without examining the extent to which the length of the BSU influences hot zone identification. This study will conduct a sensitivity analysis of hot zones with respect to segmentation length.

Another issue is the violation of the “fixed” condition. For an empirical link-node road system, the length of each BSU cannot be fully standardized after segmentation. The main reason is that the “fixed length” condition is likely to be violated near end nodes, resulting in some BSUs shorter than the predefined length. The situation is even worse in an urban area with a dense road structure. Closely located road intersections, which are topologically represented by nodes, may lead to a great number of short links and thus contribute to more units much shorter than a standard BSU. Taking Hong Kong as an example, there are altogether 6,445 links in the Annual Traffic Census (ATC) road database, with a total length of about 1,090 km. The average length of a link is around 170 m. Table 3.1 shows the statistics on the lengths of BSUs after segmentation at a 100 m interval. There are 14,245 BSUs in total, of which 44.9% are shorter than 100 m. The shares of tiny BSUs with lengths of less than 25 m and less than 50 m reach 11.2% and 24.1% respectively.
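The way short BSUs arise near end nodes can be seen in a minimal sketch of fixed-interval segmentation. It assumes that segmentation starts at one end of each link and that the remainder forms the last BSU; the source does not state where the remainder is placed, and the function names and link lengths are hypothetical.

```python
def segment_link(link_length, interval=100.0):
    """Split one link into BSU lengths; the last BSU holds any remainder."""
    full, remainder = divmod(link_length, interval)
    lengths = [interval] * int(full)
    if remainder > 0:
        lengths.append(remainder)
    return lengths


def share_shorter_than(bsu_lengths, limit):
    """Proportion of BSUs whose length is below `limit`."""
    return sum(1 for length in bsu_lengths if length < limit) / len(bsu_lengths)


links = [170.0, 40.0, 300.0]   # hypothetical link lengths in metres
bsus = [length for link in links for length in segment_link(link)]
short_share = share_shorter_than(bsus, 100.0)
```

The three links yield six BSUs (100, 70, 40, 100, 100 and 100 m), so one third of them violate the fixed-length condition even in this tiny example.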
Table 3.1 Statistics on length of BSUs based on the raw-link-node road network

Length (m)        Number of BSUs (%)
0 < L < 25         1,591 (11.2%)
25 <= L < 50       1,839 (12.9%)
50 <= L < 75       1,711 (12.0%)
75 <= L < 100      1,257 (8.8%)
L = 100            7,847 (55.1%)
Total             14,245 (100%)

In order to reduce the number of short BSUs, road links are first dissolved before segmentation. Figure 3.4 (a) describes a hypothetical raw-link-node system with 19 links and 25 nodes. Each link connects to, that is, has an end node in common with, at least one other link. Figure 3.4 (b) delineates the road structure after dissolving. In this dissolved-road system, there exist only 6 links with 12 nodes, which may result in a sharp decline in the number of short BSUs.

Figure 3.4 Illustration of raw-link-node (a) and dissolved-road (b) systems

Transforming a raw-link-node system into a dissolved road system involves a dissolving algorithm (Loo & Yao, 2012). A key issue is that each link, as shown in Figure 3.4 (a), typically has more than one neighbour, but is only allowed to dissolve one of them. To cope with this problem, a link may dissolve one of the contiguous links by random selection or by following a preset rule. A priority sequence is designed in this study for determining the link to be dissolved. The work flow of the algorithm is shown in Figure 3.5. For a link with two or more contiguous sections, the algorithm first dissolves the neighbor with the same road name. If no sections share the same name, the program chooses a neighboring link according to a “tangent” rule.

Figure 3.5 Flow chart of the dissolving algorithm (for each link i: if there is no contiguous link, move to the next link; if there is only one contiguous link, dissolve it; otherwise dissolve the contiguous link with the same road name or, if there is none, apply the “tangent” rule: create tangent lines for each link at the end nodes, calculate the angle between the tangent line of each contiguous link and that of link i, and dissolve the link with the smallest angle)

Take the hypothetical raw link-node road system in Figure 3.6(a) as an example. It consists of eight links, namely links AB, BC, CD, DE, BG, BF, CH and DI. If one starts from link AB, link BC will be dissolved by link AB due to the same road name, although links BG and BF also connect to link AB at the common end node B. Next, the GIS algorithm looks for contiguous segments at end node C. Following the same logic, link CD will be dissolved with link AC. Then, a new round of dissolving begins at point D. As neither link DE nor link DI has the same road name as link CD, tangent lines a, b and c are created at end node D for links CD, DE and DI, respectively. The angle between a and b is then compared with that between a and c. As the former is smaller than the latter, link DE is picked as the merged segment. Figure 3.6(b) depicts the road structure after dissolving, which consists of only four links, namely links AE, FG, CH and DI.

Figure 3.6 A hypothetical road network structure (Loo & Yao, 2012): (a) raw link-node road system; (b) dissolved road system

After performing the dissolving algorithm on the ATC road network of Hong Kong, there are only 871 sections in the dissolved road database, with an average length of about 1,250 m. Table 3.2 shows statistics on the length of BSUs based on the dissolved road network. If the road network is still segmented at a 100 m interval, 11,398 BSUs are obtained. Among them, only 4.4% are less than 50 m long.
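The priority sequence of the dissolving algorithm in Section 3.2.1.2 (single neighbour, then same road name, then the “tangent” rule) can be sketched as a small selection function. This is a sketch under assumptions rather than the implementation of Loo and Yao (2012): tangent lines are represented by direction vectors at the shared node, with link i pointing into the node and each candidate pointing away from it, and ties between several same-name neighbours are broken by taking the first. All names, road names and vectors below are hypothetical.

```python
import math


def angle_between(u, v):
    """Unsigned angle in degrees between two direction vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm = math.hypot(u[0], u[1]) * math.hypot(v[0], v[1])
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def choose_link_to_dissolve(link, candidates):
    """Pick the contiguous link to merge with `link` at a shared end node.

    link and each candidate are dicts with keys 'id', 'name', 'direction'.
    Returns the chosen id, or None when there is no contiguous link.
    """
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]["id"]
    same_name = [c for c in candidates if c["name"] == link["name"]]
    if same_name:
        return same_name[0]["id"]  # assumption: first match wins
    # "Tangent" rule: the candidate whose tangent deviates least from link i.
    return min(
        candidates,
        key=lambda c: angle_between(link["direction"], c["direction"]),
    )["id"]


# End node D of Figure 3.6(a), with made-up road names and directions.
cd = {"id": "CD", "name": "Road 1", "direction": (1.0, 0.0)}
de = {"id": "DE", "name": "Road 2", "direction": (1.0, 0.2)}
di = {"id": "DI", "name": "Road 3", "direction": (0.0, -1.0)}
chosen_at_d = choose_link_to_dissolve(cd, [de, di])
```

With these vectors DE deviates from CD by roughly 11 degrees and DI by 90 degrees, so DE is chosen, mirroring the outcome described for end node D.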
By using the improved network segmentation method, the shares of BSUs with lengths of less than 50 m and less than 100 m are dramatically reduced, by 81.7% (from 24.1% with the raw road database to 4.4% with the dissolved road network) and 49.0% (from 44.9% with the raw road database to 22.9% with the dissolved road network), respectively. The extent to which the hot zones are sensitive to the road system will be examined in the following analysis of this chapter.

Table 3.2 Statistics on length of BSUs based on the dissolved road network

Length (m)        Number of BSUs (%)
0 < L < 25           230 (2.0%)
25 <= L < 50         275 (2.4%)
50 <= L < 75         252 (2.2%)
75 <= L < 100      1,853 (16.3%)
L = 100            8,788 (77.1%)
Total             11,398 (100%)

3.2.1.3 Calculation of the actual crash intensity

The crash intensity can be defined by various approaches such as crash frequency, crash risk or casualty-weighted scores. Taking crash frequency as an example, the actual crash intensity on a BSU is defined by the number of road crashes. However, as a large number of road crashes happen at junctions, the double-counting problem occurs frequently. To deal with this issue, a preset rule is suggested to ensure a “1:1” relation between crashes and BSUs. For instance, the spatial database records the location of each BSU by storing a set of coordinate pairs. With the assistance of GIS, one can obtain the minimum and maximum x and y coordinates of each BSU. Based on these values, Loo (2009) allocated a road-junction crash to the BSU with the smaller maximum x coordinate. This study follows this rule to tackle the double-counting problem.

3.2.1.4 Definition of the threshold value

There has been no clear evidence of the best way to define the threshold value of a BSU in hot zone identification. Widely used definitions in previous research include numerical and statistical definitions. This study also defines the threshold value from crash prediction models, which take more environmental factors into account.
Numerical definition

Numerical definitions such as an arbitrary crash count are widely used in the identification of crash hot zones. Definitions of this type rely on overall crash frequency rather than crash potential. For instance, Loo (2009) chose 3, 4, and 5 crashes in one year as the threshold values for the identification of hot zones in Hong Kong so as to be comparable with the blacksite methodology adopted by the HKSAR Government.

Statistical definition

A BSU is identified as hot if the observed road crash count exceeds the expected number of road crashes on similar BSUs. Typical examples include Loo et al. (2011), Moons, Brijs and Wets (2009a) and Yamada and Thill (2010), who defined the threshold value by Monte Carlo simulation of the crash distribution with an arbitrary significance level such as 95% or 99%. Black and Thomas (1998) and Flahaut et al. (2003) used the mean value of crash intensity as the threshold value. Statistical definitions can be similar to model-based definitions if the expected crash count is estimated by a statistical model such as a negative binomial model.

Model-based definition

No previous research has been dedicated to the model-based definition of threshold values. This study will apply this method to the identification of crash hot zones. The threshold value will be determined by a crash prediction model with a set of environmental indicators as independent variables. In this way, the crash hot zones are identified in a “potential crash reduction” manner.

3.2.1.5 Modeling of the spatial pattern

This study builds on the Loo statistic (Loo, 2009), I(HZ), which can be defined as:

\[
I(HZ)_i = z_i \sum_{j=1,\, j \neq i}^{n} W_{ij} z_j \tag{3.1}
\]

where n is the number of BSUs; i, j = 1, 2, …, n; W_{ij} is an element of a 0-1 contiguity matrix; and z_i is a 0-1 indicator showing whether or not BSU i is hot. Here, “hot” means that the BSU has an actual crash intensity no less than the threshold value.
z_i can be denoted by:

\[
z_i =
\begin{cases}
1 & \text{if } LCI_i \geq t_i, \\
0 & \text{otherwise;}
\end{cases}
\tag{3.2}
\]

where LCI_i is the crash intensity at BSU i; t_i is the threshold value of BSU i, which can be assigned any positive value.

Matrices are widely used in spatial analysis for representing spatial concepts. This research concentrates on those contiguous BSUs with relatively high risks. Thus, W_{ij} is denoted as a contiguity 0-1 matrix whose elements are only ones or zeros. For instance, following the hypothetical structure of seven BSUs in Figure 3.7, the weight matrix, W, can be calculated as Equation 3.3. Six pairs, namely BSU 1 (AD) and 5 (BC), BSU 1 (AD) and 2 (DG), BSU 2 (DG) and 3 (GJ), BSU 3 (GJ) and 4 (JK), BSU 2 (DG) and 6 (EF), and BSU 3 (GJ) and 7 (IH), are considered as contiguous. Hence, twelve elements, namely elements (1,5), (5,1), (1,2), (2,1), (2,3), (3,2), (3,4), (4,3), (2,6), (6,2), (3,7) and (7,3), are assigned as “1” and the others are set as “0”.

Figure 3.7 A hypothetical structure of seven BSUs (Loo & Yao, 2012)

\[
W =
\begin{pmatrix}
* & 1 & 0 & 0 & 1 & 0 & 0 \\
1 & * & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & * & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & * & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & * & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & * & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & *
\end{pmatrix}
\tag{3.3}
\]

The value of I(HZ)_i can be zero or a positive integer between 1 and n-1. A zero value may indicate that the actual crash intensity is less than the threshold value at BSU i, or that there are no hot BSUs in the vicinity even if BSU i itself has a particularly high crash intensity. A positive value means that BSU i and at least one of its contiguous BSUs have crash intensities no less than their threshold values.

3.2.1.6 Mapping of hot zones

The identification of hot zones only focuses on BSUs with positive I(HZ). These BSUs will be picked and plotted on the map for further analysis. Figure 3.8 is part of a map describing two hot zones. As shown in the figure, the length of hot zones is not fixed. While hot zone I is comprised of two contiguous BSUs, hot zone II consists of four.
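Equations 3.1 to 3.3 can be tried out on the seven-BSU structure of Figure 3.7 with a short sketch. The contiguity pairs are those listed in the text; the crash intensities and the common threshold of 3 are made-up values for illustration, and the function name is hypothetical.

```python
def loo_statistic(intensity, threshold, neighbours):
    """I(HZ) per BSU: z_i times the number of hot contiguous BSUs.

    intensity, threshold: dicts keyed by BSU id.
    neighbours: dict mapping a BSU id to the set of contiguous BSU ids
    (the non-zero entries of the contiguity matrix W).
    """
    # Equation 3.2: a BSU is hot when its intensity is no less than its threshold.
    hot = {i: 1 if intensity[i] >= threshold[i] else 0 for i in intensity}
    # Equation 3.1: sum the hot indicators of contiguous BSUs only.
    return {i: hot[i] * sum(hot[j] for j in neighbours[i]) for i in intensity}


# Contiguity of Figure 3.7: pairs (1,5), (1,2), (2,3), (3,4), (2,6), (3,7).
neighbours = {
    1: {2, 5}, 2: {1, 3, 6}, 3: {2, 4, 7},
    4: {3}, 5: {1}, 6: {2}, 7: {3},
}
intensity = {1: 5, 2: 4, 3: 0, 4: 6, 5: 1, 6: 3, 7: 0}   # hypothetical counts
threshold = {i: 3 for i in intensity}                     # hypothetical threshold
i_hz = loo_statistic(intensity, threshold, neighbours)
hot_zone_bsus = sorted(i for i, value in i_hz.items() if value > 0)
```

BSUs 1, 2 and 6 obtain positive values and form one hot zone. BSU 4 has the highest intensity but no hot neighbour, so its value is zero, which is the second zero-value case described above.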
The scale of a hot zone depends not only on the concentration of road crashes on one BSU, but also on the clustering tendency of hot BSUs.

Figure 3.8 Link-attribute hot zones

3.2.2 Event-based Method

As mentioned in Chapter 2, no research has been conducted on the identification of hot zones with the event-based method. Nonetheless, researchers have employed some network event-based approaches to detect crash hot spots. Kernel density estimation is commonly used for hot spot detection. Recent years have witnessed growing interest in Network Kernel Density Estimation (NKDE) for identifying hot spots of network-constrained phenomena (Loo, Yao & Wu, 2011b; Okabe & Sugihara, 2009; Xie & Yan, 2008). The density at location i can be calculated by:

\[
f(i) = \frac{1}{Nb} \sum_{j=1}^{N} \mathrm{Kern}\!\left(\frac{d_{ij}}{b}\right) \tag{3.4}
\]

where f(i) is the density at location i; b is the bandwidth (search distance) of the NKDE (only events within b are used to estimate f(i)); d_{ij} is the network distance between location i and event j; N is the total number of events; and Kern(.) is a kernel function weighting the ratio between d_{ij} and b. Using the kernel function, the “distance decay effect” is considered, that is, the longer the distance between an event and location i, the less that event is weighted in calculating the overall density. Commonly used kernel functions include the Triangle, Quartic (biweight), Triweight and Gaussian functions (Waller & Gotway, 2004).

Unlike NKDE, which uses a kernel function to measure the “distance decay effect”, the Network Local K-function method developed by Yamada and Thill (2007) uses a uniform function and assigns equal weights to events within the search distance. The traditional K-function investigates the clustering tendency of events by examining the extent to which events occur within a distance of other events.
However, for the identification of hazardous road locations, as claimed by Yamada and Thill (2007), one is not interested in crashes around which other crashes are concentrated, but is more concerned with those road locations where crashes are clustered. In this light, Yamada and Thill (2007) used reference points and defined the Network Local K-function value at reference point i as:

\[
LK_i = \sum_{j=1}^{n} f_{ij} \tag{3.5}
\]

\[
f_{ij} =
\begin{cases}
1 & \text{if } d_{ij} \leq h, \\
0 & \text{otherwise;}
\end{cases}
\tag{3.6}
\]

where LK_i is the Network Local K-function value; n denotes the number of crashes; d_{ij} is the network distance between location i and event j; and h is the search distance (bandwidth) from location i.

Although NKDE is more widely employed in the literature on the identification of crash hot spots, the event-based analysis of this research is built upon a network scan statistic similar to the Network Local K-function of Yamada and Thill (2007). The unweighted approach is chosen in this thesis because, in the link-attribute analysis, road crashes are treated equally when aggregated by road segment. To make a fair comparison, the unweighted analysis is more appropriate than assigning distance-decay weights. The hot zone methodology for the event-based analysis of road crashes involves geo-validating road crashes, generating reference points, calculating actual crash intensity, defining the threshold value and modeling spatial patterns. As geo-validation has been introduced in Section 3.2.1, this subsection gives a brief introduction to the remaining four processes.

3.2.2.1 Determination of reference points

As mentioned earlier, Local K-function methods detect clustering tendency around road locations rather than events.
As it is neither feasible nor practical to investigate the clustering tendency for every possible location along the road network, researchers choose a set of points, termed “reference points (RPs)” hereafter, and only calculate the clustering tendency around these RPs (Xie & Yan, 2008; Yamada & Thill, 2007). In Yamada and Thill (2007)’s work, the reference points were selected in a manner similar to the GAM analysis (Openshaw et al., 1987), which lays a fine grid over the study region. Xie and Yan (2008) also discussed this issue. Firstly, they created a segment-based linear reference system out of the original road network, with each segment being a line segment between two neighboring road intersections. Next, they divided each segment into basic linear units of a defined network length, and the center points of these linear units were regarded as reference points.

This research determines the reference points following Yamada and Thill (2007) and Xie and Yan (2008), but in a more “random” manner, since the first reference point of a link is not the “From” node of the link but is randomly located along the link. Moreover, their research derived RPs from the raw link-node database, which might result in a great number of RPs with intervals less than the pre-set value. To reduce this undesirable effect, the road network is dissolved before RP generation by using the same dissolving algorithm as the link-attribute approach. The work flow is presented in Figure 3.9.

Figure 3.9 Work flow for generating reference points

The general steps for creating reference points are as follows:
1) Dissolve the raw road network to avoid too many short links by following the dissolving algorithm in Section 3.2.1.2;
2) Generate a random value r;
3) Select a link;
4) Randomly select a node as the “From” node;
5) Starting at r from the node, place RPs with equal interval Int. One needs to carefully consider the length of Int as it greatly influences the spatial scale of hot zones;
6) Calculate the distance d between the last RP and the “To” node. If d is more than the search distance, treat the “To” node as the last RP; and
7) Repeat Steps 3 to 6 until all the links are examined.

Note that the value of r should be no more than half of Int. The reason is presented in the following subsection.

Figure 3.10 (a) shows a hypothetical road network after performing the dissolving algorithm. Following Steps (1) to (7), RPs can be generated and located along the road network, as shown in Figure 3.10 (b). As shown in the figure, the distance between the “starting” reference points (reference points 1, 2, 3, 4 and 5) and the “From” end nodes of the links (nodes B, D, F, J and G) equals r. On each link (links BA, DC, FE, JI and GH), the interval between reference points equals Int.

Figure 3.10 Reference points on a hypothetical road network

3.2.2.2 Calculation of crash intensity

The crash intensity can be defined by various methods such as crash frequency, crash risk or casualty-weighted approaches. Taking crash frequency as an example, the crash intensity ECI is calculated by:

$$ECI_i = \sum_{j=1}^{n} f_{ij} \tag{3.7}$$

$$f_{ij} = \begin{cases} 1 & \text{if } d_{ij} \leq h \\ 0 & \text{otherwise} \end{cases} \tag{3.8}$$

where n denotes the number of crashes; d_ij is the network distance between RP i and event j; and h is the search distance from RP i. Built upon Yamada and Thill (2007)’s KLINCS approach, Equations 3.7 and 3.8 seem the same as Equations 3.5 and 3.6. However, the KLINCS approach was originally designed for investigating local crash clustering at various spatial scales with h no less than the interval of reference points. When a whole empirical network is considered, a large h can result in a large overlap between neighboring search spaces. As h can be conceptualized as the search “radius” anchored at RP i, h in Equation 3.8 is set equal to half of Int (the interval of reference points).
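The placement of reference points on one link and the crash intensity of Equations 3.7 and 3.8 can be illustrated with a minimal Python sketch. It makes several simplifying assumptions that are not part of the thesis: only a single link is handled, positions are measured in meters from the “From” node so that network distance reduces to an absolute difference along the link, the link is at least one interval long, and the function names `place_reference_points` and `crash_intensity` are hypothetical.

```python
import random


def place_reference_points(link_length, interval, rng=None):
    """Return RP positions (metres from the "From" node) on one link.

    Assumption: the link is at least one interval long.
    """
    if link_length < interval:
        raise ValueError("this sketch assumes link_length >= interval")
    rng = rng or random.Random()
    h = interval / 2.0            # search distance, set to half of Int
    r = rng.uniform(0.0, h)       # random offset, no more than h
    rps = []
    k = 0
    while r + k * interval <= link_length:
        rps.append(r + k * interval)   # Step 5: equal spacing from r
        k += 1
    # Step 6: if the last RP is more than h from the "To" node,
    # treat the "To" node as the last RP.
    if link_length - rps[-1] > h:
        rps.append(link_length)
    return rps


def crash_intensity(rps, crash_positions, h):
    """Equations 3.7 and 3.8: count crashes within distance h of each RP."""
    return [sum(1 for c in crash_positions if abs(c - rp) <= h) for rp in rps]


if __name__ == "__main__":
    rps = place_reference_points(1000.0, 100.0, random.Random(1))
    crashes = [12.0, 15.0, 480.0, 495.0, 510.0, 990.0]  # made-up positions
    print(crash_intensity(rps, crashes, h=50.0))
```

Because r never exceeds h and the “To” node is added whenever the remaining gap exceeds h, every position on the link lies within the search distance of at least one RP. A full implementation would replace the absolute difference with a shortest-path distance over the dissolved network.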
As mentioned in the previous subsection, when deriving the first reference point of a road link, a random value r is generated first and its value should be no more than half of Int. More precisely, it should be no more than h in Equation 3.8 so that every location on the road can be investigated.

3.2.2.3 Definition of threshold values

Although no research has been dedicated to the definition of threshold values for event-based hot zone identification, the way in which the cut-off value is determined for event-based hot spot detection has been discussed among point pattern analysts. The definitions can be categorized into three types, namely numerical, simple ranking, and statistical definitions such as the Monte Carlo method.

Numerical definition

By numerical definition, an arbitrary number is used as the threshold value.

Simple ranking

A simple ranking method takes a high percentile (90%, 95% or 99%) of the observed values as the cut-off value. For instance, Loo and Yao (2011) identified hot spots of vehicle-pedestrian and vehicle-vehicle crashes in Shanghai by using the 99th percentile of observed kernel density estimates as the cut-off value.

Monte Carlo Simulation

More generally, an event-based hot spot is defined by statistical methods, among which the Monte Carlo method is the most widely used. In the literature, there is no consensus on the definition of Monte Carlo. For example, Ripley (1987) defines most probabilistic modeling as stochastic simulation, with Monte Carlo being reserved for Monte Carlo integration and Monte Carlo statistical tests.
Sawilowsky et al. (2003) distinguish between a simulation, a Monte Carlo method, and a Monte Carlo simulation: a simulation is a fictitious representation of reality, a Monte Carlo method is a technique that can be used to solve a mathematical or statistical problem, and a Monte Carlo simulation uses repeated sampling to determine the properties of some phenomenon (or behavior). In this research, “Monte Carlo simulation” and “Monte Carlo method” are regarded as the same and used interchangeably. Both refer to computational algorithms that rely on repeated random sampling to obtain numerical results. For instance, Yamada and Thill (2007) used a Monte Carlo approach to define the threshold values of crash hot spots by randomly allocating the same number of road crashes to the reference points and calculating the local K-function values for the simulated crashes. The simulation was repeated 1,000 times and the 95% significance level, that is, the 50th largest simulated local K-function value, was used as the threshold value.

3.2.2.4 Modeling of spatial pattern

An event-based indicator EI(HZ) is introduced to model the spatial pattern. The hot zones are defined as:

$$EI(HZ)_i = z_i \sum_{j=1,\, j \neq i}^{m} f(HZ)_{ij}\, z_j \tag{3.9}$$

$$z_i = \begin{cases} 1 & \text{if } ECI_i \geq t_i \\ 0 & \text{otherwise} \end{cases} \tag{3.10}$$

$$f(HZ)_{ij} = \begin{cases} 1 & \text{if } d(HZ)_{ij} \leq Int \\ 0 & \text{otherwise} \end{cases} \tag{3.11}$$

where z_i indicates whether RP i can be regarded as “hot” (z_i = 1); m is the number of RPs; ECI_i is the crash intensity for RP i, and t_i is the threshold value at RP i. ECI_i can be calculated following Equations 3.7 and 3.8; d(HZ)_ij is the network distance between RPs i and j; and Int is the interval of reference points. The value of EI(HZ) is also either positive or equal to zero. The identification of hot zones only focuses on RPs with positive EI(HZ) (termed positive RPs hereafter).

3.2.2.5 Mapping of hot zones

The search windows of positive RPs are generated in order to obtain hot zones. As shown in Figure 3.11 (a), reference points A, B, C, D and E have positive EI(HZ) values.
The dark lines in Figure 3.11 (b) are the search windows of these positive RPs. Note that search windows may overlap among neighboring positive RPs, especially when positive RPs are located around road junctions and near the end nodes of a road. A hot zone comprises at least two neighboring search windows. In Figure 3.11 (b), one hot zone consists of search windows A, B and C, and the other is composed of two search windows. As with link-attribute hot zones, the length of an event-based hot zone is not fixed, but rather depends on the clustering tendency of hot RPs.

Figure 3.11 Mapping of hot zones

3.3 Data Analysis with Simulated Data

Although the primary goal of this research is to investigate the link-attribute and event-based approaches for identifying crash hot zones in a complex, empirical road environment, examining the results on simplified hypothetical networks can provide some basic understanding of the characteristics of the two approaches. Both the link-attribute and event-based approaches are applied to the analysis of simulated crash and road network data. Sensitivity analysis is conducted on road structure, crash pattern and BSU length (RP interval).

3.3.1 Data Description

Figure 3.12 shows three hypothetical road networks, which can be described as grid (with 24 road junctions), limited access (with 12 road junctions) and organic (with 6 road junctions) network structures (Hummel, 2001).

Figure 3.12 Three hypothetical road networks

Each hypothetical road network is 10 km long with 250 simulated crashes. The total number of crashes is hypothetical but roughly based on the situation of Hong Kong. Three spatial patterns of crashes are then generated. The dispersed pattern is generated by spacing the crashes at equal intervals throughout the network. A random pattern is generated by a random process. A concentrated pattern is generated by following the concept of hot zones. Crashes are firstly grouped into 10 crash hot zones over two BSUs.
Then, these underlying clusters (each with 25 crashes) are placed on the network with a random start. Due to random variability, it is not appropriate to compare the two approaches based on only one random or concentrated pattern. In order to overcome part of the problem, random and concentrated spatial patterns are generated ten times each for the three hypothetical road networks. Figure 3.13 shows the dispersed pattern and one of the ten random and concentrated spatial patterns on each of the three hypothetical road networks.

Figure 3.13 Different crash patterns on three hypothetical roads

3.3.2 Data Analysis

The Monte Carlo simulation approach is applied to the definition of threshold values. The general steps of using the Monte Carlo approach to identify hot zones by the link-attribute and event-based approaches are outlined in Subsections 3.3.2.1 and 3.3.2.2 respectively.

3.3.2.1 Link-attribute analysis

The three hypothetical road networks are first segmented into k BSUs of 100 meters by following the general rule. Each BSU is assigned an integer (1, 2, 3, …, k) as its unique ID. For each instance, the key steps for hot zone identification are as follows:
(1) For the crash instance, calculate the number of road crashes on each BSU as the actual crash intensity;
(2) Randomly allocate 250 points to the k BSUs and obtain the number of points on each BSU, which is denoted as z(sim);
(3) Repeat Step 2 1,000 times;
(4) For each BSU, use the 10th largest z(sim) (99% significance level) as the threshold value; and
(5) Identify hot zones by following Equations 3.1 and 3.2.

In Step 3, the Monte Carlo simulation is repeated 1,000 times. Theoretically, the Monte Carlo results are more stable if a larger number of replications are conducted. However, performing more simulations can significantly increase the computational burden.
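Steps (2) to (4) of the link-attribute procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the thesis: each simulated crash is assigned to one of the k BSUs with equal probability (reasonable only if all BSUs have the same length), and the function name `monte_carlo_thresholds` and its parameters are hypothetical.

```python
import random


def monte_carlo_thresholds(k, n_crashes=250, n_sims=1000, rank=10, seed=None):
    """Return one threshold per BSU: the rank-th largest simulated count.

    Assumption: every BSU is equally likely to receive a simulated crash.
    """
    rng = random.Random(seed)
    simulated = [[] for _ in range(k)]      # z(sim) values for each BSU
    for _ in range(n_sims):                 # Step 3: repeat the allocation
        counts = [0] * k
        for _ in range(n_crashes):          # Step 2: allocate crashes at random
            counts[rng.randrange(k)] += 1
        for bsu in range(k):
            simulated[bsu].append(counts[bsu])
    # Step 4: with 1,000 runs, the 10th largest value marks the 99% level.
    return [sorted(values, reverse=True)[rank - 1] for values in simulated]


if __name__ == "__main__":
    thresholds = monte_carlo_thresholds(k=100, seed=42)
    print(min(thresholds), max(thresholds))
```

The cost grows with the product of the number of replications and the number of crashes, which is the computational trade-off discussed in the text.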
Considering both precision requirements and computing time, 1,000 realizations are used in the Monte Carlo procedure for a one-sided test at the 99% significance level. In addition to 100 m, the BSU length is also set to 50 m, 150 m and 200 m in order to examine the sensitivity of hot zones to the length of the BSU.

3.3.2.2 Event-based analysis

For each hypothetical road network structure, k reference points are derived at 100 m intervals. For each instance, the key steps for hot zone identification are as follows:
(1) For the crash instance, calculate ECI (Equations 3.7 and 3.8) at each RP as the actual crash intensity;
(2) Randomly select 250 out of the k reference points as one simulated road crash pattern and calculate the crash intensity, denoted as ECI(sim), at each RP according to Equations 3.7 and 3.8;
(3) Repeat Step 2 1,000 times;
(4) For each reference point, use the 10th largest ECI(sim) (99% significance level) as the threshold value; and
(5) Identify hot zones by following Equations 3.9 to 3.11.

In addition to 100 m, the RP interval is also set to 50 m, 150 m and 200 m in order to examine the sensitivity of hot zones to the interval of reference points.

3.3.2.3 Comparison of two approaches

The hot zones identified by the two approaches are compared. Descriptive statistics on hot zones, such as the total number and length of hot zones, are calculated for each instance. Sensitivity is measured by computing the mean and variation of these statistics.

3.3.3 Results

Results for the dispersed pattern show that both the link-attribute and event-based approaches perform very well, with no hot zones detected regardless of the BSU length and RP interval. Table 3.3 shows the number (length) of hot zones identified for each simulation with random and concentrated crash patterns when 100 m is used as the BSU length and interval of reference points.
The differences between the two approaches are more obvious with the random spatial patterns, which should result in no hot zones, especially with more repeated iterations. It can be observed from Table 3.3 that the link-attribute approach does not identify any hot zone on the limited access and organic road networks in any instance, whereas the event-based approach identifies one hot zone on each. On the grid road network, the link-attribute approach detects one hot zone whereas the event-based approach identifies three. The results suggest that the “false positive” problem can be more serious with the event-based approach, especially on the grid road network, which is characterized by more road intersections.

Moreover, nearly all of the “hidden” hot zones are identified for the concentrated patterns. The difference between the two approaches is larger on the grid road network than on the limited access or organic road networks. On average, the link-attribute approach identifies 1.2 more hot zones than the event-based approach on the grid road network, but the length of hot zones detected by the latter is 321 m longer than that detected by the former. In addition, the link-attribute approach’s performance is more stable, with smaller coefficients of variation (CV) for the grid road pattern in terms of both number and length of hot zones. For the limited access road network, the link-attribute and event-based approaches identify similar numbers of hot zones. For the organic road pattern, the performance of the link-attribute approach is about the same as that of the event-based approach, but its variability (as demonstrated by the CV of the number and the CV of the length of hot zones) is higher.
Table 3.3 Summary of hot zones for random and concentrated crash patterns on the three hypothetical road networks, using 100 m as the BSU length and RP interval. Values are the number of hot zones, with length in meters in parentheses.

Random pattern

| | Grid: Link-attribute | Grid: Event-based | Limited access: Link-attribute | Limited access: Event-based | Organic: Link-attribute | Organic: Event-based |
|---|---|---|---|---|---|---|
| Simulation 1 | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 1 (200) |
| Simulation 2 | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Simulation 3 | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Simulation 4 | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Simulation 5 | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Simulation 6 | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Simulation 7 | 1 (200) | 1 (224) | 0 (0) | 1 (200) | 0 (0) | 0 (0) |
| Simulation 8 | 0 (0) | 1 (275) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Simulation 9 | 0 (0) | 1 (226) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Simulation 10 | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Mean | 0.1 (20.0) | 0.3 (72.5) | 0 (0) | 0.1 (20.0) | 0 (0) | 0.1 (20) |
| Standard deviation | 0.3 (63.3) | 0.5 (117.5) | 0 (0) | 0.3 (63.3) | 0 (0) | 0.3 (63.3) |
| Coefficient of variation | 3.16 (3.16) | 1.61 (1.62) | 0 (0) | 3.16 (3.16) | 0 (0) | 3.16 (3.16) |

Concentrated pattern

| | Grid: Link-attribute | Grid: Event-based | Limited access: Link-attribute | Limited access: Event-based | Organic: Link-attribute | Organic: Event-based |
|---|---|---|---|---|---|---|
| Simulation 1 | 7 (1900) | 5 (1372) | 9 (1800) | 9 (2290) | 8 (1800) | 10 (2060) |
| Simulation 2 | 9 (2100) | 8 (2522) | 9 (2000) | 8 (2260) | 8 (1700) | 9 (1920) |
| Simulation 3 | 9 (1900) | 6 (1945) | 9 (2100) | 9 (2350) | 9 (2000) | 9 (2020) |
| Simulation 4 | 7 (1700) | 8 (2412) | 9 (2000) | 8 (2430) | 6 (1300) | 9 (2130) |
| Simulation 5 | 7 (2000) | 7 (2545) | 8 (1900) | 6 (2359) | 10 (2000) | 9 (2080) |
| Simulation 6 | 8 (2100) | 6 (2242) | 9 (1800) | 9 (2218) | 8 (1800) | 9 (2040) |
| Simulation 7 | 8 (1700) | 7 (1999) | 9 (1900) | 9 (2180) | 8 (1600) | 10 (2000) |
| Simulation 8 | 8 (1900) | 6 (2441) | 9 (2000) | 10 (2320) | 10 (2100) | 9 (2020) |
| Simulation 9 | 9 (2000) | 7 (2115) | 9 (2100) | 8 (2450) | 9 (1900) | 9 (2160) |
| Simulation 10 | 7 (1500) | 7 (2422) | 9 (1900) | 9 (2110) | 10 (2000) | 9 (2220) |
| Mean | 7.9 (1880.0) | 6.7 (2201.5) | 8.8 (1950.0) | 8.5 (2296.7) | 8.6 (1820.0) | 9.2 (2065.0) |
| Standard deviation | 0.88 (193.2) | 0.95 (361.3) | 0.4 (108.1) | 1.08 (107.8) | 1.26 (239.4) | 0.42 (86.6) |
| Coefficient of variation | 0.11 (0.10) | 0.14 (0.16) | 0.05 (0.06) | 0.13 (0.05) | 0.15 (0.13) | 0.05 (0.04) |

Further examination of the sensitivity analysis on BSU length and interval of reference points gives more insight into the two approaches for the identification of hazardous road locations. Table 3.4 summarizes hot zones for random and concentrated crash patterns on the three hypothetical road networks by interval. The results provide more evidence that the event-based approach is more likely to cause the “false positive” problem, especially on the grid road network. For the random crash pattern, when the BSU length (interval of reference points) is defined as 50 meters, the link-attribute approach identifies 0.20 hot zones on average on the grid road network, but the event-based approach detects 0.40. If the BSU length and interval of reference points are equal to 150 m or 200 m, the link-attribute approach detects no hot zones on the grid road network, but the event-based approach identifies some. On the organic road network, both the link-attribute and event-based approaches identify no hot zones when 150 m and 200 m are used as the segmentation length or reference point interval. If 50 m is applied, both approaches identify some hot zones, but the numbers and lengths are almost the same. These findings suggest that the performances of the two approaches are similar on the organic road structure.
Table 3.4 Summary of hot zones for random and concentrated crash patterns on the three hypothetical road networks by BSU length (RP interval). L = link-attribute approach; E = event-based approach; the number after L or E is the BSU length or RP interval in meters. Values are the number of hot zones, with length in meters in parentheses.

Random pattern

| Interval | Statistic | Grid: L | Grid: E | Limited access: L | Limited access: E | Organic: L | Organic: E |
|---|---|---|---|---|---|---|---|
| 50 m | Mean | 0.20 (20.0) | 0.40 (45.0) | 0.10 (10.0) | 0.20 (23.30) | 0.20 (20.0) | 0.20 (23.50) |
| 50 m | Standard deviation | 0.42 (42.16) | 0.70 (78.50) | 0.32 (31.62) | 0.42 (49.15) | 0.42 (42.16) | 0.42 (49.56) |
| 50 m | CV | 2.11 (2.11) | 1.75 (1.74) | 3.16 (3.16) | 2.11 (2.11) | 2.11 (2.11) | 2.11 (2.11) |
| 100 m | Mean | 0.10 (20.00) | 0.30 (72.50) | 0.00 (0.00) | 0.10 (20.00) | 0.00 (0.00) | 0.10 (20.00) |
| 100 m | Standard deviation | 0.32 (63.25) | 0.48 (117.53) | 0.00 (0.00) | 0.32 (63.25) | 0.00 (0.00) | 0.32 (63.25) |
| 100 m | CV | 3.16 (3.16) | 1.61 (1.62) | 0 (0) | 3.16 (3.16) | 0 (0) | 3.16 (3.16) |
| 150 m | Mean | 0.00 (0.00) | 0.80 (304.8) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) |
| 150 m | Standard deviation | 0.00 (0.00) | 0.79 (312.8) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) |
| 150 m | CV | 0 (0) | 0.99 (1.03) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| 200 m | Mean | 0.00 (0.00) | 0.20 (99.8) | 0.00 (0.00) | 0.10 (53.8) | 0.00 (0.00) | 0.00 (0.00) |
| 200 m | Standard deviation | 0.00 (0.00) | 0.42 (210.4) | 0.00 (0.00) | 0.32 (170.1) | 0.00 (0.00) | 0.00 (0.00) |
| 200 m | CV | 0 (0) | 2.11 (2.11) | 0 (0) | 3.16 (3.16) | 0 (0) | 0 (0) |

Concentrated pattern

| Interval | Statistic | Grid: L | Grid: E | Limited access: L | Limited access: E | Organic: L | Organic: E |
|---|---|---|---|---|---|---|---|
| 50 m | Mean | 8.8 (1775.0) | 8.6 (1908.7) | 9.2 (1765.0) | 8.8 (1878.2) | 9.5 (1885.0) | 9.2 (2020.9) |
| 50 m | Standard deviation | 0.63 (67.7) | 0.52 (72.76) | 0.42 (97.3) | 0.42 (88.1) | 0.53 (81.8) | 0.42 (70.63) |
| 50 m | CV | 0.07 (0.04) | 0.06 (0.04) | 0.05 (0.06) | 0.05 (0.05) | 0.06 (0.04) | 0.05 (0.03) |
| 100 m | Mean | 7.9 (1880.0) | 6.7 (2201.5) | 8.8 (1950.0) | 8.5 (2296.7) | 8.6 (1820.0) | 9.2 (2065.0) |
| 100 m | Standard deviation | 0.88 (193.2) | 0.95 (361.3) | 0.4 (108.1) | 1.08 (107.8) | 1.26 (239.4) | 0.42 (86.6) |
| 100 m | CV | 0.11 (0.10) | 0.14 (0.16) | 0.05 (0.06) | 0.13 (0.05) | 0.15 (0.13) | 0.05 (0.04) |
| 150 m | Mean | 4.0 (1585.0) | 4.9 (2771.0) | 6.0 (2025.0) | 5.0 (1916.1) | 4.1 (1290.0) | 4.6 (1489.3) |
| 150 m | Standard deviation | 0.8 (413.0) | 1.0 (567.7) | 1.3 (511.1) | 1.1 (391.2) | 2.3 (686.2) | 1.3 (337.3) |
| 150 m | CV | 0.20 (0.26) | 0.20 (0.20) | 0.22 (0.25) | 0.21 (0.20) | 0.56 (0.53) | 0.27 (0.23) |
| 200 m | Mean | 2.1 (1100.0) | 1.6 (1755.9) | 1.6 (720.0) | 3.0 (1657.5) | 2.3 (1000.0) | 3.2 (1393.6) |
| 200 m | Standard deviation | 1.1 (543.7) | 0.7 (832.4) | 0.7 (379.5) | 0.9 (700.4) | 1.6 (666.7) | 1.9 (808.9) |
| 200 m | CV | 0.52 (0.49) | 0.44 (0.47) | 0.44 (0.53) | 0.31 (0.42) | 0.68 (0.67) | 0.60 (0.58) |

When the hot zones are compared by BSU length (or RP interval), one may observe (see Table 3.4) that, with the random crash pattern, the hot zones identified by either the link-attribute or the event-based approach are sensitive to the BSU length or the RP interval. For instance, on average, the link-attribute approach detects 0.1 hot zones on the grid road pattern when the BSU length is defined as 100 meters, but identifies no hot zones when it is defined as 150 meters or 200 meters. The average length of event-based hot zones on the grid road network is 45.0 m with a 50-meter RP interval, but reaches 304.8 m with a 150-meter interval.

For the concentrated crash pattern, it is clear that the 50-meter setting is the most stable no matter which approach is used and where the simulated road crashes are located. The variability increases with increasing segmentation length (reference point interval) on all three types of road system. Taking the link-attribute hot zones on the grid road network as an example, the CV of the number (length) of 50-meter hot zones is 0.07 (0.04), whereas the figures are 0.11 (0.10) with 100-meter hot zones, 0.20 (0.26) with 150-meter hot zones, and 0.52 (0.49) with 200-meter hot zones. In this sense, a relatively short segmentation length (reference point interval) might be preferable for the identification of hazardous road locations with concentrated crash patterns.
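The summary rows of Tables 3.3 and 3.4 can be reproduced with a few lines of Python. The sketch below uses the ten link-attribute results for the concentrated pattern on the grid network from Table 3.3. The use of the sample standard deviation (n − 1 denominator) is an assumption, chosen here because it agrees with the reported figures; the helper name `summarise` is hypothetical.

```python
import statistics


def summarise(values):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)       # sample SD, n - 1 denominator
    cv = sd / mean if mean else 0.0     # CV is reported as 0 when the mean is 0
    return mean, sd, cv


# Link-attribute hot zones, concentrated pattern, grid network (Table 3.3)
numbers = [7, 9, 9, 7, 7, 8, 8, 8, 9, 7]
lengths = [1900, 2100, 1900, 1700, 2000, 2100, 1700, 1900, 2000, 1500]

if __name__ == "__main__":
    for label, data in (("number", numbers), ("length (m)", lengths)):
        mean, sd, cv = summarise(data)
        print(f"{label}: mean={mean:.1f} sd={sd:.2f} cv={cv:.2f}")
```

After rounding, the output agrees with the 100 m link-attribute grid entries in Table 3.3: a mean of 7.9 hot zones with a CV of 0.11, and a mean length of 1,880 m with a CV of 0.10.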
3.4 Data Analysis with Empirical Data

While the simulations on hypothetical road networks in the previous section are valuable in providing a basic understanding of the characteristics of the two approaches in a network-constrained environment, it is important to explore the potential applicability of the hot zone methodology in empirical applications, which may eventually be adopted by road safety administrations. This section introduces the two approaches to analyzing empirical road crash patterns within a complicated road network. The empirical data and analytical tools are presented briefly in Subsections 3.4.1 and 3.4.2 respectively. Details of the implementation and results of the analysis can be found in the following four chapters.

3.4.1 Data Collection and Editing

3.4.1.1 Road network database

For the network-constrained analysis, the road network database is of prime importance. This study uses the link-node system for the centerline of the road network in Hong Kong, maintained and digitized by the Lands Department. The total length of the road network in the database is about 4,249 km, with 32,039 links and 26,692 nodes. The database contains information on road names and road types. According to the Lands Department (2004), roads in Hong Kong can be categorized into eight classes: 1) expressway; 2) main road; 3) secondary road; 4) elevated road, flyover and road bridge; 5) tunnel; 6) non-motorable track; 7) closed road; and 8) restricted access.

3.4.1.2 Traffic Road Accident Database (TRADS)

Another principal database for this research holds the police crash investigation data and is known as the Traffic Road Accident Database (TRADS). The TRADS system consists of three datasets, namely the crash, casualty and vehicle datasets, which provide a vast range of information on each crash. Generally, this system records the location of crashes, road user type and the injury severity classification of the casualties.
Such information enables us to locate casualties, identify crashes involving pedestrians and identify fatal and seriously injured victims. As road crashes are rare events and randomness in the number of traffic crashes happening at a certain location is typical, it is of great importance that the study period ensures representative crash samples (Moons, Brijs & Wets, 2009a). Hence, following Mueller, Rivara and Bergman (1988), this study pooled the datasets from 2002 to 2004 and from 2005 to 2007 into two three-year crash databases for analysis.

The crash database stores five-figure grid references which can be transformed into projection coordinates. Each crash can be plotted onto a digital map in a GIS environment based on its x and y coordinates. Table 3.5 shows the percentage of road crashes on centerlines in the raw crash database from 2002 to 2007 in Hong Kong. Almost all crashes (no less than 99%) did not intersect with the road links in the period from 2002 to 2004. In 2004, the research on geo-validation (Loo, 2006) was shared with the Transport Department, which improved the Traffic Information System (TIS) accordingly. Hence, since 2005, the precision has been greatly improved. Nonetheless, the spatial accuracy was still not high enough to locate all the crashes in the right place. In the period from 2005 to 2007, more than 50% of road crashes were still off the centerlines. Hence, these crashes need to be snapped to the appropriate junctions or the centerlines of the road network. By following the geo-validation procedure proposed by Loo (2006), all the locations of road crashes were geo-validated before analysis.
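The geometric part of snapping a crash to a road centerline can be sketched as a point-to-segment projection. The Python sketch below is only an illustration of that single step and is not the geo-validation procedure of Loo (2006), whose details are not reproduced here. It assumes planar projected coordinates and straight centerline segments, and the function names `snap_to_segment` and `snap_to_network` are hypothetical.

```python
import math


def snap_to_segment(p, a, b):
    """Project point p onto segment a-b; return (snapped point, distance)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        t = 0.0                               # degenerate segment: use node a
    else:
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))             # clamp to the segment ends
    sx, sy = ax + t * dx, ay + t * dy
    return (sx, sy), math.hypot(px - sx, py - sy)


def snap_to_network(p, segments):
    """Snap p to the nearest of several segments, purely by distance."""
    return min((snap_to_segment(p, a, b) for a, b in segments),
               key=lambda result: result[1])


if __name__ == "__main__":
    roads = [((0.0, 0.0), (10.0, 0.0)), ((0.0, 5.0), (10.0, 5.0))]
    print(snap_to_network((5.0, 4.0), roads))
```

Nearest-distance snapping alone can assign a crash to the wrong road near junctions or parallel carriageways, which is one reason a fuller validation procedure is needed in practice.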
Table 3.5 Shares of road crashes on road centerlines from 2002 to 2007

| | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 |
|---|---|---|---|---|---|---|
| Number of road crashes | 15,576 | 14,436 | 15,026 | 15,062 | 14,849 | 15,315 |
| On road centerline (%) | 0.1 | 0.1 | 0.0 | 44.5 | 43.5 | 42.5 |

3.4.1.3 Annual traffic census

Traffic volume is very important in this study as it is highly related to the occurrence of road crashes. The annual average daily traffic (AADT) is derived from the Hong Kong Annual Traffic Census provided by the Traffic and Transport Survey Division of the Transport Department (2002-2007). The data are generated through a range of counting stations across the Hong Kong territory, where inductive loops and pneumatic tubes are installed on a carriageway and connected to roadside automatic counters (Lee, 1989). The counting stations (including ) are plotted onto a street map. Since not all streets have counting stations, only those with counting stations are included in the identification of road crashes. In recent years, Hong Kong has installed more and more stations. In 2011, the census covered 1,813 km of roads. However, since the study period of the analysis is from 2002 to 2007, this research only includes stations that existed in all six years. Accordingly, the length of streets with AADT information for all six years (Figure 3.14), named the ATC road network hereafter, is about 1,060 km, accounting for 24.9% of the total length of the road network in the whole territory of Hong Kong (4,249 km). The average AADT from 2002 to 2004 and from 2005 to 2007 is used for the 2002-2004 period and the 2005-2007 period respectively.

Figure 3.14 ATC Network

3.4.1.4 Land utilization map

Due to data availability, land utilization data in 2004 and in 2006 are used to characterize the land utilization during the period from 2002 to 2004 and the period from 2005 to 2007 respectively.
The land use data of 2004 were derived from the digital topographic map (B5000) produced by the "Map Publications Centre, Hong Kong" of the Survey and Mapping Office (2005). Coordinates of the map are in Hong Kong 1980 Grid. The land use data of 2006 were extracted from a paper map on land utilization of Hong Kong, which was compiled from the 2006 land use data of the Planning Department and other relevant information, including data derived from SPOT satellite images (2007). The transformation from a paper map to a digitized vector map was performed by using ENVI, a software package for processing and analyzing geospatial imagery. The key steps are:
(1) Scan the paper map to a digital map;
(2) Georeference the digital map with the Hong Kong 1980 Grid coordinate system;
(3) Select samples for each type of land use;
(4) Perform supervised classification;
(5) Vectorize the raster data to ArcGIS (a GIS software) shape files; and
(6) Manually check and correct the classification errors.

Land can be categorized according to different land use classification standards. For instance, in the paper map on land utilization of Hong Kong in 2006, the land use was classified into 19 categories. Based upon the literature on the relationship between land use types and road crashes, this research broadly divides the land use into five categories, namely residential, commercial, industrial, institutional and other types, for both the 2004 and 2006 land use maps (see Figure 3.15 as an example). These data will be used in the casualty-based analysis.

Figure 3.15 Land Use in 2006

3.4.1.5 Population Census

In Hong Kong, the Census and Statistics Department (2002, 2007) conducts a population census every ten years and a by-census in the middle of the intercensal period, both by tertiary planning unit (TPU). The TPU system is devised by the Hong Kong Planning Department (2001, 2006) for town planning and population census.
The whole territory of Hong Kong (1,104 square kilometers) was divided into 276 TPUs in 2001 (Planning Department, 2001) and 282 in 2006 (Planning Department, 2006). The census report presents the detailed characteristics of the population in each TPU, such as age and sex of the population, household size and composition, tenure of accommodation, monthly domestic household income, highest education level attended, economic activity status and occupation. In this study, the 2001 census (Census and Statistics Department, 2002) and 2006 by-census (Census and Statistics Department, 2007) reports are used to extract a series of indicators describing the socio-economic conditions of each TPU for the 2002-2004 and 2005-2007 periods respectively.

The socio-economic characteristic is represented by an index named the socio-economic deprivation index (SDI), which is derived from a set of social and economic variables. In this study, the variables are chosen based upon the previous literature on area deprivation, and include income, owner-occupancy, education, occupation and unemployment, which are defined as follows:
1) Income: Monthly household income <6000 HKD (%);
2) Owner-occupancy: Not owner-occupied households (%);
3) Education: Low upper-secondary education attainment (%);
4) Occupation: Occupation with no or low qualification (%); and
5) Unemployment: Unemployment (%).

The variables are similar to those chosen by other social deprivation indices, such as the Index of Local Conditions widely used in other countries (Payne, Payne & Hyde, 1996).
In this study, “monthly household income <6000 HKD (%)” is a proxy for “income support recipients”, “not owner-occupied households (%)” is the same with “not owner occupancy rate”, “low upper-secondary education attainment (%)” is broadly similar to “secondary school absence rate”, “occupation with no or low qualification (%)” is a dimension similar to “Low social class”, and “unemployment (%)” is similar to “total unemployment rate” (Payne, Payne & Hyde, 1996). Depending on these five predictors, the socio-economic deprivation index, SDI, is calculated by means of Z-scores, which is given as: 90 m SDIi WjZij j 1 Zij 3.12 Vij j 3.13 j where SDIi is the socio-economic deprivation score for the ith TPU, i=1,2,3,…,n, n is the count of TPUs, m is the number of variables, V ij is the actual value of jth variable for TPU i, j is the mean value of variable j, σj is the standard deviation of jth indictor, and Wj is the weight attached to Z-scores. While some previous research used equal weights, this study gives different weightings to each indicator variable. There are strong arguments in favor of using differentiated weightings (Abdalla et al., 1997). For instance, it is not appropriate to equate the impact of unemployment as a deprivation predictor with a low upper-secondary education attainment rate. Compared with unemployment rate, the education level is not such a direct measure and its existence alone does not mean that the area is material deprived. Hence, the weights used in this research were produced by the first principal-component analysis, in a similar way as Abdalla et al. (1997). Details of the calculation of area deprivation level can also be found in (Loo & Yao, 2010). It should be pointed out that in order to ensure confidentiality of data relating to individual person, household or quarters, each TPU with less than 1,000 persons is merged with adjacent TPU(s) in census reports (Census and 91 Statistics Department, 2002, 2007). 
Nevertheless, the reports also provide the population of these merged TPU(s), which allows information for each of the merged TPUs to be derived by a population-weighted approach. Figure 3.16 delineates the socio-economic condition of each TPU in 2006. The deprivation level varied from 8.450 (most deprived) to -5.878 (most wealthy). These data will be used in the casualty-based analysis as an indicator describing the socio-economic environment.

Figure 3.16 Socio-economic deprivation index by TPU in 2006

3.4.2 Data Analysis

3.4.2.1 Link-attribute analysis for road crashes

The road crashes happening on the ATC road network are analyzed at the territory level for the periods of 2002-2004 and 2005-2007. The sensitivity analysis is conducted on the segmentation method (topological structure of the road network and segmentation length) and the definition of threshold values (numerical and statistical definitions). In particular, the hot zones are analyzed with threshold values defined by incorporating road environmental variables such as AADT, road type, segment length and number of road junctions.

3.4.2.2 Link-attribute analysis for road casualties

Casualty-weighted analysis (unweighted and cost-weighted) of pedestrian casualties is performed at the district level in a selected district. A simple ranking method is used to determine the threshold value. In addition, the surrounding environmental variables affecting the occurrence of pedestrian casualties are investigated. For each BSU, a range of TPU-based or link-based variables are extracted, such as land use, road density, junction density, population density and the socio-economic deprivation level of the neighborhood. Hot zones for pedestrians are detected by incorporating these surrounding environmental factors. To account for the regression-to-the-mean problem, the EB method is employed to define the crash intensity. Sensitivity analysis is then performed on the definition of crash intensity.
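The socio-economic deprivation index that enters the analysis as a neighborhood variable (Equations 3.12 and 3.13) can be illustrated with a minimal sketch. The indicator values and weights below are hypothetical and are not taken from the census data; the thesis derives its weights from principal component analysis, and the use of the population standard deviation is an assumption, since the text does not state which form is used.

```python
import statistics


def z_scores(values):
    """Standardize one indicator across all TPUs (Equation 3.13)."""
    mean = statistics.mean(values)
    # Assumption: population standard deviation; the text does not say
    # whether the population or the sample form is used.
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def sdi_scores(indicators, weights):
    """Weighted sum of Z-scores for each TPU (Equation 3.12)."""
    standardized = {name: z_scores(vals) for name, vals in indicators.items()}
    n_tpus = len(next(iter(indicators.values())))
    return [
        sum(weights[name] * standardized[name][i] for name in indicators)
        for i in range(n_tpus)
    ]


# Hypothetical percentages for three TPUs (illustrative, not census data).
indicators = {
    "income": [10.0, 20.0, 30.0],
    "owner_occupancy": [40.0, 55.0, 70.0],
    "education": [15.0, 25.0, 35.0],
    "occupation": [12.0, 18.0, 24.0],
    "unemployment": [3.0, 5.0, 7.0],
}
# Hypothetical weights standing in for the PCA-derived weights.
weights = {
    "income": 0.5,
    "owner_occupancy": 0.4,
    "education": 0.3,
    "occupation": 0.4,
    "unemployment": 0.6,
}

scores = sdi_scores(indicators, weights)
print(scores)  # higher score = more deprived TPU
```

Because each Z-score column has zero mean, the resulting scores are centered on zero, which is consistent with the reported index taking both positive (more deprived) and negative (less deprived) values.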
3.4.2.3 Event-based analysis for road crashes

The road crashes happening on the ATC road network are analyzed at the territory level. Numerical and Monte Carlo methods are used for the definition of threshold values. The influences of the selection of reference points and of road environmental factors on hot zones are examined by conducting a series of sensitivity analyses.

3.4.2.4 Event-based analysis for road casualties

Casualty-weighted (unweighted and cost-weighted) analysis of pedestrian casualties in a selected district will be conducted. The threshold value is defined by a simple ranking method.

3.4.2.5 Comparison of two approaches

The hot zones identified by the two approaches are compared. Descriptive statistics on the hot zones of the entire region, such as the total number and length of hot zones or the total number of hot BSUs (or RPs) involved, are calculated for each map. Sensitivity is measured by computing the mean and variation of these indicators. In addition, the hot zone maps of the two approaches will be overlaid in order to derive the hot zones identified by only one approach and those detected by both methods. With these analyses, the similarities and differences of the two methods in identifying crash hot zones are identified.

3.5 Summary

This chapter introduces the methodology for the identification of crash hot zones by using link-attribute and event-based approaches. The two approaches are first applied to the analysis of simulated road crashes on hypothetical road networks. The results suggest that the link-attribute approach may detect fewer false positive hot zones, especially with a grid road structure. On an organic road network, the performances of the two approaches are similar. No matter which type of road structure is used, decreasing the segmentation length (reference point interval) may reduce the variability. In addition, this chapter describes the way in which the empirical data are collected and edited. Analytical tools are also introduced briefly.
The following chapters will elaborate more on the implementation of the two approaches for empirical data analysis.

CHAPTER 4 LINK-ATTRIBUTE ANALYSIS FOR ROAD CRASHES

This chapter performs link-attribute analysis for road crashes in the whole territory of Hong Kong. The data and analytical tools are presented in detail in Section 4.1. The analysis focuses on the segmentation method and the definition of threshold values. The results are discussed in Section 4.2.

4.1 Territory-Wide Identification of Hazardous Road Locations

4.1.1 Data Description

The ATC road networks (1,060 km) of 2002-2004 and 2005-2007 are used for the territory-wide analysis in this chapter. There were 31,324 and 30,511 road crashes on ATC roads during the period from 2002 to 2004 and the period from 2005 to 2007 respectively. Following previous research, the segmentation interval is first defined as 100 meters in this chapter. After segmentation, 14,245 and 11,398 BSUs were obtained for the raw-link and dissolved ATC road networks respectively.

4.1.2 Data Analysis

4.1.2.1 Numerical definition

There has been no official numerical definition of the threshold value for hot zone identification. Loo (2009) defined the threshold values as 3, 4 and 5 road crashes in one year based upon the official definition of hot spots in Hong Kong. As this study pools three years of data together, the threshold values are defined as 9, 12 and 15 road crashes in three years accordingly. Following Equations 3.1 and 3.2, the hot zones are identified with the three threshold values. In particular, a sensitivity analysis of hot zones to the road system (raw link-node vs. dissolved) is conducted.

4.1.2.2 Monte Carlo method

Suppose that there are m road crashes in one period. The general steps of the Monte Carlo method are:

(1) For the observed set of road crashes, calculate the number of crashes on each BSU as the actual crash intensity.
(2) Randomly allocate m simulated crashes to BSUs and obtain the number of simulated crashes on each BSU, which is denoted as z(sim).
(3) Repeat Step 2 1,000 times.
(4) For each BSU, use the 10th largest z(sim) (99% significance level) as the threshold value.
(5) Identify hot zones by following Equations 3.1 and 3.2.

To investigate the sensitivity of hot zones to the threshold value, the significance level in Step 4 is also defined as 95% and 99.9%.

4.1.2.3 Incorporation of road environmental variables

The analysis aims to identify road hazards that do not merely reflect risk factors such as traffic volume, road type and road junctions. It defines a BSU as hot when the recorded number of road crashes exceeds the normal level of safety. In this research, the “normal value”, which is denoted as the threshold value “t”, is estimated by a statistical model with a set of environmental variables. The idea is borrowed from McGuigan (1981), who named the difference between the observed road crash count and the expected number of road crashes the “potential crash reduction”. It should be pointed out that, as this method aims to identify hazardous road locations with potential for safety improvement, the difference between the crash intensity (observed crash count) and the threshold value of a BSU should be larger than zero if the BSU is to be identified as hazardous. Hence, $z_i$ in Equation 3.1 is modified as:

$$z_i = \begin{cases} 1 & \text{if } LCI_i - t_i > 0 \\ 0 & \text{otherwise} \end{cases} \tag{4.1}$$

where $LCI_i$ is the crash intensity at BSU $i$ and $t_i$ is the threshold value of BSU $i$, which is estimated by statistical models.

A number of statistical models have been used to estimate the crash frequency at a specific location over a period of time. Classical models include Poisson and negative binomial regression models. Poisson regression, considered as the “law of rare events”, is used to model count data and contingency tables (Cameron & Trivedi, 1998).
It was employed in early attempts to investigate the relationship between road crashes and risk factors (Blower, Campbell & Green, 1993; Joshua & Garber, 1990; Saccomanno & Buyco, 1988). However, the distribution of road crashes in the real world cannot meet the requirement of the Poisson model that the sample mean should be equal to the sample variance (equidispersion). Taking the crash intensity of 2002-2004 as an example (see the histogram in Figure 4.1), the sample mean (0.89) was much smaller than the sample variance (22.8). To deal with this problem, the negative binomial model has been widely adopted by researchers (Abdel-Aty & Radwan, 2000; Miaou, 1994; Tunaru, 1999), since it is well suited to discrete data over an unbounded positive range whose sample variance exceeds the sample mean (overdispersion) (Cameron & Trivedi, 1998; Hilbe, 2011). Hence, the following analysis uses negative binomial regression models to calculate the mean number of road crashes on each BSU as the threshold value, which is denoted by:

$$\ln(\mu_i) = \beta' x_i \tag{4.2}$$

where $\ln(\mu_i)$ is the natural log of the expected crash count on BSU $i$, $x_i$ is a vector of predictors and $\beta'$ is the vector of estimated coefficients. $\mu_i$ is taken as the threshold value denoted as $t_i$ in Equation 4.1.

Figure 4.1 Histogram for road crashes in 2002-2004

In order to examine the sensitivity of hot zones to the segmentation length, the dissolved road network is also segmented with 150 m and 200 m intervals. The threshold values are statistically defined by four models:

- Model L, which includes the length of the BSU as the only independent variable;
- Model LA, which introduces the length of the BSU together with AADT (annual average daily traffic from the traffic census, as described in Section 3.4.1.3);
- Model LAT, which incorporates the length of the BSU, AADT and the type of road as explanatory variables; and
- Model LATJ, which is established with the length of the BSU, AADT, the type of road and the number of road junctions on the BSU as predictive factors.
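The way a model-based threshold feeds into the hot BSU test (Equations 4.1 and 4.2) can be sketched as follows. This is only an illustration under stated assumptions: the coefficients are hypothetical and are not the fitted values of any model in this study, and the function names are introduced here for the example.

```python
import math


def expected_crashes(beta, x):
    """Threshold t_i = mu_i = exp(beta' x_i), following Equation 4.2.

    x must start with 1.0 so that beta[0] acts as the intercept.
    """
    return math.exp(sum(b * v for b, v in zip(beta, x)))


def is_hot(lci, threshold):
    """Equation 4.1: z_i = 1 if LCI_i - t_i > 0, otherwise 0."""
    return 1 if lci - threshold > 0 else 0


# Hypothetical coefficients: [intercept, coefficient of BSU length].
beta = [0.0, 0.01]
# Constant term and BSU length in meters.
x = [1.0, 100.0]

t = expected_crashes(beta, x)  # exp(1.0), roughly 2.72 crashes
print(is_hot(3, t), is_hot(2, t))
```

With these illustrative numbers, a BSU with three recorded crashes exceeds its expected level and is flagged, whereas a BSU with two crashes is not. Adding further predictors such as AADT or road type only lengthens the vectors `beta` and `x`.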
Note that although using a dissolved road system can significantly reduce the number of short BSUs, some BSUs still do not have the standardized length. For instance, when the segmentation length is defined as 100 meters, the short BSUs account for around 23% of BSUs. Hence, the length of the BSU is still regarded as an independent variable in this research.

The variable “type of road” records the road type of the BSU. As mentioned earlier, this research is based on the road centerline system devised by the Lands Department, which categorizes the roads of Hong Kong into eight classes. This study focuses on three major types of roads, namely expressways (EX), main roads (MA) and secondary roads (SE), and treats the other five types as one category, which is named “other roads” hereafter. With “other roads” as the base case for the “road type” variable, EX, MA and SE are three dummy variables for the road type predictor.

The full log likelihood and Akaike's Information Criterion (AIC) for each model are presented in Table 4.1. The two statistics are commonly used to compare negative binomial models. Better fitted models have a larger full log likelihood and a smaller AIC. For both periods, the models are better fitted with increasing segmentation length. Taking Model L as an example, when the segmentation length is raised from 100 m to 200 m, the full log likelihood increases from -23,728 to -15,914 in the period from 2002 to 2004 and from -23,923 to -15,941 in the period of 2005-2007, and the AIC decreases from 47,462 to 31,835 in the period of 2002-2004 and from 47,852 to 31,889 in the period of 2005-2007. This is probably because the share of BSUs with zero road crashes is significantly reduced when the segmentation length is raised. The goodness of fit is also improved when more environmental factors are introduced into the models, indicating that all of these variables can partly explain the spatial distribution of road crashes.
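The relation between the two fit statistics can be made concrete with the standard definition AIC = 2k - 2 ln L, where k is the number of estimated parameters. The sketch below assumes k = 3 for Model L (intercept, length coefficient and dispersion parameter); this parameter count is an assumption made for illustration, although it reproduces the 100 m values quoted above.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


# Model L with 100 m segmentation, log likelihoods as quoted in the text.
# Assumption: k = 3 (intercept, length coefficient, dispersion parameter).
aic_0204 = aic(-23728, 3)
aic_0507 = aic(-23923, 3)
print(aic_0204, aic_0507)
```

Small differences of about one unit for other models (for example 31,834 computed against 31,835 reported for the 200 m Model L in 2002-2004) would be consistent with the log likelihoods having been rounded to integers before reporting.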
In particular, the Model LATJ variants are much better fitted, implying that junctions may have profound impacts on the spatial distribution of road crashes. The “dispersion parameter (alpha)” and “the likelihood ratio test of alpha=0” are also presented in Table 4.1. A Poisson model is one in which the alpha value is constrained to zero. The likelihood ratio test that alpha equals zero therefore compares the model with a Poisson model. The alpha value is greater than zero and the p-value for the likelihood ratio test is less than 0.001 in all of the models. This strongly suggests that alpha is non-zero and that the negative binomial model is more appropriate than the Poisson model.

Turning to the coefficients of the predictors presented in Table 4.2, all the independent variables are significantly and positively associated with the number of road crashes in both periods. More specifically, increasing AADT increases the chance of occurrence of road crashes; compared with expressways and other roads, crashes are more likely to happen on main and secondary roads; and road junctions are associated with a higher incidence of road crashes.

Table 4.1 Comparison of negative binomial models by predictor and segmentation length

Predictor: length of BSU

| Statistic | 100L 02-04 | 100L 05-07 | 150L 02-04 | 150L 05-07 | 200L 02-04 | 200L 05-07 |
|---|---|---|---|---|---|---|
| Log likelihood | -23728 | -23923 | -18848 | -18932 | -15914 | -15941 |
| AIC | 47462 | 47852 | 37703 | 37870 | 31835 | 31889 |
| alpha | 2.18 | 1.82 | 1.851 | 1.589 | 1.650 | 1.458 |
| Likelihood ratio test of alpha=0 | 3.4e+04** | 2.7e+04** | 3.1e+04** | 2.6e+04** | 2.9e+04** | 2.5e+04** |

Predictor: length of BSU, AADT

| Statistic | 100LA 02-04 | 100LA 05-07 | 150LA 02-04 | 150LA 05-07 | 200LA 02-04 | 200LA 05-07 |
|---|---|---|---|---|---|---|
| Log likelihood | -23659 | -23845 | -18791 | -18869 | -15869 | -15892 |
| AIC | 47327 | 47699 | 37560 | 37747 | 31747 | 31792 |
| alpha | 2.14 | 1.78 | 1.813 | 1.553 | 1.618 | 1.427 |
| Likelihood ratio test of alpha=0 | 3.3e+04** | 2.6e+04** | 3.0e+04** | 2.5e+04** | 2.8e+04** | 2.4e+04** |

Predictor: length of BSU, AADT, road type

| Statistic | 100LAT 02-04 | 100LAT 05-07 | 150LAT 02-04 | 150LAT 05-07 | 200LAT 02-04 | 200LAT 05-07 |
|---|---|---|---|---|---|---|
| Log likelihood | -23348 | -23460 | -18535 | -18557 | -15649 | -15627 |
| AIC | 46710 | 46935 | 37084 | 37128 | 31311 | 31269 |
| alpha | 1.96 | 1.60 | 1.654 | 1.386 | 1.471 | 1.270 |
| Likelihood ratio test of alpha=0 | 3.0e+04** | 2.4e+04** | 2.7e+04** | 2.2e+04** | 2.5e+04** | 2.1e+04** |

Predictor: length of BSU, AADT, road type, number of road junctions

| Statistic | 100LATJ 02-04 | 100LATJ 05-07 | 150LATJ 02-04 | 150LATJ 05-07 | 200LATJ 02-04 | 200LATJ 05-07 |
|---|---|---|---|---|---|---|
| Log likelihood | -22579 | -22771 | -18022 | -18063 | -15243 | -15234 |
| AIC | 45175 | 45558 | 36060 | 36142 | 30503 | 30484 |
| alpha | 1.56 | 1.30 | 1.36 | 1.14 | 1.22 | 1.05 |
| Likelihood ratio test of alpha=0 | 2.4e+04** | 1.9e+04** | 2.2e+04** | 1.7e+04** | 2.0e+04** | 1.7e+04** |

Notes: AIC: Akaike's Information Criterion; alpha: dispersion parameter; *: p<0.05; **: p<0.001

Table 4.2 Coefficients of predictors in negative binomial models

Predictor: length of BSU

| Coefficient | 100L 02-04 | 100L 05-07 | 150L 02-04 | 150L 05-07 | 200L 02-04 | 200L 05-07 |
|---|---|---|---|---|---|---|
| (Intercept) | -.023 | .030 | .075 | .178* | .238* | .347** |
| Length | .011** | .010** | .009** | .008** | .008** | .007** |

Predictor: length of BSU, AADT

| Coefficient | 100LA 02-04 | 100LA 05-07 | 150LA 02-04 | 150LA 05-07 | 200LA 02-04 | 200LA 05-07 |
|---|---|---|---|---|---|---|
| (Intercept) | -.155 | -.094 | -.069 | .050 | .108 | .226* |
| Length | .010** | .009** | .009** | .008** | .007** | .007** |
| AADT | .006** | .006** | .006** | .006** | .006** | .006** |

Predictor: length of BSU, AADT, road type

| Coefficient | 100LAT 02-04 | 100LAT 05-07 | 150LAT 02-04 | 150LAT 05-07 | 200LAT 02-04 | 200LAT 05-07 |
|---|---|---|---|---|---|---|
| (Intercept) | -1.843** | -1.861** | -1.703** | -1.652** | -1.550** | -1.531** |
| Length | .010** | .009** | .009** | .008** | .007** | .007** |
| AADT | .015** | .014** | .014** | .014** | .014** | .014** |
| EX (a) | .438** | .547** | .408** | .486** | .389** | .506** |
| MA (b) | 1.599** | 1.696** | 1.564** | 1.646** | 1.553** | 1.653** |
| SE (c) | 1.594** | 1.686** | 1.549** | 1.618** | 1.557** | 1.648** |

Predictor: length of BSU, AADT, road type, number of road junctions

| Coefficient | 100LATJ 02-04 | 100LATJ 05-07 | 150LATJ 02-04 | 150LATJ 05-07 | 200LATJ 02-04 | 200LATJ 05-07 |
|---|---|---|---|---|---|---|
| (Intercept) | -1.943** | -1.943** | -1.672** | -1.610** | -1.472** | -1.450** |
| Length | .009** | .008** | .007** | .006** | .006** | .005** |
| AADT | .015** | .014** | .014** | .014** | .014** | .014** |
| EX (a) | .470** | .603** | .485** | .562** | .480** | .595** |
| MA (b) | 1.236** | 1.388** | 1.213** | 1.327** | 1.212** | 1.338** |
| SE (c) | 1.156** | 1.312** | 1.117** | 1.235** | 1.128** | 1.264** |
| Junction | .801** | .686** | .579** | .517** | .459** | .419** |

Notes: (a): 1 if expressway, 0 otherwise; (b): 1 if main road, 0 otherwise; (c): 1 if secondary road, 0 otherwise. *: p<0.05; **: p<0.001

Negative binomial regression is also appropriate for rate data, where the rate is a count of events occurring to a particular unit of observation, divided by some measure of that unit's exposure (Cameron & Trivedi, 1998). This is handled as an offset, where the exposure variable enters on the right-hand side of the equation, but with a parameter estimate (for log (exposure)) constrained to 1 (Cameron & Trivedi, 1998). Treating the traffic volume as the exposure, this study also applied negative binomial regression to the modeling of the crash rate by introducing log (length * AADT) as the offset variable. The results shown in Table 4.3 suggest that the road crashes are better modeled by negative binomial regression models than by Poisson models (see alpha and the likelihood ratio test of alpha=0). Although the Model LA’ variants, as suggested by the full log likelihood and AIC values, do not fit as well as the Model LA variants in Table 4.1, they are also used for computing the threshold values in order to investigate the sensitivity of hot zones.
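The effect of the offset can be illustrated with a short sketch. With log (length * AADT) entering with a coefficient fixed at 1, the expected count becomes exp(intercept) multiplied by the exposure, so the intercept is the log of the crash rate per unit of exposure. The intercept and the units of length and AADT used below are hypothetical and are chosen only for illustration.

```python
import math


def expected_crashes_with_offset(intercept, length, aadt):
    """Expected count from an intercept-only model with a log-exposure offset.

    ln(mu) = intercept + 1 * ln(length * aadt), so
    mu = exp(intercept) * length * aadt.
    """
    offset = math.log(length * aadt)  # coefficient constrained to 1
    return math.exp(intercept + offset)


# Hypothetical intercept and exposure values (units are assumptions).
intercept = -6.0
mu_single = expected_crashes_with_offset(intercept, length=100.0, aadt=20.0)
mu_double = expected_crashes_with_offset(intercept, length=100.0, aadt=40.0)
print(mu_single, mu_double)
```

Doubling the exposure doubles the expected number of crashes, which is the proportionality that the offset imposes. Model LA, by contrast, lets the data estimate separate coefficients for length and AADT.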
Table 4.3 Negative binomial models for crash rate

| Statistic | 100LA’ 02-04 | 100LA’ 05-07 | 150LA’ 02-04 | 150LA’ 05-07 | 200LA’ 02-04 | 200LA’ 05-07 |
|---|---|---|---|---|---|---|
| (Intercept) | -6.23** | -6.26** | -6.22** | -6.24** | -6.20** | -6.22** |
| Log likelihood | -24443 | -24800 | -19390 | -19611 | -16387 | -16504 |
| AIC | 48890 | 47953 | 38784 | 39227 | 32778 | 33013 |
| alpha | 2.62 | 2.28 | 2.25 | 1.98 | 1.99 | 1.82 |
| Likelihood ratio test of alpha=0 | 4.8e+04** | 4.1e+04** | 4.5e+04** | 4.0e+04** | 4.3e+04** | 3.9e+04** |

Notes: AIC: Akaike's Information Criterion; alpha: dispersion parameter; *: p<0.05; **: p<0.001

The estimated numbers of road crashes, which are used as threshold values for hot zone identification, are then calculated by the established negative binomial regression models, including Models L, LA, LAT, LATJ and LA’.

4.2 Results

4.2.1 Numerical Definition

The results of the sensitivity analysis of link-attribute hot zones to the type of road system in the periods of 2002-2004 and 2005-2007 are summarized in Table 4.4. The table also lists the computing time needed to identify hot zones on one personal computer with an Intel® Core™ 2 Duo CPU and 2 GB RAM, including the computing time for geo-validation of road crashes (three years), segmentation of the road network, calculation of crash intensity and modeling of the spatial pattern. For the dissolved road system, the computing time also includes the time spent on the dissolving algorithm.

It is observed that if the threshold values of 9 crashes (HZ18+), 12 crashes (HZ24+) and 15 crashes (HZ30+) in three years are used, the average numbers of raw-link-node-based hot zones identified in the two periods are approximately 100, 36 and 16, with 19.5 km, 6.7 km and 2.8 km in total length. With the dissolved road system, far more hot zones are detected than with the raw-link-node road system in both periods.
For instance, the number of dissolved-based HZ18+ was 143 in the period from 2002 to 2004, with 6,482 road crashes happening on them, which was 34 hot zones and 2,572 road crashes more than the raw-link-node hot zones. The total length was even double that based on the raw-link-node system. The gap became much wider when the threshold value was increased. Similar observations can be made in the period of 2005-2007, indicating that using the dissolved road system can identify more hazardous road locations, due largely to the reduction of short BSUs that are not long enough to allow the identification of crash clusters.

Table 4.4 Hot zones by type of road system (2002-2004 and 2005-2007)

Raw-link-node road system

| Indicator | 2002-2004 HZ18+ | 2002-2004 HZ24+ | 2002-2004 HZ30+ | 2005-2007 HZ18+ | 2005-2007 HZ24+ | 2005-2007 HZ30+ |
|---|---|---|---|---|---|---|
| Number of HZs | 109 | 37 | 16 | 91 | 35 | 16 |
| Number of BSUs | 259 | 86 | 34 | 228 | 84 | 38 |
| Length (km) | 20.72 | 6.72 | 2.7 | 18.19 | 6.77 | 3.09 |
| No. of Road Crashes | 3,910 | 1,638 | 774 | 3,408 | 1,542 | 813 |
| Computing time (h.) | 40 | 40 | 40 | 40 | 40 | 40 |

Dissolved road system

| Indicator | 2002-2004 HZ18+ | 2002-2004 HZ24+ | 2002-2004 HZ30+ | 2005-2007 HZ18+ | 2005-2007 HZ24+ | 2005-2007 HZ30+ |
|---|---|---|---|---|---|---|
| Number of HZs | 143 | 73 | 36 | 144 | 75 | 38 |
| Number of BSUs | 423 | 191 | 92 | 417 | 177 | 88 |
| Length (km) | 41.98 | 19.03 | 9.2 | 41.14 | 17.61 | 8.8 |
| No. of Road Crashes | 6,482 | 3,597 | 2,084 | 6,265 | 3,359 | 1,980 |
| Computing time (h.) | 45 | 45 | 45 | 45 | 45 | 45 |

To explore whether the hazardous road locations identified with the raw-link-node road system are also detected by the dissolved road network, the two types of hot zones are overlaid for the different threshold values and study periods. The hot zones identified by both road systems are labeled “HZR (raw-link-node-based hot zones)-cum-HZD (dissolved-based hot zones)”. Table 4.5 shows the number of HZR-cum-HZDs as well as the percentages of HZRs and HZDs in the HZR-cum-HZD category. The shares of HZR18+-cum-HZD18+ in HZR18+ were 70.6% in the period of 2002-2004 and 81.3% in the period from 2005 to 2007.
In other words, about 71% of HZR18+ in 2002-2004 and 81% in 2005-2007 were also identified as hazardous road locations when the dissolved road system was used. However, as the threshold value increases, the percentages of HZR in the HZR-cum-HZD category decrease. In the period of 2005-2007, the share declined from 81.3% to 62.5%. Nevertheless, the observations still suggest that a large part (more than 50%) of the hot zones identified with the raw-link-node road system can also be detected by the dissolved road system.

Figure 4.2 shows the spatial distribution of the HZ18+ identified only by the raw-link-node system (Figure 4.2a) and only by the dissolved road system (Figure 4.2b). It can be observed that the dissolved road system can identify more hot zones in the urban area (Kowloon Peninsula and Hong Kong Island). Looking closely into these hot zones (see Figure 4.3), one can find that most of the dissolved-only hot zones were located on dense roads, around road junctions.

Table 4.5 Hot zones identified by both raw-link-node and dissolved road systems

| Period | Category | Number | % of HZR | % of HZD |
|---|---|---|---|---|
| 2002-2004 | HZR18+-cum-HZD18+ | 70 | 70.6% | 39.4% |
| 2002-2004 | HZR24+-cum-HZD24+ | 24 | 66.7% | 32.9% |
| 2002-2004 | HZR30+-cum-HZD30+ | 10 | 62.5% | 27.8% |
| 2005-2007 | HZR18+-cum-HZD18+ | 72 | 81.3% | 48.2% |
| 2005-2007 | HZR24+-cum-HZD24+ | 26 | 85.7% | 40% |
| 2005-2007 | HZR30+-cum-HZD30+ | 10 | 62.5% | 26.3% |

Notes: “% of HZR” and “% of HZD” are the percentages of HZR and HZD, respectively, in the HZR-cum-HZD category.

Figure 4.2 Hot zones (HZ18+) identified with (a) raw-link-node road system only and (b) dissolved road system only in the period from 2002 to 2004

Figure 4.3 Part of hot zones identified with dissolved road system only

If the identification method is robust, road sections identified as road hazards in the period from 2002 to 2004 are more likely to be detected as hazardous road locations again in the following three years.
In order to examine the stability, the hot zones of the two periods are first compared in terms of the total number and length of the hot zones, the number of BSUs and the overall crash counts. The results summarized in Table 4.6 show that the dissolved road system is generally more stable than the raw link-node road system (see CV in the table), especially when the threshold value is relatively small. To further examine the robustness of the two methods, the hot zones of the two periods are overlaid and the results are shown in Table 4.7.

It is observed from the hot zones identified in the period of 2002-2004 (HZ24) and in the period of 2005-2007 (HZ57) that the performance of the dissolved-road-based approach is more stable than that of the raw-link-node method regardless of the threshold value. Using HZ24(18+)-cum-HZ57(18+) as an example, less than 40% of HZ24(18+) and HZ57(18+) were compatible when the raw-link-node road system was used. When the dissolved system was employed, the percentages of HZ24(18+) and HZ57(18+) in the HZ24-cum-HZ57 category reached about 60%. As using the dissolved road system may detect fewer false negatives and false positives, the link-attribute analysis is better conducted with a dissolved road system. Hence, the following analysis is performed based on the dissolved road network.

Table 4.6 Variation of hot zones between two periods

Raw-link-node road system

| Indicator | HZ18+ Mean | HZ18+ SD | HZ18+ CV | HZ24+ Mean | HZ24+ SD | HZ24+ CV | HZ30+ Mean | HZ30+ SD | HZ30+ CV |
|---|---|---|---|---|---|---|---|---|---|
| No. of HZs | 100.00 | 12.73 | 0.13 | 36.00 | 1.41 | 0.04 | 16.00 | 0.00 | 0.00 |
| No. of BSUs | 243.50 | 21.92 | 0.09 | 85.00 | 1.41 | 0.02 | 36.00 | 2.83 | 0.08 |
| Length of HZs | 19.46 | 1.79 | 0.09 | 6.75 | 0.04 | 0.01 | 2.90 | 0.28 | 0.10 |
| No. of Crashes | 3659.00 | 354.97 | 0.10 | 1590.00 | 67.88 | 0.04 | 793.50 | 27.58 | 0.03 |

Dissolved road system

| Indicator | HZ18+ Mean | HZ18+ SD | HZ18+ CV | HZ24+ Mean | HZ24+ SD | HZ24+ CV | HZ30+ Mean | HZ30+ SD | HZ30+ CV |
|---|---|---|---|---|---|---|---|---|---|
| No. of HZs | 143.50 | 0.71 | 0.00 | 74.00 | 1.41 | 0.02 | 37.00 | 1.41 | 0.04 |
| No. of BSUs | 420.00 | 4.24 | 0.01 | 184.00 | 9.90 | 0.05 | 90.00 | 2.83 | 0.03 |
| Length of HZs | 41.56 | 0.59 | 0.01 | 18.32 | 1.00 | 0.05 | 9.00 | 0.28 | 0.03 |
| No. of Crashes | 6373.50 | 153.44 | 0.02 | 3478.00 | 163.29 | 0.04 | 2032.00 | 70.54 | 0.03 |

Notes: SD: standard deviation; CV: coefficient of variation.

Table 4.7 Hot zones identified in both periods

| Road system | Category | Number | % of HZ24 | % of HZ57 |
|---|---|---|---|---|
| Raw-link-node | HZ24(18+)-cum-HZ57(18+) | 43 | 39.4% | 30.1% |
| Raw-link-node | HZ24(24+)-cum-HZ57(24+) | 10 | 27.2% | 28.6% |
| Raw-link-node | HZ24(30+)-cum-HZ57(30+) | 4 | 25.5% | 25.5% |
| Dissolved | HZ24(18+)-cum-HZ57(18+) | 86 | 60.1% | 59.7% |
| Dissolved | HZ24(24+)-cum-HZ57(24+) | 36 | 49.3% | 48.2% |
| Dissolved | HZ24(30+)-cum-HZ57(30+) | 19 | 52.8% | 50% |

Notes: “% of HZ24” and “% of HZ57” are the percentages of HZ24 and HZ57, respectively, in the HZ24-cum-HZ57 category.

4.2.2 Monte Carlo Simulation

Table 4.8 summarizes the results of the Monte Carlo simulation. The table also lists the computing time needed to identify hot zones on one personal computer with an Intel® Core™ 2 Duo CPU and 2 GB RAM, including the computing time for the geo-validation process, generating BSUs, calculating crash intensity, performing 1,000 simulations and modeling the spatial pattern. Compared with the arbitrary-number definition, the Monte Carlo simulation took much more time to complete the whole procedure. The average computing time of the arbitrary-number method was roughly 45 hours (see Table 4.4), whereas the Monte Carlo method took around 90 hours.

When the 99.9% significance level was used, the numbers of hot zones (HZ99.9%) were 125 in 2002-2004 and 123 in 2005-2007, which were similar to those of HZ18+ (143 in 2002-2004 and 144 in 2005-2007). To further compare the two approaches, the two types of hot zones are overlaid and the hot zones identified by both methods are summarized in Table 4.9. The results suggest that the two types of hot zones were compatible in the sense that the shares of both HZ18+ and HZ99.9% hot zones in the HZ18+-cum-HZ99.9% category were quite large.

Table 4.8 Statistics on hot zones based on Monte Carlo simulations

| Indicator | 2002-2004 HZ95% | 2002-2004 HZ99% | 2002-2004 HZ99.9% | 2005-2007 HZ95% | 2005-2007 HZ99% | 2005-2007 HZ99.9% |
|---|---|---|---|---|---|---|
| No. of HZs | 244 | 185 | 125 | 247 | 184 | 123 |
| No. of BSUs | 802 | 565 | 355 | 819 | 563 | 363 |
| Length of HZs | 79.04 | 55.93 | 35.24 | 80.25 | 55.30 | 35.85 |
| No. of Crashes | 10,160 | 8,000 | 5,702 | 10,080 | 7,811 | 5,686 |
| Computing time (h.) | 90 | 90 | 90 | 90 | 90 | 90 |

Table 4.9 Hot zones belonging to both HZ18+ and HZ99.9%

| Period | Number | % of HZ18+ | % of HZ99.9% |
|---|---|---|---|
| 2002-2004 | 123 | 86.39% | 98.4% |
| 2005-2007 | 122 | 86.49% | 99.19% |

Notes: “% of HZ18+” and “% of HZ99.9%” are the percentages of HZ18+ and HZ99.9%, respectively, in the HZ18+-cum-HZ99.9% category.

To test the stability of the performances of the two approaches, the variation of the two types of hot zones between the period from 2002 to 2004 and the period from 2005 to 2007 is compared. As shown in Table 4.10, the CVs of HZ18+ are smaller than those of HZ99.9% in terms of the numbers of hot zones and BSUs, but greater in terms of the total length and the number of road crashes happening on hot zones. If the hot zones of the two periods are overlaid (Table 4.11), the shares of HZ18+ in the HZ24-cum-HZ57 category are slightly greater than those of HZ99.9%.

Table 4.10 Variation of hot zones between two periods

| Indicator | HZ18+ Mean | HZ18+ SD | HZ18+ CV | HZ99.9% Mean | HZ99.9% SD | HZ99.9% CV |
|---|---|---|---|---|---|---|
| No. of HZs | 143.50 | 0.71 | 0.005 | 124.00 | 1.41 | 0.011 |
| No. of BSUs | 420.00 | 4.24 | 0.010 | 359.00 | 5.66 | 0.016 |
| Length of HZs | 41.56 | 0.59 | 0.014 | 35.55 | 0.43 | 0.012 |
| No. of Crashes | 6373.50 | 153.44 | 0.024 | 5694.00 | 11.31 | 0.002 |

Notes: SD: standard deviation; CV: coefficient of variation.

Table 4.11 Hot zones identified in both periods

| HZ type | Number | % of HZ24 | % of HZ57 |
|---|---|---|---|
| HZ18+ | 86 | 60.1% | 59.7% |
| HZ99.9% | 72 | 57.6% | 58.5% |

Notes: “% of HZ24” and “% of HZ57” are the percentages of HZ24 and HZ57, respectively, in the HZ24-cum-HZ57 category.

In summary, the hot zones identified by the Monte Carlo simulation method are compatible with those detected by the arbitrary-number definition. One may choose the Monte Carlo simulation method to identify hot zones so as to avoid the use of an arbitrary number in defining threshold values, but the arbitrary-number approach requires less effort than the Monte Carlo method.
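The Monte Carlo thresholding procedure of Section 4.1.2.2 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this study: crashes are allocated to BSUs with equal probability (the text does not specify whether allocation is weighted, for example by BSU length), and the function name and parameters are introduced here for the example. Hot zones would then be identified from these thresholds with Equations 3.1 and 3.2, which are not reproduced here.

```python
import random


def monte_carlo_thresholds(n_bsus, n_crashes, n_sims=1000, rank=10, seed=0):
    """Per-BSU threshold: the rank-th largest simulated crash count.

    With n_sims=1000, rank=10 corresponds to the 99% significance level
    described in the text.
    """
    rng = random.Random(seed)
    simulated = [[] for _ in range(n_bsus)]  # simulated counts per BSU
    for _sim in range(n_sims):
        counts = [0] * n_bsus
        for _crash in range(n_crashes):
            # Assumption: every BSU is equally likely to receive a crash.
            counts[rng.randrange(n_bsus)] += 1
        for bsu, count in enumerate(counts):
            simulated[bsu].append(count)
    return [sorted(counts, reverse=True)[rank - 1] for counts in simulated]


# Small hypothetical network: 20 BSUs and 50 crashes, 200 simulations.
thresholds_strict = monte_carlo_thresholds(20, 50, n_sims=200, rank=1, seed=1)
thresholds_loose = monte_carlo_thresholds(20, 50, n_sims=200, rank=10, seed=1)
print(thresholds_strict)
print(thresholds_loose)
```

Choosing a smaller rank gives a stricter (higher) threshold, which is consistent with the smaller number of hot zones reported at higher significance levels in Table 4.8. The nested simulation loop also indicates why this method needs considerably more computing time than a fixed numerical threshold.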
4.2.3 Incorporation of Road Environmental Variables

Table 4.12 summarizes the hot zones by segmentation length and predictors. In general, the hazardous road locations accounted for around 20% of the ATC road networks, but with more than 50% of road crashes happening on them in both periods regardless of the type of model and the segmentation length, indicating the strong clustering tendency of road crashes in the whole territory.

A close comparison of the hot zones among different models shows that the hot zones are sensitive to environmental predictors. For instance, in the period from 2005 to 2007, the number of hot zones was 462, with 2,045 BSUs and 17,594 road crashes happening on them, when Model L was used to determine the threshold value and the BSU length was defined as 100 meters, whereas the number of hot zones reached 546, with 2,090 BSUs and 16,487 road crashes, when Model LATJ was used to calculate the threshold value. Even with the same variables, the hot zones identified with threshold values determined by Model LA and Model LA’ were different. Model LA detected longer hot zones with more road crashes than Model LA’ did regardless of the period and the segmentation length.

Moreover, the segmentation length also had a great influence on the results. For example, in the period from 2002 to 2004, the length of the 100-meter hot zones identified with Model L was 193.95 km and there were 17,913 road crashes happening on them, whereas the 200-meter hot zones identified with the same model were 244.12 km long with 19,091 road crashes. In order to choose an appropriate segmentation length for the identification of hazardous road locations in Hong Kong, the variability of hot zones is compared among different segmentation lengths. The length with which the hot zone results are most stable will be selected to identify crash hot zones.
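The stability comparisons in this chapter rely on the coefficient of variation (CV), the standard deviation divided by the mean. The sketch below assumes that the sample standard deviation is used; this is an assumption, although it reproduces the HZ18+ values in Table 4.6 from the counts in Table 4.4.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Numbers of HZ18+ hot zones in 2002-2004 and 2005-2007 (Table 4.4).
cv_raw = coefficient_of_variation([109, 91])         # raw-link-node system
cv_dissolved = coefficient_of_variation([143, 144])  # dissolved system
print(round(cv_raw, 2), round(cv_dissolved, 3))
```

A smaller CV means that the indicator changes less between periods or between models, which is the sense in which one configuration is described as more stable than another.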
Table 4.12 Hot zones by predictor and segmentation length in 2002-2004 and 2005-2007

| Model | Indicator | 2002-2004 HZ100 | 2002-2004 HZ150 | 2002-2004 HZ200 | 2005-2007 HZ100 | 2005-2007 HZ150 | 2005-2007 HZ200 |
|---|---|---|---|---|---|---|---|
| Model L (predictor: BSU length) | No. of HZs | 508 | 343 | 245 | 462 | 347 | 239 |
| | Length (km) of HZs | 193.95 | 248.28 | 244.12 | 196.74 | 252.51 | 250.99 |
| | No. of BSUs | 2,010 | 1,739 | 1,306 | 2,045 | 1,771 | 1,314 |
| | No. of Road Crashes | 17,913 | 19,726 | 19,091 | 17,594 | 19,424 | 18,681 |
| Model LA (predictors: BSU length, AADT) | No. of HZs | 483 | 323 | 252 | 478 | 334 | 245 |
| | Length (km) of HZs | 179.05 | 220.30 | 242.76 | 196.43 | 233.34 | 259.30 |
| | No. of BSUs | 1,859 | 1,552 | 1,300 | 2,032 | 1,641 | 1,384 |
| | No. of Road Crashes | 17,027 | 18,229 | 18,690 | 17,411 | 18,370 | 18,790 |
| Model LAT (predictors: BSU length, AADT, road type) | No. of HZs | 515 | 336 | 268 | 494 | 344 | 249 |
| | Length (km) of HZs | 182.99 | 210.78 | 236.52 | 193.58 | 219.55 | 241.21 |
| | No. of BSUs | 1,900 | 1,485 | 1,260 | 2,007 | 1,554 | 1,291 |
| | No. of Road Crashes | 16,582 | 17,153 | 17,987 | 16,643 | 17,134 | 17,528 |
| Model LATJ (predictors: BSU length, AADT, road type, number of road junctions) | No. of HZs | 548 | 379 | 286 | 546 | 386 | 290 |
| | Length (km) of HZs | 186.53 | 213.52 | 230.15 | 201.23 | 230.46 | 248.81 |
| | No. of BSUs | 1,924 | 1,505 | 1,234 | 2,090 | 1,624 | 1,334 |
| | No. of Road Crashes | 15,922 | 16,627 | 16,743 | 16,487 | 16,885 | 16,870 |
| Model LA’ (offset variable: log (length * AADT)) | No. of HZs | 467 | 330 | 267 | 527 | 371 | 273 |
| | Length (km) of HZs | 159.78 | 182.49 | 205.94 | 183.72 | 214.12 | 226.42 |
| | No. of BSUs | 1,686 | 1,309 | 1,123 | 1,938 | 1,532 | 1,239 |
| | No. of Road Crashes | 13,379 | 13,533 | 14,269 | 14,165 | 14,521 | 14,445 |

Table 4.13 summarizes the variation of hot zones with threshold values defined by Model L, Model LA, Model LAT, Model LATJ and Model LA’ by segmentation length. The table shows the mean, standard deviation and CV values of the five types of hot zones (Model L, Model LA, Model LAT, Model LATJ and Model LA’).
The indicators include the number of hot zones, the number of BSUs, the length of hot zones and the total crash count. Most CV values with 100-meter hot zones were equal to or less than those with 150-meter and 200-meter hot zones except the number of hot zones in the period of 2002-2004. The statistics suggest that the 100-meter segmentation interval was more stable than the 150-meter or 200-meter segmentation interval. Hence, the BSU length is better defined as 100 meters. The following analysis will be based upon the 100-meter hot zones only.

Table 4.13 Variation of hot zones by segmentation length

2002-2004
                      HZ100                          HZ150                          HZ200
                      Mean       Std. dev.  CV       Mean       Std. dev.  CV       Mean       Std. dev.  CV
No. of HZs            504.20     31.16      0.06     342.20     21.86      0.06     263.60     15.92      0.06
Length of HZs (km)    180.46     12.80      0.07     215.07     23.52      0.11     231.90     15.54      0.07
No. of BSUs           1875.80    119.61     0.06     1518.00    154.06     0.10     1244.60    74.12      0.06
No. of Crashes        16164.60   1717.07    0.11     17053.60   2297.27    0.13     17356.00   1942.50    0.11

2005-2007
                      HZ100                          HZ150                          HZ200
                      Mean       Std. dev.  CV       Mean       Std. dev.  CV       Mean       Std. dev.  CV
No. of HZs            501.40     34.64      0.07     356.40     21.41      0.06     259.20     21.52      0.08
Length of HZs (km)    194.34     6.54       0.03     230.00     14.83      0.06     245.35     12.39      0.05
No. of BSUs           2022.40    55.98      0.03     1624.40    93.89      0.06     1312.40    53.48      0.04
No. of Crashes        16460.00   1368.54    0.08     17266.80   1842.10    0.11     17262.80   1768.35    0.10

In order to examine the stability of hot zones with different models, the mean, standard deviation and CV of the hot zones of the two periods are calculated for each type of model, and the results are summarized in Table 4.14. Focusing on traffic exposure, which was introduced into Model LA and Model LA’, the average crash intensity of a Model-LA hot zone was 35.8 (17,219 divided by 480.5), whereas the mean crash intensity of Model-LA’ hot zones was only 27.7 (13,772 divided by 497). More crucially, there was greater variability between Model-LA’ hot zones than between Model-LA hot zones.
In this light, Model LA is preferable if one wants to take the traffic volume into consideration. Hence, further comparison will be made among Model L, Model LA, Model LAT and Model LATJ only.

Table 4.14 Summary of 100-meter hot zones by model

                      Mean       Std. dev.  CV
Model L
No. of HZs            485.0      32.53      0.07
Length of HZs (km)    195.35     1.97       0.01
No. of BSUs           2027.5     24.75      0.01
No. of Crashes        17753.5    225.57     0.01

Model LA
No. of HZs            480.5      3.54       0.01
Length of HZs (km)    187.74     12.29      0.07
No. of BSUs           1945.5     122.33     0.06
No. of Crashes        17219      271.53     0.02

Model LAT
No. of HZs            504.5      14.85      0.03
Length of HZs (km)    188.29     7.49       0.04
No. of BSUs           1953.5     75.66      0.04
No. of Crashes        16612.5    43.13      0.00

Model LATJ
No. of HZs            547.0      1.41       0.00
Length of HZs (km)    193.88     10.39      0.05
No. of BSUs           2007       117.38     0.06
No. of Crashes        16204.5    399.52     0.02

Model LA’
No. of HZs            497.0      42.43      0.09
Length of HZs (km)    171.75     16.93      0.10
No. of BSUs           1812       178.19     0.10
No. of Crashes        13772      555.79     0.04

Comparatively speaking, Model L is found to be the most stable in terms of the number of BSUs and the length of hot zones, whereas Model LATJ is more robust if one focuses on the number of hot zones (see the CV values in Table 4.14). When the hot zones of the two periods are overlaid, it is observed from Table 4.15 that 69.7% of the Model-L hot zones in 2002-2004 coincided with 68.8% of those in 2005-2007, whereas the shares were a little smaller with Model-LATJ hot zones. Hence, considering the stability, one may choose Model L, which does not incorporate any environmental variables, to define the threshold values. However, engineers may not be interested in hot zones that only reflect the variation of AADT, type of road and nearby road junctions, but may be more concerned with hot zones resulting from other factors. Table 4.16 summarizes the comparison results of Model-L and Model-LATJ hot zones. The hot zones in the L-cum-LATJ category were significantly dangerous as they were identified by both models.
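The three comparison categories can be illustrated with simple set operations. The sketch below uses hypothetical hot zone identifiers and assumes that a hot zone can be matched one-to-one between the two models, which simplifies the overlay actually performed on the road network. The second part checks that the shares printed in Table 4.16 are consistent with dividing the L-only and LATJ-only counts by the 2005-2007 totals of 100-meter hot zones in Table 4.12; that choice of denominators is an inference from the numbers, not something stated in the text.

```python
# Hypothetical identifiers of hot zones flagged under each model (illustrative only).
hot_l = {"hz_a", "hz_b", "hz_c", "hz_d"}
hot_latj = {"hz_b", "hz_c", "hz_e", "hz_f", "hz_g"}

l_cum_latj = hot_l & hot_latj  # flagged by both models
l_only = hot_l - hot_latj      # flagged by Model L only
latj_only = hot_latj - hot_l   # flagged by Model LATJ only
print(sorted(l_cum_latj), sorted(l_only), sorted(latj_only))


def share(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts from Table 4.16 against the 2005-2007 totals from Table 4.12
# (assumed denominators: 462 Model-L and 546 Model-LATJ hot zones).
share_l_only = share(79, 462)
share_latj_only = share(146, 546)
print(share_l_only, share_latj_only)
```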
The number of Model-L-only hot zones was 79, whereas the number of Model-LATJ-only hot zones was 146. Model L identified more hot zones in Kowloon City and Kwun Tong, and Yuen Long became one of the most dangerous road locations when Model LATJ was used. Which district(s) should be treated with a higher priority? For the allocation of medical services, Kwun Tong may receive more attention, as hospital authorities may be more concerned with the absolute number of road crashes. But policy-makers who work on safety improvement programs and aim to find places where road safety is most likely to be improved may be more interested in Yuen Long, since there might be more local environmental factors that cause road crashes in addition to the general factors that have been controlled for in the models. Hence, for the identification of hazardous road locations, both types of hot zones are important in addressing road safety problems.

Table 4.15 Hot zones identified by both periods

            HZ24-cum-HZ57    % of HZ24    % of HZ57
HZL         330              69.7%        68.8%
HZLATJ      325              66.2%        66.3%

Notes: 1. Percentages of HZ24 in the HZ24-cum-HZ57 category are typed in italics in the original table. 2. Percentages of HZ57 in the HZ24-cum-HZ57 category are underlined in the original table.

Table 4.16 Comparison of Model-L and Model-LATJ hot zones by district (districts are grouped into Urban Core and Suburb)

District                   Region                 L-cum-LATJ    L-only        LATJ-only
Total                                             349           79 (17.1%)    146 (26.7%)
Central & Western (CW)     Hong Kong Island       22 (6.3%)     7 (8.9%)      5 (3.4%)
Wan Chai (WCH)             Hong Kong Island       7 (2.0%)      5 (6.3%)      2 (1.4%)
Eastern (E)                Hong Kong Island       21 (6.0%)     6 (7.6%)      11 (7.5%)
Yau Tsim Mong (YTM)        Kowloon Peninsula      18 (5.2%)     4 (5.1%)      2 (1.4%)
Sham Shui Po (SSPO)        Kowloon Peninsula      22 (6.3%)     4 (5.1%)      2 (1.4%)
Kowloon City (KC)          Kowloon Peninsula      27 (7.7%)     8 (10.1%)     3 (2.1%)
Wong Tai Sin (WTS)         Kowloon Peninsula      30 (8.6%)     10 (12.7%)    11 (7.5%)
Kwun Tong (KT)             Kowloon Peninsula      16 (4.6%)     1 (1.3%)      7 (4.8%)
Kwai Tsing (KTS)           Kowloon Peninsula      25 (7.2%)     4 (5.1%)      11 (7.5%)
Southern (S)               Hong Kong Island       11 (3.2%)     4 (5.1%)      13 (8.9%)
Sha Tin (ST)               The New Territories    34 (9.7%)     3 (3.8%)      14 (9.6%)
Tai Po (TP)                The New Territories    28 (8.0%)     5 (6.3%)      14 (9.6%)
Tsuen Wan (TW)             The New Territories    23 (6.6%)     6 (7.6%)      10 (6.8%)
Tuen Mun (TM)              The New Territories    29 (8.3%)     6 (7.6%)      12 (8.2%)
Yuen Long (YL)             The New Territories    24 (6.9%)     5 (6.3%)      15 (10.3%)
Sai Kung (SK)              The New Territories    19 (5.4%)     3 (3.8%)      9 (6.2%)
Northern (N)               The New Territories    15 (4.3%)     2 (2.5%)      9 (6.2%)
Islands (I)                The New Territories    2 (0.6%)      0 (0.0%)      2 (1.4%)

4.3 Summary

This chapter introduces the key steps of the link-attribute hot zone identification method. Focusing on road segmentation and threshold value definition, a series of sensitivity analyses are performed. The results indicate that the dissolved road system can detect more hazardous road locations and is more stable than the raw-link-node road network.
While using an arbitrary number to define the threshold value is a simple and effort-saving method for the identification of hot zones, employing a statistical method such as Monte Carlo simulation can avoid selection bias in choosing an appropriate number as the threshold value, because the Monte Carlo method employs significance levels such as “95%” and “99%” to define the threshold values regardless of the total number of road crashes that have happened. The hot zones are sensitive to segmentation length and predictors. As the variability is smaller among 100-meter hot zones, the segmentation length is better defined as 100 meters. The hazardous road locations identified by different “crash-potential” models may differ significantly, but all of them provide important information for improving road safety.

CHAPTER 5 LINK-ATTRIBUTE ANALYSIS FOR ROAD CASUALTIES

This chapter performs link-attribute analysis for pedestrian casualties. The importance of casualty-based analysis and the reason why pedestrian casualties are targeted are briefly discussed in Section 5.1. The ways in which the weight for each type of injury is determined are introduced in Section 5.2. The data and the analytical tools for district-wide identification of hazardous road locations for pedestrians are then presented in detail in Section 5.3. In particular, the pedestrian casualties are analyzed by incorporating the surrounding environment of road crashes with a model-based approach.

5.1 Importance of Analysis based on Casualties

This section discusses the importance of casualty-weighted analysis and the reason for choosing pedestrian casualties as the study population.

5.1.1 Importance of Casualty-Weighted Analysis

Medical, social and other costs of traffic crashes are more closely related to the number of casualties than to the number of crashes.
For instance, fatal and severely injured victims require the timely dispatch of ambulances and the medical treatment of trauma teams in hospitals. Weighting each road crash by the number of casualties involved would help hospitals to understand the spatial distribution of casualties and thus allocate emergency services appropriately. Figure 5.1 shows the numbers of road traffic crashes and casualties during 2001 to 2010 in Hong Kong. There were about 15,000 road crashes in each year, whereas the number of traffic casualties reached approximately 20,000, about one third (33.3%) more than the crash count. It is hence worthwhile to examine casualty-based hot zones in Hong Kong.

Figure 5.1 Numbers of road crashes and casualties during 2001 to 2010 (series: Road Crashes, Road Casualties; vertical axis from 0 to 25,000)

5.1.2 Targeted Casualties: Pedestrians

Pedestrians are regarded as one of the most vulnerable road user types. In many low-income and middle-income countries, the share of pedestrian fatalities in total road deaths is notoriously high, due largely to the increase in motorization and to poor infrastructure which lacks separation of pedestrians from other road users. Even in high-income, highly motorized countries where people may drive more than walk, the percentage of pedestrian fatalities is still relatively high. According to a recent statistical report by the European Commission (2010), in 2008, 7,638 pedestrians died in road traffic crashes in 23 European countries, accounting for over 20% of road traffic fatalities in these countries. In Hong Kong, pedestrians are at high crash risk. Table 5.1 shows Hong Kong’s road traffic casualties during the period from 2001 to 2010. The number of casualties fluctuated between 18,000 and 21,000.
Although the share of pedestrians in total injuries decreased slightly from 25% to 20% in recent years, pedestrians were still the most vulnerable road user group in terms of fatal and serious injuries. For instance, there were 117 fatalities and 2,160 serious injuries in 2010, of which 59.0% and 34.4% respectively were pedestrians. Figure 5.2 and Figure 5.3 show the shares of fatalities and serious injuries by road user type. The rate of fatal and serious injuries among pedestrian casualties was significantly higher than those among driver and passenger casualties. Such an alarming situation merits academic and public concern for the safety of pedestrians. Hence, the casualty-based analysis in this thesis will focus on pedestrian victims. As people are more likely to be aware of pedestrian hot zones in their vicinity, hot zones for pedestrians would be highly relevant to road users at the local level. In this light, the spatial distribution of pedestrian casualties will be analyzed at district level.
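The 2010 figures quoted above can be retraced from the counts in Table 5.1. The sketch below is only an illustration: the variable names are hypothetical, and `severe_rate` (fatal plus serious injuries divided by all casualties of a road user type) is an assumed reading of the rate plotted in Figure 5.3.

```python
# 2010 counts by road user type, copied from Table 5.1:
# (all casualties, fatalities, serious injuries)
casualties_2010 = {
    "Pedestrian": (3898, 69, 744),
    "Driver": (8730, 37, 985),
    "Passenger": (6496, 11, 431),
}

total_fatal = sum(fatal for _, fatal, _ in casualties_2010.values())
total_serious = sum(serious for _, _, serious in casualties_2010.values())

rates = {}
for user, (total, fatal, serious) in casualties_2010.items():
    rates[user] = {
        "share_of_fatalities": fatal / total_fatal,
        "share_of_serious": serious / total_serious,
        # Assumed definition: fatal and serious injuries per casualty.
        "severe_rate": (fatal + serious) / total,
    }
    print(
        f"{user}: {rates[user]['share_of_fatalities']:.1%} of fatalities, "
        f"{rates[user]['share_of_serious']:.1%} of serious injuries, "
        f"severe rate {rates[user]['severe_rate']:.1%}"
    )
```

Under this reading, roughly one in five pedestrian casualties in 2010 was fatal or serious, against about one in nine for drivers and one in fifteen for passengers.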
Table 5.1 Road Traffic Casualty Statistics in Hong Kong by Road User Type, 2001-2010

Casualties
Year         Pedestrian        Driver            Passenger         Total
2001         4978 (24.5%)      8095 (39.8%)      7244 (35.7%)      20317
2002         4805 (23.3%)      8315 (40.4%)      7473 (36.3%)      20593
2003         4517 (24.7%)      7689 (42.0%)      6104 (33.3%)      18310
2004         4577 (23.6%)      8102 (41.8%)      6723 (34.7%)      19402
2005         4401 (22.9%)      8251 (43.0%)      6554 (34.1%)      19206
2006         4230 (22.4%)      8327 (44.1%)      6311 (33.4%)      18868
2007         4077 (20.8%)      8879 (45.3%)      6662 (34.0%)      19618
2008         3823 (20.5%)      8382 (44.9%)      6479 (34.7%)      18684
2009         3583 (19.8%)      8384 (46.2%)      6171 (34.0%)      18138
2010         3898 (20.4%)      8730 (45.6%)      6496 (34.0%)      19124
2001-2010    4288.9 (22.3%)    8315.4 (43.3%)    6621.7 (34.4%)    19226

Fatalities
Year         Pedestrian        Driver            Passenger         Total
2001         97 (56.1%)        51 (29.5%)        25 (14.5%)        173
2002         86 (50.3%)        57 (33.3%)        28 (16.4%)        171
2003         99 (49.0%)        66 (32.7%)        37 (18.3%)        202
2004         96 (57.8%)        51 (30.7%)        19 (11.4%)        166
2005         78 (51.7%)        48 (31.8%)        25 (16.6%)        151
2006         78 (54.2%)        48 (33.3%)        18 (12.5%)        144
2007         91 (57.2%)        51 (32.1%)        17 (10.7%)        159
2008         88 (54.3%)        46 (28.4%)        28 (17.3%)        162
2009         71 (51.1%)        41 (29.5%)        27 (19.4%)        139
2010         69 (59.0%)        37 (31.6%)        11 (9.4%)         117
2001-2010    85.3 (53.9%)      49.6 (31.3%)      23.5 (14.8%)      158.4

Serious Injuries
Year         Pedestrian        Driver            Passenger         Total
2001         1332 (37.9%)      1382 (39.3%)      803 (22.8%)       3517
2002         1232 (36.0%)      1419 (41.4%)      774 (22.6%)       3425
2003         1069 (36.2%)      1203 (40.8%)      679 (23.0%)       2951
2004         981 (35.0%)       1210 (43.2%)      612 (21.8%)       2803
2005         967 (36.0%)       1178 (43.8%)      543 (20.2%)       2688
2006         908 (36.2%)       1056 (42.1%)      542 (21.6%)       2506
2007         882 (34.9%)       1120 (44.3%)      528 (20.9%)       2530
2008         782 (34.3%)       956 (41.9%)       543 (23.8%)       2281
2009         723 (34.5%)       922 (44.0%)       451 (21.5%)       2096
2010         744 (34.4%)       985 (45.6%)       431 (20.0%)       2160
2001-2010    962 (35.7%)       1143.1 (42.4%)    590.6 (21.9%)     2695.7

Figure 5.2 Percentages of fatalities by road user type, 2001-2010 (series: Pedestrian, Driver, Passenger; vertical axis from 0.0% to 2.5%)

Figure 5.3 Percentages of fatal and seriously injured casualties by road user type, 2001-2010 (series: Pedestrian, Driver, Passenger; vertical axis from 0.0% to 30.0%)

5.2 Casualty-Weighted Analysis

Road traffic casualties can be classified by systems such as the Abbreviated Injury Scale or the Injury Severity Scale (ISS). Tsui et al. (2009) have used the ISS to describe the severity of injuries in Hong Kong. However, this thesis categorizes the casualties based on police reports, as the ISS system has been employed only in some pilot studies. Hence, the casualties are categorized into three types by severity of injury, namely fatalities, serious injuries and slight injuries. While the unweighted method treats each casualty equally regardless of its injury type, the cost-weighted method assigns different weights to different severity levels. This section presents the way in which crash intensity is measured by these two methods.

5.2.1 Unweighted

With an unweighted approach, the link-attribute crash intensity LCIuw(i) in BSU i is defined as:

$LCI_{uw}(i) = Fa(i) + Se(i) + Sl(i)$    (5.1)

where i = 1, 2, 3, …, n; n is the number of BSUs; Fa is the number of fatalities; Se is the number of serious injuries; and Sl is the number of slight injuries.

5.2.2 Cost-Weighted

Valuation of the prevention of a traffic casualty is important to ranking the economic viability of transport schemes and to the allocation of scarce resources (iRAP, 2007). Research into the benefit of preventing a road crash fatality, which is measured by the Value of a Statistical Life (VSL), has a long history (Ashenfelter, 2006; de Blaeij et al., 2003; Elvik, 1997; Miller, 2000). Several techniques have been employed to estimate VSL, including the Human Capital, Willingness-to-pay (WTP), Stated Preference and Revealed Preference approaches. As WTP is generally acknowledged to be the most valid methodology (iRAP, 2007), a growing number of countries have preferred WTP in recent years. The approach estimates the value of preventing a fatality by estimating the amount of money that individuals would be prepared to pay to reduce the risk of loss of life (iRAP, 2007).
While some studies have been conducted to establish WTP values for traffic fatalities, few have established values for serious or slight injuries. This is mainly because of difficulties in designing questionnaires to elicit reliable WTP estimates (iRAP, 2007). Nevertheless, some developed countries have attempted to employ the WTP method to estimate the costs of non-fatal casualties. Table 5.2 lists casualty costs in some developed countries using the WTP approach. On average, the costs of a serious injury and a slight injury are approximately 16% and 1% of the value of a fatality respectively. In the absence of casualty cost data in Hong Kong, this research is based on these average ratios and defines the cost-weighted link-attribute crash intensity as:

$LCI_{cw}(i) = 100\,Fa(i) + 16\,Se(i) + Sl(i)$    (5.2)

where i = 1, 2, 3, …, n; n is the number of BSUs; Fa is the number of fatalities; Se is the number of serious injuries; and Sl is the number of slight injuries.

Table 5.2 Casualty cost in developed countries using willingness-to-pay approach

                                       Cost                                     Ratio
Country           Year    Currency     Fatal         Serious      Slight        Serious/Fatal    Slight/Fatal
Austria           2006    €            2,676,374     316,772      22,722        11.9%            0.8%
New Zealand       2006    NZ$          3,065,000     535,000      60,000        17.5%            2.0%
Singapore         2008    S$           1,874,000     243,600      18,740        13.0%            1%
Sweden            2005    SK           18,383,000    3,280,000    N/A           17.8%            N/A
United Kingdom    2006    £            1,489,163     167,332      12,898        11.2%            0.9%
United States     2007    $            5,800,000     1,322,300    30,920        22.7%            0.5%
Average                                                                         15.6%            1.0%

Sources: iRAP (2007); Le et al. (2011); Ministry of Transport, New Zealand (2010); Department for Transport, UK (2007); Szabat and Knapp (2009); GmbH (2008)

5.3 District-Wide Identification of Hazardous Road Locations for Pedestrians

The pedestrian casualties in a selected district are analyzed by two methods of defining threshold values. One is the simple ranking method, which only focuses on the observations without considering any crash predictors.
The other is a model-based approach which defines threshold values by incorporating the surrounding environment of crashes.

5.3.1 Data Description

5.3.1.1 Study area

In order to focus on a manageable but reasonably complex setting, Kwun Tong District of Hong Kong is selected as the study area. It is situated in the eastern part of the Kowloon Peninsula. The district is chosen for the investigation of pedestrian casualties as it was among the three most populous districts (562,427 in 2001 and 587,423 in 2006) and ranked first in terms of population density (4.9 persons per 100 sq. m. in 2001 and 5.2 persons per 100 sq. m. in 2006) among the 18 districts of the city (see the population density map of 2006 in Figure 5.4 as an example), based on the 2001 and 2006 Census data (Census and Statistics Department 2002, 2007).

Figure 5.4 Population Density in 2006

5.3.1.2 Road network

The entire road network of the district is used for casualty-based analysis. In Kwun Tong District, the road network (see Figure 5.5) is approximately 157 km in length, with 1,303 links and 1,025 nodes. The raw link-node system is first dissolved with the dissolving algorithm. After segmenting the dissolved road system (487 links) with a 100-meter interval, 1,852 BSUs are obtained. Table 5.3 shows descriptive statistics on the length of BSUs based on the raw and the dissolved road system. The variation in BSU length is significantly smaller when the dissolved road system is used. The following analysis will use BSUs derived from the dissolved road system.

Figure 5.5 Road Network in Kwun Tong District

Table 5.3 Statistics on length of BSU before and after dissolving performance

                                                           Quartiles (m)
                     No.      Mean (m)    Std. Deviation (m)    25      50      75
Before dissolving    2,382    68.9        33.8                  38.0    81.2    100
After dissolving     1,852    84.8        23.5                  80.2    100     100

5.3.1.3 Pedestrian casualties

There were altogether 944 and 940 pedestrian casualties in Kwun Tong District during the periods of 2002-2004 and 2005-2007 respectively. The
The 139 numbers and percentages of fatal, seriously and slightly injured victims are shown in Table 5.4. During these two periods, the fatal and seriously injured casualties accounted for about 25% of the total injuries. Hence, it is worthwhile to investigate pedestrian casualties by severity type in Kwun Tong District. Table 5.4 Numbers and Percentages of fatalities, serious and slight pedestrian injuries Fatality Serious injury Slight injury Total 2002-2004 22 (2.3%) 213(22.6%) 709 (75.1%) 944 2005-2007 21(2.2%) 242(25.7%) 677(72.0%) 940 The three types of pedestrian injuries are aggregated by BSU. Table 5.5 presents descriptive statistics on pedestrian casualties by injury type. The observations suggest that the spatial distributions of pedestrian casualties in 2002-2004 and 2005-2007 could be similar. In both periods, more than 50% of BSUs had no crashes happening on them. The variation in number of fatalities is relatively small with maximum number of 2 and standard deviation equal to about 0.11 in both periods, whereas the slight injury count varied greatly with a range from 0 to 11 (standard deviation equal to 1.05) in the period of 2002-2004 and from 0 to 8 (standard deviation equal to 0.92) in the period of 2005-2007. 140 Table 5.5 Descriptive statistics on pedestrian casualties Mean Median Std. 
Deviation Minimum Maximum 0.51 0.00 1.33 0 14 0.01 0.00 0.11 0 2 0.12 0.00 0.45 0 7 0.38 0.00 1.05 0 11 0.51 0.00 1.22 0 12 Number of fatalities 0.01 0.00 0.12 0 2 Number of serious injuries Number of slight injuries 0.13 0.00 0.45 0 6 0.37 0.00 0.92 0 8 2002-2004 Number of casualties Number of fatalities Number of serious injuries Number of slight injuries 2005-2007 Number of casualties 5.3.1.4 Surrounding environment Surrounding environmental variables include road environmental variables such as road junctions, land use variables such as land mix index, socio-economic environmental factors such as social and economic deprivation level, and demographic predictors like population density. According to the spatial scales at which attributes are aggregated, these variables can also be categorized into BSU-level and TPU-level types. BSU-level environmental variables As it is generally accepted that most of road crashes involving pedestrians happen around road junctions, the number of junctions on each BSU might explain the variation of occurrence of traffic crashes. To measure this variable, the road junctions are aggregated by BSU. 141 TPU-level environmental variables The characteristics of the neighborhoods may also have great impacts on the incidence of traffic crashes. The Tertiary planning units (TPUs) are used in this study to capture the land use, demographic and socio-economic conditions of the community. For each TPU-based variable, all the BSUs within a same TPU were assigned the same value. As described in Section 3.4.1.4, land use data of 2004 and 2006 will be used for measuring land use mix in the two periods. Land use mix is calculated based on Simpson Diversity Index. It is a biological diversity measurement which evaluates the number of land use categories within a neighborhood. 
Following the formal expression of the index, the land use diversity, denoted as LUD, can be calculated by:

$LUD = \frac{\sum_{i} n_i (n_i - 1)}{N (N - 1)}$    (5.3)

where N is the total number of land use categories and n_i is the number of patches of land in the ith category. As described in the second chapter, this research broadly divides land use into five categories, namely residential, commercial, industrial, institutional and others.

The socio-economic deprivation index (SDI) describes the deprivation level of neighborhoods. As reviewed in the literature, an increasing deprivation level may increase the chance of being a victim in road crashes. Hence, SDI is used as one dimension of “prior information”. The way in which SDI is calculated by TPU can be found in Section 3.4.1.5. Greater population density may increase the exposure to collisions. Hence, population density is also introduced as an explanatory variable. The complexity of the road structure in the vicinity might affect the occurrence of road crashes. This study treats the road junction density as another independent variable.

5.3.2 Data Analysis

5.3.2.1 Simple ranking

As no numerical definitions have been used for casualty-based spatial analysis of road crashes in Hong Kong, this research will employ a simple ranking method which regards a certain percentile (90%, 95% or 99%) of the observations as the threshold value. The advantage of this method is that one can roughly control the number of hot BSUs.

Unweighted

Crash intensities for the unweighted analysis, LCIUW, are calculated by following Equation 5.1. The threshold values are defined as the 90%, 95% and 99% percentiles of LCIUW.

Table 5.6 Percentiles of crash intensity for the unweighted analysis

              2002-2004             2005-2007
Percentile    90%    95%    99%     90%    95%    99%
Value         2      3      7       2      3      5

Cost-weighted

Crash intensities for the cost-weighted analysis, LCICW, are calculated by following Equation 5.2. The threshold values are defined as the 90%, 95% and 99% percentiles of LCICW.
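The simple ranking procedure can be sketched as follows. The weights come from Equations 5.1 and 5.2; everything else is an assumption made for illustration: the casualty counts per BSU are hypothetical, the nearest-rank percentile is one of several common percentile definitions (the text does not say which one was used), and a BSU is treated as hot when its intensity is at or above the threshold.

```python
import math


def lci_unweighted(fa, se, sl):
    """Equation 5.1: every casualty counts once."""
    return fa + se + sl


def lci_cost_weighted(fa, se, sl):
    """Equation 5.2: weights of 100, 16 and 1 by severity."""
    return 100 * fa + 16 * se + sl


def percentile_threshold(values, pct):
    """Nearest-rank percentile (an assumed definition)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical (fatal, serious, slight) counts for ten BSUs.
bsus = [(0, 0, 0), (0, 0, 1), (0, 1, 2), (1, 0, 0), (0, 0, 0),
        (0, 2, 3), (0, 0, 0), (0, 0, 2), (0, 1, 0), (0, 0, 0)]

uw = [lci_unweighted(*b) for b in bsus]
cw = [lci_cost_weighted(*b) for b in bsus]

t_uw = percentile_threshold(uw, 90)
t_cw = percentile_threshold(cw, 90)

# Assumed rule: a BSU is hot when its intensity reaches the threshold.
hot_uw = [i for i, v in enumerate(uw) if v >= t_uw]
hot_cw = [i for i, v in enumerate(cw) if v >= t_cw]
print(t_uw, hot_uw, t_cw, hot_cw)
```

In this toy data the BSU with a single fatality (index 3) is not hot under the unweighted measure but becomes hot once the costs are weighted, which is the kind of shift the cost-weighted analysis is meant to capture.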
Table 5.7 Percentiles of crash intensity for the cost-weighted analysis

              2002-2004             2005-2007
Percentile    90%    95%    99%     90%    95%    99%
Value         4      17     100     16     18     100

5.3.2.2 Incorporation of the surrounding environment of crashes

As mentioned in Chapter Two, the concentration of road crashes is not driven by a single force like traffic exposure, but by a synthesis of various determinants such as characteristics of road structure and land use type. Treating road crashes in relation to their surrounding environmental context, including street and land use characteristics as well as the demographic and socio-economic environment, helps to explain the spatial distribution of road crashes and to predict their occurrence. The expected crash intensity can be estimated by a statistical model with these environmental factors. Using the expected intensity as the threshold value, crash hot zones will be identified in a “potential crash reduction” manner.

Unweighted

In the unweighted analysis, the crash intensity LCIUW is determined as the number of traffic casualties, which can be calculated with Equation 5.1. Taking into account environmental variables, the threshold value, tuw, is defined as the expected number of casualties, which is estimated by a regression model. The descriptive statistics on pedestrian casualties in Table 5.5 have demonstrated that the traffic casualties (LCIUW) were over-dispersed, with a mean much lower than the variance. Hence, negative binomial models are used to estimate the expected number of pedestrian casualties (tuw). It should be pointed out that the length of BSU still varies even though the dissolving algorithm is performed on the raw road link system. As shown in Table 5.3, the mean value of BSU length is 84.8 m with a standard deviation of 23.5 m after segmentation with a dissolved road network. The length of BSU will hence be introduced into the models as an explanatory variable.
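The over-dispersion noted above can be checked directly from the summary statistics in Table 5.5. The short sketch below squares the reported standard deviations and compares the resulting variances with the means; for a Poisson-distributed count the ratio would be close to 1. The variable names are illustrative.

```python
# Mean and standard deviation of pedestrian casualties per BSU (Table 5.5).
summary = {"2002-2004": (0.51, 1.33), "2005-2007": (0.51, 1.22)}

dispersion = {}
for period, (mean_count, sd) in summary.items():
    variance = sd ** 2
    # Variance-to-mean ratio: about 1 for Poisson, above 1 if over-dispersed.
    dispersion[period] = variance / mean_count
    print(f"{period}: variance={variance:.2f}, ratio={dispersion[period]:.2f}")
```

Both ratios are roughly 3, which is in line with the choice of negative binomial rather than Poisson models.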
Two kinds of negative binomial models are used for investigating the sensitivity of crash hot zones to the selected surrounding environmental variables. One can be treated as a base model which includes the length of BSU as its only explanatory variable. The other, regarded as a full model, introduces all of the surrounding environmental variables, including the length of BSU, the number of junctions on the BSU, land mix, SDI, population density and junction density. For both types of models, the dependent variable is the crash intensity LCIUW. Table 5.8 shows the coefficient and p-value of length as well as the over-dispersion test in the base models, which indicate that the length of BSU was positively and significantly associated with the number of pedestrian casualties on BSUs and that the distribution of road crashes is better modeled by a negative binomial model. The tuw is then estimated by the base model.

Table 5.8 Link-attribute results on base negative binomial models for pedestrian casualties with length as independent variable

                                             2002-2004            2005-2007
Parameter                                    B         Sig.       B         Sig.
(Intercept)                                  -3.561    0.000      -3.425    0.000
Length                                       0.031     0.000      0.030     0.000
Log likelihood                               -1583                -1621
Akaike's Information Criterion               3172                 3248
alpha                                        3.9                  3.2
Likelihood ratio test of alpha=0 (Sig.)      957.2 (0.000)        739.9 (0.000)

The results on full models with six independent variables are shown in Table 5.9. In both periods, increasing length of BSU, number of road junctions, land mix and SDI could significantly increase the chance of being a pedestrian victim. As the p-values of population density and junction density were large in both periods, the two variables are excluded from the final full models. The coefficient and p-value of each variable in the final full models are shown in Table 5.10. The tuw is estimated by the final full model with four independent variables.
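To make the base-model threshold concrete, the sketch below turns the coefficients of Table 5.8 into an expected casualty count for a BSU of a given length. It assumes the usual log link of a negative binomial regression and that length enters in meters (the unit used in Table 5.3); neither is stated explicitly in the text. The rule that a BSU is hot when its observed count exceeds the expected count is likewise an assumed reading of "using expected intensity as the threshold value".

```python
import math

# (intercept, length coefficient) of the base models in Table 5.8.
BASE_MODEL = {"2002-2004": (-3.561, 0.031), "2005-2007": (-3.425, 0.030)}


def expected_casualties(period, length_m):
    """Expected casualties on a BSU, assuming a log link and length in meters."""
    intercept, beta_length = BASE_MODEL[period]
    return math.exp(intercept + beta_length * length_m)


def is_hot(observed, period, length_m):
    """Assumed rule: hot when the observed count exceeds the model expectation."""
    return observed > expected_casualties(period, length_m)


t_100 = expected_casualties("2002-2004", 100)  # about 0.63 casualties
t_50 = expected_casualties("2002-2004", 50)    # about 0.13 casualties
print(round(t_100, 2), round(t_50, 2), is_hot(1, "2002-2004", 100))
```

Under these assumptions a full-length 100-meter BSU with a single casualty in 2002-2004 would already exceed its base-model expectation, which shows how low the expected counts are when more than half of the BSUs have no casualties.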
Table 5.9 Link-attribute results on full negative binomial models for pedestrian casualties with six independent variables

                                             2002-2004              2005-2007
Parameter                                    B           Sig.       B            Sig.
(Intercept)                                  -5.470      0.000      -5.016       0.000
Length                                       0.038       0.000      0.035        0.000
Number of junctions                          0.356       0.000      0.278        0.000
Land Mix                                     0.018       0.000      0.022        0.000
SDI                                          0.141       0.000      0.299        0.000
Population density                           3.049E-6    0.336      -1.503E-6    0.603
Junction density                             -0.016      0.844      -0.033       0.655
Log likelihood                               -1528                  -1560
Akaike's Information Criterion               3067                   3136
alpha                                        2.93                   2.35
Likelihood ratio test of alpha=0 (Sig.)      719.0 (0.000)          571.4 (0.000)

Table 5.10 Link-attribute results on full negative binomial models for pedestrian casualties with four independent variables

                                             2002-2004            2005-2007
Parameter                                    B         Sig.       B         Sig.
(Intercept)                                  -5.523    0.000      -5.203    0.000
Length                                       0.037     0.000      0.035     0.000
Number of junctions                          0.353     0.000      0.279     0.000
Land Mix                                     0.021     0.000      0.019     0.000
SDI                                          0.158     0.000      0.267     0.000
Log likelihood                               -1540                -1580
Akaike's Information Criterion               3070                 3141
alpha                                        2.92                 2.41
Likelihood ratio test of alpha=0 (Sig.)      759.1 (0.000)        597.4 (0.000)

Cost-weighted

In the cost-weighted analysis, the crash intensity LCIcw is defined as a cost-weighted score which is calculated by Equation 5.2. The threshold value tcw is determined as:

$t_{cw}(i) = 100\,EFa(i) + 16\,ESe(i) + ESl(i)$    (5.4)

where EFa(i), ESe(i) and ESl(i) are the expected numbers of fatalities, serious injuries and slight injuries, which are calculated by negative binomial models. The results of the base models are shown in Table 5.11. The alpha and the likelihood ratio test demonstrate that the negative binomial regression model is better than the Poisson model. The length of BSU had a significant relationship with the numbers of seriously injured and slightly injured pedestrian casualties (p-values less than 0.001). The p-values of BSU length in the fatal pedestrian casualty models were more than 0.05 (0.072 in the 2002-2004 model and 0.078 in the 2005-2007 model), but they can still be regarded as relatively small.
Hence, the variable was also included in the model in order to be consistent with the seriously injured and slightly injured pedestrian casualty models. The EFa, ESe and ESl are then estimated by the three base models respectively. The base-model tcw is then calculated by following Equation 5.4.

Table 5.11 Link-attribute results on base negative binomial models for pedestrian casualties by injury type

                                             2002-2004            2005-2007
                                             B         Sig.       B         Sig.
Dependent variable: Number of fatal pedestrian casualties
(Intercept)                                  -8.326    0.000      -7.839    0.000
Length                                       0.042     0.072      0.036     0.078
Log likelihood                               -115                 -111
Akaike's Information Criterion               237                  229
alpha                                        14.9                 5.6
Likelihood ratio test of alpha=0 (Sig.)      7.1 (0.004)          4.2 (0.017)

Dependent variable: Number of seriously injured pedestrian casualties
(Intercept)                                  -4.986    0.000      -4.858    0.000
Length                                       0.031     0.000      0.031     0.000
Log likelihood                               -608                 -708
Akaike's Information Criterion               1323                 1422
alpha                                        4.8                  3.3
Likelihood ratio test of alpha=0 (Sig.)      120.1 (0.000)        87.4 (0.000)

Dependent variable: Number of slightly injured pedestrian casualties
(Intercept)                                  -3.831    0.000      -3.692    0.000
Length                                       0.031     0.000      0.029     0.000
Log likelihood                               -1354                -1353
Akaike's Information Criterion               2715                 2712
alpha                                        3.9                  3.2
Likelihood ratio test of alpha=0 (Sig.)      630.3 (0.000)        435.0 (0.000)

As only the length of BSU, the number of junctions on the BSU, land mix and SDI are found to be significantly associated with the incidence of pedestrian casualties (see Table 5.9 and Table 5.10), the full negative binomial models only introduce these four environmental factors as independent variables. The results on full models for fatal, seriously injured and slightly injured pedestrian casualties are shown in Table 5.12. The over-dispersion tests indicate that negative binomial models are better than Poisson models regardless of the dependent variable. For both periods, the four variables were positively and significantly related to the numbers of seriously and slightly injured pedestrian casualties.
However, the relationship between the predictors and the occurrence of fatal pedestrian casualties was not as close as that with serious and slight injuries (see p-values in Table 5.12). In particular, the p-values of BSU length and SDI in the 2002-2004 fatality model were more than 0.05. Nonetheless, given that the p-values were less than 0.05 in the 2005-2007 model and smaller than 0.1 in the 2002-2004 model, the two variables were still chosen as independent variables in computing the expected number of fatal pedestrian casualties so as to be consistent with the serious and slight injury models. E_Fa, E_Se and E_Sl are then estimated by the three full models respectively. The full-model t_cw is then calculated by following Equation 5.4.

Table 5.12 Link-attribute results on full negative binomial models for pedestrian casualties by injury type

Dependent variable: Number of fatal pedestrian casualties

| Parameter / statistic | 2002-2004 B | 2002-2004 Sig. | 2005-2007 B | 2005-2007 Sig. |
|---|---|---|---|---|
| (Intercept) | -9.566 | 0.000 | -11.045 | 0.000 |
| Length | 0.045 | 0.060 | 0.047 | 0.034 |
| Number of junctions | 0.536 | 0.029 | 0.512 | 0.021 |
| Land Mix | 0.024 | 0.049 | 0.03 | 0.007 |
| SDI | 0.039 | 0.089 | 0.379 | 0.010 |
| Log likelihood | -113 | | -102 | |
| Akaike's Information Criterion | 238 | | 219 | |
| alpha | 5.6 | | 7.2 | |
| Likelihood ratio test of alpha=0 (Sig.) | 2.6 (0.047) | | 4.4 (0.018) | |

Dependent variable: Number of seriously injured pedestrian casualties

| Parameter / statistic | 2002-2004 B | 2002-2004 Sig. | 2005-2007 B | 2005-2007 Sig. |
|---|---|---|---|---|
| (Intercept) | -7.182 | 0.000 | -7.01 | 0.000 |
| Length | 0.038 | 0.000 | 0.038 | 0.000 |
| Number of junctions | 0.381 | 0.000 | 0.359 | 0.000 |
| Land Mix | 0.018 | 0.000 | 0.024 | 0.000 |
| SDI | 0.212 | 0.000 | 0.257 | 0.000 |
| Log likelihood | -608 | | -672 | |
| Akaike's Information Criterion | 1229 | | 1357 | |
| alpha | 3.5 | | 2.1 | |
| Likelihood ratio test of alpha=0 (Sig.) | 92.6 (0.000) | | 56.1 (0.000) | |

Dependent variable: Number of slightly injured pedestrian casualties

| Parameter / statistic | 2002-2004 B | 2002-2004 Sig. | 2005-2007 B | 2005-2007 Sig. |
|---|---|---|---|---|
| (Intercept) | -5.772 | 0.000 | -5.33 | 0.000 |
| Length | 0.037 | 0.000 | 0.034 | 0.000 |
| Number of junctions | 0.347 | 0.000 | 0.238 | 0.000 |
| Land Mix | 0.023 | 0.000 | 0.017 | 0.000 |
| SDI | 0.143 | 0.000 | 0.264 | 0.000 |
| Log likelihood | -1306 | | -1309 | |
| Akaike's Information Criterion | 2625 | | 2631 | |
| alpha | 2.9 | | 2.4 | |
| Likelihood ratio test of alpha=0 (Sig.) | 451.6 (0.000) | | 325.1 (0.000) | |

5.3.3 Empirical Bayes

The EB approach is regarded as a promising technique and has been widely applied to the identification of crash hot spots. Its main advantage over conventional methods lies in dealing with the regression-to-the-mean problem, in which locations with a randomly high number of road crashes are falsely identified as hazardous locations and vice versa (Persaud, Lyon & Nguyen, 1999). The essence of the EB approach is to smooth out the random fluctuation in crash records by specifying the safety of a location with an estimate of its long-term mean (m) instead of its observed short-term crash count (Persaud, Lyon & Nguyen, 1999). The value of m is estimated by combining the observed crash count and an estimate of the expected number of road crashes as:

$$m = \left(\frac{1}{1+\lambda/k}\right)\lambda + \left(1-\frac{1}{1+\lambda/k}\right)x \tag{5.5}$$

where λ is the expected number of road crashes estimated by a statistical model; k is the inverse value of the over-dispersion parameter, which can be calibrated by the crash prediction model; and x is the observed crash count (Elvik, 2007).

In hot spot identification, the value of m is used in two ways (Elvik, 2007). One is to rank locations directly by their value of m. The other is to identify hot spots in terms of potential crash reduction, which is defined as the difference between m and λ. Sources of variation in road crashes can then be classified into three types (Elvik, 2007):

(1) General factors, included in the crash prediction model;
(2) Random variation, the excess of the observed crash count over the EB estimate; and
(3) Local factors (and unknown or unmeasured general factors), also named dispersion factors, the difference between the EB estimate and the model estimate.

To investigate the use of the EB technique in the identification of crash hot zones, the EB approach will be applied to the identification of hazardous road locations for pedestrians.
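As a numerical illustration of Equation 5.5, the following minimal Python sketch computes the EB estimate and the potential crash reduction for a single location. The function names and the input values (an expected count of 2.0, an observed count of 8 and k = 1.0) are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the thesis data.

```python
def eb_estimate(expected, observed, k):
    """EB estimate m = w * expected + (1 - w) * observed, with w = 1 / (1 + expected / k).

    expected: model-based expected count (lambda in Equation 5.5)
    observed: observed count (x in Equation 5.5)
    k:        inverse of the over-dispersion parameter
    """
    weight = 1.0 / (1.0 + expected / k)
    return weight * expected + (1.0 - weight) * observed


def potential_reduction(expected, observed, k):
    """Difference between the EB estimate and the model expectation (m - lambda)."""
    return eb_estimate(expected, observed, k) - expected


# Hypothetical location: the model expects 2 crashes, 8 were observed, k = 1.
m = eb_estimate(expected=2.0, observed=8, k=1.0)
print(m)                                   # weight is 1/3, so m = 2/3 + (2/3) * 8
print(potential_reduction(2.0, 8, 1.0))    # m minus the expected count
```

The EB estimate always lies between the model expectation and the observed count; a larger k (less over-dispersion) pulls it toward the model expectation.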
Using the unweighted casualty-based approach, the EB-based crash intensity, LCI_EB, can be calculated by:

$$LCI_{EB} = \left(\frac{1}{1+p/k}\right)p + \left(1-\frac{1}{1+p/k}\right)C \tag{5.6}$$

where p is the expected number of pedestrian casualties estimated by a statistical model; k is the inverse value of the over-dispersion parameter, which can be calibrated by the casualty prediction model; and C is the observed number of pedestrian casualties. The full negative binomial model will be used to estimate the expected number of pedestrian casualties (p). It introduces the length and road junction count of a BSU, and the Land Mix index and SDI index of a TPU, as the independent variables. The EB estimate will also be used in two ways, namely the simple ranking and the potential for casualty reduction methods.

5.3.3.1 Simple ranking

In simple-ranking identification, the crash intensity is measured by the EB estimate and the threshold value is determined by the percentiles of the EB estimate. Table 5.13 shows the 90%, 95% and 99% percentiles of the EB estimates for the 2002-2004 and 2005-2007 periods.

Table 5.13 Percentiles of EB estimates for unweighted analysis

| Percentile | 2002-2004 90% | 2002-2004 95% | 2002-2004 99% | 2005-2007 90% | 2005-2007 95% | 2005-2007 99% |
|---|---|---|---|---|---|---|
| Value | 1.4 | 2.4 | 5.4 | 1.5 | 2.2 | 4.6 |

5.3.3.2 Safety potential

In safety potential identification, the crash intensity is measured by the EB estimate and the threshold value is determined by the pedestrian-casualty prediction model. The value of the threshold is equal to p in Equation 5.6.

5.4 Results

5.4.1 Simple Ranking

Table 5.14 shows the results on pedestrian hot zones with threshold values defined by the simple ranking method. If the two casualty-based methods are compared, the statistics suggest that the unweighted method could identify longer hot zones with a greater number of casualties, whereas the cost-weighted method identified hazardous road locations with a high concentration of fatally and seriously injured victims.
If the two types of hot zones are overlaid by GIS, the hot zones can be classified into three types, namely hot zones identified by both the unweighted and cost-weighted approaches, hot zones identified only by the unweighted method and hot zones identified only by the cost-weighted approach. Table 5.15 shows the statistics on hot zones identified by only one approach. The results suggest that more than 50% of the hot zones identified by the unweighted and cost-weighted approaches were compatible when the 90% and 95% percentiles were used, but the compatibility was sharply reduced when the threshold value was increased to the 99% percentile. If one closely examines the one-approach-only hot zones, one finds that the hot zones identified only by the unweighted approach were characterized by a high concentration of slight injuries with no or very few fatalities and serious injuries, whereas the hot zones identified only by the cost-weighted approach had greater numbers of fatally and seriously injured pedestrian casualties. Figure 5.6 delineates the locations of hot zones identified by only one approach with threshold values defined by the 95% percentile of crash intensity. The bar charts describe the numbers of fatalities, serious injuries, slight injuries and total casualties in each hot zone. There were four hot zones identified only by the unweighted method. The number of pedestrian casualties on these hot zones ranged from 6 to 18. Among these victims, none was fatally injured, and two of the hot zones each had only one seriously injured victim. Five hot zones were detected only by the cost-weighted approach. The total number of pedestrian casualties in each cost-weighted-only hot zone was less than 10, but fatalities and serious injuries were more numerous and accounted for a larger share of pedestrian casualties than in the unweighted-only hot zones.
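The overlay classification described above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: two hot zones are treated as "compatible" when they share at least one BSU, which may differ from the GIS overlay rule actually used in the thesis. The function name and the BSU identifiers are hypothetical.

```python
def one_approach_only(zones_a, zones_b):
    """Return indices of hot zones in zones_a that share no BSU with any zone in zones_b.

    Each hot zone is represented as a set of BSU identifiers (hypothetical ids).
    Assumption: sharing at least one BSU makes two hot zones compatible.
    """
    bsus_b = set().union(*zones_b)          # all BSUs covered by the other approach
    return [i for i, zone in enumerate(zones_a) if not (zone & bsus_b)]


# Hypothetical hot zones from the unweighted (UW) and cost-weighted (CW) approaches.
uw_zones = [{1, 2, 3}, {10, 11}]
cw_zones = [{2, 3, 4}, {20, 21}]

uw_only = one_approach_only(uw_zones, cw_zones)   # UW zones with no CW counterpart
cw_only = one_approach_only(cw_zones, uw_zones)   # CW zones with no UW counterpart
print(len(uw_only) / len(uw_zones))               # share of UW-only hot zones
```

The shares reported in Table 5.15 have the same form: the count of one-approach-only hot zones divided by the number of hot zones found by that approach, for example 13 of 34 (38.24%).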
Table 5.14 Statistics on link-attribute pedestrian hot zones with threshold value defined by the simple ranking method

Unweighted

| HZ type | 2002-2004 90% | 2002-2004 95% | 2002-2004 99% | 2005-2007 90% | 2005-2007 95% | 2005-2007 99% |
|---|---|---|---|---|---|---|
| Number of HZs | 34 | 18 | 2 | 33 | 19 | 7 |
| Length of HZs (km) | 14.15 | 6.64 | 0.40 | 13.87 | 6.63 | 1.97 |
| Number of BSUs | 145 | 68 | 4 | 142 | 68 | 20 |
| Number of pedestrian casualties | 564 | 367 | 30 | 505 | 327 | 146 |
| Number of fatalities | 9 | 8 | 1 | 13 | 10 | 8 |
| Number of serious injuries | 125 | 71 | 10 | 133 | 87 | 39 |

Cost-weighted

| HZ type | 2002-2004 90% | 2002-2004 95% | 2002-2004 99% | 2005-2007 90% | 2005-2007 95% | 2005-2007 99% |
|---|---|---|---|---|---|---|
| Number of HZs | 28 | 19 | 0 | 37 | 17 | 2 |
| Length of HZs (km) | 10.47 | 5.77 | 0 | 10.64 | 5.08 | 0.37 |
| Number of BSUs | 109 | 59 | 0 | 110 | 52 | 4 |
| Number of pedestrian casualties | 434 | 263 | 0 | 370 | 265 | 30 |
| Number of fatalities | 13 | 5 | 0 | 16 | 13 | 5 |
| Number of serious injuries | 123 | 89 | 0 | 147 | 87 | 5 |
| Cost | 3566 | 2093 | 0 | 4159 | 2857 | 600 |

Table 5.15 Link-attribute hot zones identified only by the unweighted or cost-weighted method

| Period | 90% UW-only | 90% CW-only | 95% UW-only | 95% CW-only | 99% UW-only | 99% CW-only |
|---|---|---|---|---|---|---|
| 2002-2004 | 13 (38.24%) | 5 (17.86%) | 4 (22.22%) | 5 (26.32%) | 2 (100%) | 0 (-) |
| 2005-2007 | 13 (39.39%) | 13 (35.14%) | 6 (31.58%) | 3 (17.65%) | 5 (71.43%) | 0 (0%) |

Figure 5.6 Hot zones identified only by (a) the unweighted and (b) the cost-weighted method

To investigate the stability of the performance of the two approaches, the hot zones of the two periods are compared in terms of the number and length of hot zones, the number of BSUs, and the crash intensity, represented by the number of pedestrian casualties for the unweighted approach and the total cost for the cost-weighted approach. It is observed that the variability of both the unweighted and the cost-weighted approach increased markedly when the threshold value was raised from the 90% percentile to the 99% percentile of the observations.
Taking the unweighted hot zones as an example, the mean and the standard deviation of the number of 90% hot zones were 33.5 and 0.71, giving a coefficient of variation (CV) of 0.02, whereas the average number of 99% hot zones was 4.5 with a standard deviation of 3.54 and a CV of 0.79. As demonstrated by the CV values, there was great variability among the 99% hot zones, which increases the chance of false positive problems. Moreover, most of the CV values for the cost-weighted hot zones were greater than those for the unweighted hot zones, indicating that the performance of the unweighted approach is more stable than that of the cost-weighted approach.

Table 5.16 Variation of hot zones between two periods

90%

| Indicator | UW Mean | UW Standard deviation | UW CV | CW Mean | CW Standard deviation | CW CV |
|---|---|---|---|---|---|---|
| No. of Hot Zones | 33.50 | 0.71 | 0.02 | 32.50 | 6.36 | 0.20 |
| No. of BSUs | 143.50 | 2.12 | 0.01 | 109.50 | 0.71 | 0.01 |
| Crash Intensity | 534.50 | 41.72 | 0.08 | 3862.50 | 419.31 | 0.11 |
| Length of Hot Zones | 14.01 | 0.20 | 0.01 | 10.56 | 0.12 | 0.01 |

95%

| Indicator | UW Mean | UW Standard deviation | UW CV | CW Mean | CW Standard deviation | CW CV |
|---|---|---|---|---|---|---|
| No. of Hot Zones | 18.50 | 0.71 | 0.04 | 18.00 | 1.41 | 0.08 |
| No. of BSUs | 68.00 | 0.00 | 0.00 | 55.50 | 4.95 | 0.09 |
| Crash Intensity | 347.00 | 28.28 | 0.08 | 2475.00 | 540.23 | 0.22 |
| Length of Hot Zones | 6.64 | 0.01 | 0.00 | 5.43 | 0.49 | 0.09 |

99%

| Indicator | UW Mean | UW Standard deviation | UW CV | CW Mean | CW Standard deviation | CW CV |
|---|---|---|---|---|---|---|
| No. of Hot Zones | 4.50 | 3.54 | 0.79 | 1.00 | 1.41 | 1.41 |
| No. of BSUs | 12.00 | 11.31 | 0.94 | 2.00 | 2.83 | 1.41 |
| Crash Intensity | 88.00 | 82.02 | 0.93 | 300.00 | 424.26 | 1.41 |
| Length of Hot Zones | 1.19 | 1.11 | 0.94 | 0.19 | 0.26 | 1.41 |

5.4.2 Incorporation of the Surrounding Environment of Crashes Involving Pedestrians

Table 5.17 summarizes the hot zones identified with the threshold value determined by incorporating the surrounding environment of traffic crashes involving pedestrians. It is observed that the cost-weighted method identified more hot zones than the unweighted method, with a larger number of casualties and a longer total length.
Taking the Base-Model (BM) as an example, the cost-weighted approach identified 48 hot zones with a total length of 19.19 km in the period from 2002 to 2004, but the unweighted approach detected only 39 hot zones with a total length of 15.56 km. Comparing the Base-Model with the Full-Model (FM) approach, one may find that the Full-Model approach could detect more hot zones than the Base-Model approach, especially with cost-weighted hot zones. For instance, when the cost-weighted approach is employed, the numbers of Base-Model hot zones were 48 (19.19 km) in 2002-2004 and 50 (20.79 km) in 2005-2007, whereas the numbers of Full-Model hot zones were 66 (23.84 km) and 60 (23.52 km) respectively.

Table 5.17 Characteristics of hot zones identified by incorporating surrounding environment

2002-2004

| Hot Zone Type | Unweighted Base Model | Unweighted Full Model | Cost-weighted Base Model | Cost-weighted Full Model |
|---|---|---|---|---|
| No. of Hot Zones | 39 | 44 | 48 | 66 |
| Length (km) | 15.56 | 14.18 | 19.19 | 23.84 |
| No. of BSUs | 167 | 150 | 206 | 250 |
| No. of Pedestrian Casualties | 595 | 535 | 649 | 677 |
| Cost | - | - | 4390 | 4829 |

2005-2007

| Hot Zone Type | Unweighted Base Model | Unweighted Full Model | Cost-weighted Base Model | Cost-weighted Full Model |
|---|---|---|---|---|
| No. of Hot Zones | 35 | 45 | 50 | 60 |
| Length (km) | 15.25 | 15.07 | 20.79 | 23.52 |
| No. of BSUs | 164 | 159 | 221 | 246 |
| No. of Pedestrian Casualties | 537 | 503 | 624 | 624 |
| Cost | - | - | 5067 | 4878 |

To further analyze the spatial distribution of hot zones, the unweighted and cost-weighted hot zones are first overlaid. Table 5.18 summarizes the hot zones identified only by the unweighted (UW-only) or the cost-weighted approach (CW-only). In both periods, there were no UW-only hot zones but some CW-only hot zones, no matter whether the Base Model or the Full Model was employed to determine the threshold value. Taking the Base-Model hot zones in the period from 2002 to 2004 as an example, none of the UW hot zones belonged to the UW-only type, but 18.75% of the CW hot zones were not compatible with any UW hot zone.
This shows that the hot zones identified by the unweighted method could also be identified by the cost-weighted approach, but not all of the cost-weighted hot zones could be identified by the unweighted approach. Hence, if one is interested not only in the concentration of pedestrian casualties of all injury types but also in the clusters of pedestrian fatalities and serious injuries, one may use only the cost-weighted approach to identify hazardous road locations for pedestrians first and then investigate the detailed information on each type of pedestrian injury. The following analysis will be based on the cost-weighted hot zones only.

Table 5.18 Link-attribute hot zones identified only by the unweighted or the cost-weighted method

| Period | Base Model UW-only | Base Model CW-only | Full Model UW-only | Full Model CW-only |
|---|---|---|---|---|
| 2002-2004 | 0 (0%) | 9 (18.75%) | 0 (0%) | 21 (31.82%) |
| 2005-2007 | 0 (0%) | 15 (30.00%) | 0 (0%) | 18 (30%) |

To examine the difference between the Base-Model and the Full-Model hot zones, the two types of hot zones are overlaid by GIS. Table 5.19 summarizes the hot zones identified only by the Base-Model (BM-only) or the Full-Model approach (FM-only). The shares of the BM-only and FM-only hot zones demonstrate that most of the BM and FM hot zones were compatible. Nevertheless, there were still 3 BM-only and 13 FM-only hot zones in the period from 2002 to 2004, and 4 BM-only and 12 FM-only hot zones in the period of 2005-2007. Which locations (see the hot zones in Figure 5.7) were more dangerous? While those who are more concerned with locations with a higher concentration of pedestrian casualties would pay greater attention to the BM-only hot zones (colored in purple), engineers who would like to detect pedestrian hot zones that were not identified due only to the nearby road junctions might dedicate more effort to the FM-only hot zones (colored in pink).
Table 5.19 Link-attribute hot zones identified only by the base-model or the full-model approach

| Period | BM-only | FM-only |
|---|---|---|
| 2002-2004 | 3 (6.25%) | 13 (19.70%) |
| 2005-2007 | 4 (8%) | 12 (20%) |

Figure 5.7 FM-only and BM-only cost-weighted hot zones in (a) 2002-2004 and (b) 2005-2007

Finally, the stability of the performance of the Base-Model and Full-Model approaches is examined by comparing the hot zones of the two periods. As shown in Table 5.20, the CVs of the Full-Model hot zones were smaller than those of the Base-Model hot zones for every indicator except the number of hot zones. To further investigate the performance, the hot zones of the two periods are overlaid by GIS and the results are shown in Table 5.21. It is found that 68.75% of the 2002-2004 and 66% of the 2005-2007 hot zones were compatible with the Base-Model approach, and the shares reached 69.70% and 76.67% with the Full-Model approach. Considering the CVs and the shares of HZ24-cum-HZ57, the performance of the Full-Model approach can be regarded as more stable than that of the Base-Model approach.

Table 5.20 Variation of BM and FM hot zones between two periods

| Indicator | BM Mean | BM Standard deviation | BM CV | FM Mean | FM Standard deviation | FM CV |
|---|---|---|---|---|---|---|
| No. of Hot Zones | 49.00 | 1.41 | 0.03 | 63.00 | 4.24 | 0.07 |
| No. of BSUs | 213.50 | 10.61 | 0.05 | 248.00 | 2.83 | 0.01 |
| Costs | 4728.50 | 478.71 | 0.10 | 4853.50 | 34.65 | 0.01 |
| Length of Hot Zones | 19.99 | 1.13 | 0.06 | 23.68 | 0.23 | 0.01 |

Table 5.21 Numbers and percentages of BM and FM hot zones identified in both periods

| | Base Model | Full Model |
|---|---|---|
| Number of hot zones in the HZ24-cum-HZ57 category | 33 | 46 |
| Percentage of HZ24 (2002-2004 hot zones) | 68.75% | 69.70% |
| Percentage of HZ57 (2005-2007 hot zones) | 66% | 76.67% |

Note: HZ24 and HZ57 denote the hot zones of 2002-2004 and 2005-2007 respectively; the HZ24-cum-HZ57 category contains the hot zones identified in both periods.

5.4.3 Empirical Bayes

The results of the EB-based identification of pedestrian hazardous road locations are shown and discussed in this subsection. In particular, the results are compared with those obtained with crash intensity defined by the observed counts (OC).
5.4.3.1 Simple ranking

Table 5.22 summarizes the results on pedestrian hot zones with crash intensity defined by the EB estimate and by the observed count of pedestrian casualties when the simple ranking method is used to define the threshold value. It can be observed that, when the threshold value was determined by the 90% and 95% percentiles of crash intensity, the EB-based approach generally identified slightly fewer and shorter hot zones with fewer BSUs and pedestrian casualties than the OC-based approach did in both periods. From Table 5.23, which shows the number and percentage of hot zones identified by one approach only, one may observe that the number of EB-only pedestrian hot zones was zero when the 90% and 95% percentiles were used to define the threshold values, suggesting that all the pedestrian hot zones identified by the EB-based approach could also be identified by the OC-based approach. However, when the 99% percentile, in other words a very high value, was assigned as the threshold value, the situation became more complicated. As shown in Table 5.22, the EB-based approach detected longer pedestrian hazardous road locations with more BSUs and pedestrian casualties in the period of 2002-2004, but in the period from 2005 to 2007, the OC-based approach detected more. In addition, not all of the hot zones were compatible in 2002-2004. There was one hot zone detected by the EB-based approach only (see Table 5.23).
Table 5.22 Statistics on link-attribute pedestrian hot zones with crash intensity defined by the EB estimate and observed counts (simple ranking)

EB estimate (EB)

| HZ type | 2002-2004 90% | 2002-2004 95% | 2002-2004 99% | 2005-2007 90% | 2005-2007 95% | 2005-2007 99% |
|---|---|---|---|---|---|---|
| Number of HZs | 31 | 14 | 2 | 33 | 14 | 5 |
| Length of HZs (km) | 12.47 | 5.19 | 0.60 | 12.55 | 4.78 | 1.28 |
| Number of BSUs | 126 | 53 | 6 | 127 | 49 | 13 |
| Number of pedestrian casualties | 505 | 305 | 48 | 466 | 274 | 108 |

Observed count (OC)

| HZ type | 2002-2004 90% | 2002-2004 95% | 2002-2004 99% | 2005-2007 90% | 2005-2007 95% | 2005-2007 99% |
|---|---|---|---|---|---|---|
| Number of HZs | 34 | 18 | 2 | 33 | 19 | 7 |
| Length of HZs (km) | 14.15 | 6.64 | 0.40 | 13.87 | 6.63 | 1.97 |
| Number of BSUs | 145 | 68 | 4 | 142 | 68 | 20 |
| Number of pedestrian casualties | 564 | 367 | 30 | 505 | 327 | 146 |

Table 5.23 Link-attribute hot zones identified only by the EB or the OC approach (simple ranking)

| Period | 90% EB-only | 90% OC-only | 95% EB-only | 95% OC-only | 99% EB-only | 99% OC-only |
|---|---|---|---|---|---|---|
| 2002-2004 | 0 (0%) | 4 (11.76%) | 0 (0%) | 5 (27.78%) | 1 (50%) | 1 (50%) |
| 2005-2007 | 0 (0%) | 1 (3.03%) | 0 (0%) | 5 (26.32%) | 0 (0%) | 2 (28.57%) |

In order to investigate the stability of the performance of the two approaches, the hot zones of the two periods are compared. Table 5.24 shows the variation of hot zones in terms of the number and length of hot zones, the number of BSUs and the pedestrian casualty counts. When the 90% and 95% percentiles were used to define the threshold values, it is difficult to tell which approach is more stable. Using 90% as an example, there was greater variability among EB-based hazardous road locations in terms of the number of hot zones, as demonstrated by the CV values (0.04 for EB-based hot zones and 0.02 for OC-based hot zones). However, if the length of hot zones is considered, the EB-based approach was found to be more stable, with a CV value equal to 0.00. If one targets hot zones with the threshold value defined by the 99% percentile, the performance of the EB-based approach was considerably more stable than that of the OC-based approach.
In this sense, the EB-based approach is desirable if one is interested in hot zones with extremely “hazardous” road locations.

Table 5.24 Variation of EB and OC hot zones between two periods (simple ranking)

90%

| Indicator | EB Mean | EB Standard deviation | EB CV | OC Mean | OC Standard deviation | OC CV |
|---|---|---|---|---|---|---|
| No. of Hot Zones | 32.00 | 1.41 | 0.04 | 33.50 | 0.71 | 0.02 |
| No. of BSUs | 126.50 | 0.71 | 0.01 | 143.50 | 2.12 | 0.01 |
| Crash Intensity | 485.50 | 27.58 | 0.06 | 534.50 | 41.72 | 0.08 |
| Length of Hot Zones | 12.51 | 0.06 | 0.00 | 14.01 | 0.20 | 0.01 |

95%

| Indicator | EB Mean | EB Standard deviation | EB CV | OC Mean | OC Standard deviation | OC CV |
|---|---|---|---|---|---|---|
| No. of Hot Zones | 14 | 0 | 0.00 | 18.50 | 0.71 | 0.04 |
| No. of BSUs | 51.00 | 2.83 | 0.06 | 68.00 | 0.00 | 0.00 |
| Crash Intensity | 289.50 | 21.92 | 0.08 | 347.00 | 28.28 | 0.08 |
| Length of Hot Zones | 4.99 | 0.29 | 0.06 | 6.64 | 0.01 | 0.00 |

99%

| Indicator | EB Mean | EB Standard deviation | EB CV | OC Mean | OC Standard deviation | OC CV |
|---|---|---|---|---|---|---|
| No. of Hot Zones | 3.50 | 0.71 | 0.20 | 4.50 | 3.54 | 0.79 |
| No. of BSUs | 9.50 | 2.83 | 0.30 | 12.00 | 11.31 | 0.94 |
| Crash Intensity | 78.00 | 28.99 | 0.37 | 88.00 | 82.02 | 0.93 |
| Length of Hot Zones | 0.94 | 0.29 | 0.31 | 1.19 | 1.11 | 0.94 |

5.4.3.2 Safety potential

Table 5.25 summarizes the results on pedestrian hot zones with crash intensity defined by the EB estimate and by the observed count of pedestrian casualties when the “potential crash reduction” method is used to define the threshold value. It can be observed that the EB-based and OC-based approaches identified the same number and length of hot zones, the same number of BSUs and the same number of pedestrian casualties. Moreover, they were 100% compatible, as demonstrated in Table 5.26.
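One way to see why the two approaches can coincide under the safety potential method is the short derivation below. It is a sketch under two assumptions that are not stated explicitly in the text: the OC-based approach uses the same model expectation p as its threshold, and a BSU qualifies when its crash intensity exceeds that threshold. Writing w for the weight in Equation 5.6:

```latex
\begin{aligned}
w &= \frac{1}{1 + p/k}, \qquad 0 < w < 1 \quad (p > 0,\ k > 0),\\
LCI_{EB} - p &= w\,p + (1 - w)\,C - p \;=\; (1 - w)\,(C - p),\\
LCI_{EB} > p \;&\Longleftrightarrow\; C > p .
\end{aligned}
```

Because 1 - w is strictly positive, the EB estimate exceeds p exactly when the observed count C does, so under these assumptions the same BSUs pass the threshold with either definition of crash intensity.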
Table 5.25 Statistics on link-attribute pedestrian hot zones with threshold value defined by EB estimate and observed pedestrian casualty count (safety potential)

| HZ type | 2002-2004 EB | 2002-2004 OC | 2005-2007 EB | 2005-2007 OC |
|---|---|---|---|---|
| Number of HZs | 44 | 44 | 45 | 45 |
| Length of HZs (km) | 14.18 | 14.18 | 15.07 | 15.07 |
| Number of BSUs | 150 | 150 | 159 | 159 |
| Number of pedestrian casualties | 535 | 535 | 503 | 503 |

Table 5.26 Link-attribute hot zones identified only by the EB or the OC approach (safety potential)

| Period | EB-only | OC-only |
|---|---|---|
| 2002-2004 | 0 (0%) | 0 (0%) |
| 2005-2007 | 0 (0%) | 0 (0%) |

5.5 Summary

This chapter employed casualty-based methods, including the unweighted and cost-weighted approaches, to identify hot zones in Kwun Tong District. In particular, the threshold values were defined by the expected crash intensity, which was calculated by negative binomial regression models. When the simple ranking method was used to determine the threshold value, the performance of the unweighted approach was more stable than that of the cost-weighted approach. Nevertheless, the choice of approach still depends on the targeted injury type of pedestrian casualties. When the threshold value was determined by incorporating the surrounding environmental variables of pedestrian casualties, the hot zones detected by the unweighted approach could also be identified by the cost-weighted approach. The performance of the Full-Model approach is more stable. Nonetheless, the Base-Model hot zones are also important in addressing pedestrian safety issues such as the allocation of medical resources. In general, the hot zones identified by the EB-based approach and the OC-based approach are similar, but the EB-based approach is found to be superior to the OC-based approach in terms of stability when the threshold is set at a relatively high value.

CHAPTER 6 EVENT-BASED ANALYSIS FOR ROAD CRASHES

This chapter performs event-based analysis for road crashes.
The hot zones are identified with threshold values determined by a numerical definition and by a Monte Carlo method. In particular, the spatial pattern of road crashes is also analyzed by taking into consideration the impact of traffic volume on the spatial distribution of road traffic crashes. The data sources and analytical procedures are presented in detail in Sections 6.1.1 and 6.1.2 respectively. After discussing the event-based results in Section 6.1.3, a comparison with the link-attribute analysis is made in Section 6.2.

6.1 Territory-Wide Identification of Hazardous Road Locations

6.1.1 Data Description

The ATC road networks (1,060 km) of 2002-2004 and 2005-2007 are used for the territory-wide analysis of road crashes in this chapter. There were 31,324 and 30,511 road crashes on ATC roads during the period from 2002 to 2004 and the period from 2005 to 2007 respectively. The interval of the reference points is defined as 100 meters. With the dissolved ATC road network, 11,628 reference points were obtained.

6.1.2 Data Analysis

The numerical definition and the Monte Carlo method will be used in the identification of hazardous road locations of the whole territory. In particular, spatial patterns of road crashes are also analyzed by taking the traffic volume into consideration.

6.1.2.1 Numerical definition

Firstly, arbitrary numbers are used in this chapter to analyze the spatial distribution of road crashes. Consistent with the link-attribute analysis, the threshold values are assigned as 9, 12 and 15 traffic crashes in a three-year period. The cut-off values for a hot zone are then expected to be 18, 24 and 30 road crashes in three years.

6.1.2.2 Monte Carlo simulation

Crash frequency

The Monte Carlo simulation method is used to determine the threshold values for crash frequency.
Assuming there are m (m = 31,324 in 2002-2004 and m = 30,511 in 2005-2007) road crashes during a time interval t (t = 3 years), the general steps for the identification of hot zones include:

(1) For the crash pattern, calculate ECI (Equations 3.7 and 3.8) at each RP as the actual crash intensity;
(2) Randomly select m out of 11,628 reference points as one simulated road crash pattern and calculate the crash intensity, denoted as ECI(sim), at each RP according to Equations 3.7 and 3.8;
(3) Repeat Step 2 1,000 times;
(4) For each reference point, the 50th largest ECI(sim) (95% significance level) is used as the threshold value;
(5) Identify hot zones by following Equations 3.9-3.11.

To investigate the sensitivity of hot zones to the threshold value, the significance level in Step 4 is also defined as 99% and 99.9%.

Crash risk: incorporation of traffic volume

Taking into consideration traffic volume (AADT), the threshold values can be defined by modifying the Monte Carlo simulation. The probability of allocating a road crash to a RP is determined by the AADT. Assuming there are m road crashes during a period of time, the general steps include:

(1) For the crash pattern, calculate ECI (Equations 3.7 and 3.8) at each RP as the actual crash intensity;
(2) Collect traffic exposure (AADT) at the location of each RP;
(3) With the RPs, simulate a pattern of m road crashes. The probability of selecting a RP as a road crash is determined by the AADT at that RP;
(4) Calculate the crash intensity ECI(sim) at each RP according to Equations 3.7 and 3.8;
(5) Repeat Steps 3 and 4 1,000 times;
(6) For each reference point, the 50th largest ECI(sim) (95% significance level) is used as the threshold value;
(7) Identify hot zones by following Equations 3.9-3.11.

To investigate the sensitivity of hot zones to the threshold value, the significance level in Step 6 is also defined as 99% and 99.9%.
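The two simulation procedures above can be sketched with NumPy as follows. This is a minimal illustration rather than the thesis implementation, and it rests on assumptions that are flagged in the comments: the crash intensity at an RP is replaced by the raw count of simulated crashes allocated to it (the thesis uses ECI from Equations 3.7 and 3.8, which are not reproduced here), and RPs are drawn with replacement because m exceeds the number of reference points. The function name `simulate_thresholds` and its parameters are hypothetical.

```python
import numpy as np


def simulate_thresholds(n_rp, m, n_sim=1000, significance=0.95, weights=None, seed=0):
    """Return one simulated threshold per reference point (RP).

    Stand-in assumption: the intensity at an RP is the number of simulated
    crashes allocated to it, not the ECI of Equations 3.7 and 3.8.
    weights=None gives the crash-frequency case (all RPs equally likely);
    passing exposure values such as AADT gives the crash-risk case.
    """
    rng = np.random.default_rng(seed)
    prob = None
    if weights is not None:
        w = np.asarray(weights, dtype=float)
        prob = w / w.sum()                  # selection probability proportional to exposure
    sims = np.empty((n_sim, n_rp), dtype=np.int64)
    for s in range(n_sim):
        # Assumption: sampling with replacement, since m can exceed n_rp.
        chosen = rng.choice(n_rp, size=m, replace=True, p=prob)
        sims[s] = np.bincount(chosen, minlength=n_rp)
    # Rank of the threshold among simulated values, e.g. 50th largest of 1,000 for 95%.
    rank = max(int(round(n_sim * (1.0 - significance))), 1)
    sims.sort(axis=0)                       # ascending order within each RP column
    return sims[n_sim - rank]               # the rank-th largest value per RP


# Small hypothetical example: 50 RPs, 200 crashes, 200 simulated patterns.
cf_thresholds = simulate_thresholds(n_rp=50, m=200, n_sim=200)
cr_thresholds = simulate_thresholds(n_rp=50, m=200, n_sim=200,
                                    weights=np.linspace(1.0, 10.0, 50))
```

With the thesis settings (11,628 RPs and 1,000 runs) the simulated matrix holds roughly 11.6 million integers, which is manageable in memory; raising `significance` to 0.99 or 0.999 selects the 10th largest or the largest simulated value respectively.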
Assuming that there are n reference points, in order to implement Step 3, the reference points are first labeled as 1, 2, 3, …, n. Every reference point has AADT information, as shown in the example of Table 6.1. Then, a variable “Interval” is created from the AADT information in a cumulative manner (see Table 6.2). The lower bound (“Lower”) and the upper bound (“Upper”) of the interval for reference point i can be calculated as:

$$Lower_i = \begin{cases} 1 & \text{if } i = 1 \\ 1 + \sum_{j=1}^{i-1} AADT_j & \text{if } i > 1 \end{cases} \tag{6.1}$$

$$Upper_i = \sum_{j=1}^{i} AADT_j \tag{6.2}$$

where i, j = 1, 2, 3, …, n and AADT_j is the AADT at the location of reference point j.

Table 6.1 Illustration of reference points and AADT information

| Reference Point No. | AADT |
|---|---|
| 1 | 800 |
| 2 | 1230 |
| 3 | 12567 |
| 4 | 3245 |
| 5 | 6558 |
| … | … |
| n-1 | 12432 |
| n | 790 |

Table 6.2 Illustration of “Interval” variable

| Reference Point No. | AADT | Interval [Lower, Upper] |
|---|---|---|
| 1 | 800 | [1, 800] |
| 2 | 1230 | [801, 2030] |
| 3 | 12567 | [2031, 14597] |
| 4 | 3245 | [14598, 17842] |
| 5 | 6558 | [17843, 24400] |
| … | … | … |
| n-1 | 12432 | [234532, 246963] |
| n | 790 | [246964, 247753] |

To assign a road crash to a reference point, a random value is generated between 1 and the upper bound of the interval of reference point n (247,753 in Table 6.2). The reference point whose interval contains the random number is selected as the location of the simulated road crash. For instance, if the computer generates the random value 8,000, which falls within [2031, 14597], a road crash will be created at the location of Reference Point 3. In this way, a simulated road crash is more likely to be allocated to a reference point with greater AADT.

6.1.3 Results

6.1.3.1 Numerical definition

Table 6.3 shows the statistics on event-based hot zones with threshold values determined by an arbitrary number. When the threshold values are defined as 9, 12 and 15 road crashes in three years, the numbers of hot zones identified by the event-based approach were 324, 243 and 174 in the period of 2002-2004, and 292, 198 and 138 in the period from 2005 to 2007, respectively.
If individual hot zones are examined, the length of hot zones varied greatly, especially when the threshold value was small. For instance, the minimum and maximum lengths of HZ18+ were 0.12 km and 6.77 km, with a CV equal to 1.39, in 2002-2004. If one investigates the road crashes on each hot zone, one may detect some hot zones with numbers of road crashes smaller than the defined critical values. As shown in Table 6.3, the minimum numbers of road crashes on hot zones of HZ18+, HZ24+ and HZ30+ were 9, 12 and 15, only half of the critical values. As one of the important characteristics of the arbitrary-number definition is that it enables users to easily control the critical value of a hot zone, hot zones with crash intensity less than the critical value are undesirable. Table 6.4 summarizes the characteristics of these hot zones. The “undesirable” hot zones accounted for a large share, and the percentages rose markedly with increasing threshold values. Nearly 30% of HZ18+ hot zones had fewer than 18 road crashes in both the 2002-2004 and the 2005-2007 periods, and 57.47% and 47.83% of HZ30+ hot zones had fewer than 30 road crashes in the periods from 2002 to 2004 and from 2005 to 2007 respectively. These hot zones were short, with a mean length of no more than 200 meters and a maximum length of only 370 meters. Looking into the locations of these hot zones on the map, one may observe that these tiny hot zones were all located around road junctions, as illustrated in Figure 6.1, which delineates the locations of undesirable HZ18+ hot zones in part of Hong Kong in the period from 2002 to 2004. Since the arbitrary-number approach identifies a large number of undesirable hot zones, it may not be very appropriate to apply this method to the identification of hazardous road locations.
Table 6.3 Event-based hot zones with threshold values defined by an arbitrary number

| HZ type | 2002-2004 HZ18+ | 2002-2004 HZ24+ | 2002-2004 HZ30+ | 2005-2007 HZ18+ | 2005-2007 HZ24+ | 2005-2007 HZ30+ |
|---|---|---|---|---|---|---|
| Number of HZs | 324 | 243 | 174 | 292 | 198 | 138 |
| Number of BSUs | 1136 | 723 | 473 | 1060 | 643 | 418 |
| Length (km): Total | 107.33 | 69.37 | 45.74 | 100.68 | 61.94 | 40.12 |
| Length (km): Minimum | 0.12 | 0.12 | 0.12 | 0.11 | 0.13 | 0.13 |
| Length (km): Maximum | 6.77 | 1.82 | 1.71 | 5.20 | 3.99 | 1.88 |
| Length (km): Mean | 0.33 | 0.29 | 0.26 | 0.34 | 0.31 | 0.29 |
| Length (km): Standard deviation | 0.46 | 0.22 | 0.20 | 0.46 | 0.35 | 0.24 |
| Length (km): CV | 1.39 | 0.76 | 0.77 | 1.35 | 1.13 | 0.83 |
| Road crashes: Total | 11845 | 8680 | 6452 | 11131 | 7983 | 5842 |
| Road crashes: Minimum | 9 | 12 | 15 | 9 | 12 | 15 |
| Road crashes: Maximum | 999 | 327 | 309 | 661 | 505 | 321 |
| Road crashes: Mean | 36.56 | 35.72 | 37.08 | 38.12 | 40.32 | 42.33 |
| Road crashes: Standard deviation | 62.63 | 35.59 | 35.83 | 60.20 | 50.46 | 40.70 |
| Road crashes: CV | 1.71 | 1.00 | 0.97 | 1.58 | 1.25 | 0.96 |

Table 6.4 Statistics on event-based hot zones with crash intensity less than the critical value

| HZ type | 2002-2004 HZ18+ | 2002-2004 HZ24+ | 2002-2004 HZ30+ | 2005-2007 HZ18+ | 2005-2007 HZ24+ | 2005-2007 HZ30+ |
|---|---|---|---|---|---|---|
| Number of HZs | 94 (29.01%) | 95 (39.09%) | 100 (57.47%) | 86 (29.45%) | 79 (39.89%) | 66 (47.83%) |
| Number of BSUs | 192 | 200 | 209 | 180 | 164 | 137 |
| Length (km): Total | 17.53 | 18.41 | 19.25 | 16.77 | 15.36 | 12.89 |
| Length (km): Minimum | 0.14 | 0.13 | 0.12 | 0.11 | 0.14 | 0.14 |
| Length (km): Maximum | 0.28 | 0.37 | 0.37 | 0.37 | 0.34 | 0.33 |
| Length (km): Mean | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 | 0.20 |
| Length (km): Standard deviation | 0.02 | 0.03 | 0.03 | 0.04 | 0.04 | 0.03 |
| Length (km): CV | 0.11 | 0.16 | 0.16 | 0.21 | 0.21 | 0.15 |

Figure 6.1 Illustration of locations of undesirable hot zones

6.1.3.2 Monte Carlo simulation

Using the 95%, 99% and 99.9% percentiles of the simulated patterns of road crashes as threshold values, the hot zones are identified and the results are summarized in Table 6.5. The numbers of hot zones with crash intensity determined by crash frequency (CF) differ markedly from those determined by crash risk (CR). Given the same significance level, the crash-risk approach detects more hazardous road locations. Using the 99% significance level as an example, there were 296 and 261 CF-based hot zones in the periods of 2002-2004 and 2005-2007, whereas the numbers of CR-based hot zones reached 423 and 397 respectively.
Focusing on the characteristics of individual hot zones, one may observe that there was much variation among CR-based hot zones in terms of the number of road crashes and the length of hot zones. When the 95% significance level was used to define the threshold value, the number of road crashes in a CR-based hot zone ranged from 1 to 1,270, with a CV of 2.49, in the period of 2002-2004, and the minimum and maximum lengths of a hot zone were 0.12 km and 10.51 km respectively. In addition, it should be noted that some CR-based hot zones had very few road crashes. The minimum number of road crashes in a hot zone could be as low as 1 (95% CR-based hot zones in 2002-2004) or 2 (99.9% CR-based hot zones in 2002-2004 and 95% CR-based hot zones in 2005-2007). Looking into these locations, it is found that the traffic volume was very small, so that even a slight change in traffic volume can cause great variation in crash risk. These locations are more likely to be affected by the regression-to-the-mean problem. Therefore, one should be very careful in treating these locations as road hazards.
Table 6.5 Statistics on hot zones based on statistical definition

2002-2004
                                     95%                99%               99.9%
Hot zone type                    CF       CR        CF       CR        CF       CR
Number of Hot Zones             363      492       296      423       253      367
Number of Hot RPs              1381     2045       998     1632       774     1327
Road crashes on hot zones
  Total                      13,967   16,024    11,411   13,887     9,554   12,338
  Minimum                         7        1         9        4         9        2
  Maximum                      1166     1270      1015    1,120       955      593
  Mean                        38.40    32.56     38.55    32.82     37.76    33.60
  Standard deviation          70.91    81.14     63.66    75.06     62.53    48.92
  CV                           1.85     2.49      1.65     2.29      1.66     1.46
Length (km)
  Total                      131.78   190.50     96.51   153.13     75.21   125.27
  Minimum                      0.12     0.12      0.12     0.12      0.12     0.12
  Maximum                      8.40    10.51      6.64     8.62      6.08     4.16
  Mean                         0.36     0.39      0.33     0.36      0.30     0.34
  Standard deviation           0.54     0.80      0.43     0.69      0.40     0.42
  CV                           1.50     2.05      1.30     1.92      1.33     1.24

2005-2007
                                     95%                99%               99.9%
Hot zone type                    CF       CR        CF       CR        CF       CR
Number of Hot Zones             324      487       261      397       207      321
Number of Hot RPs              1270     2110       915     1586       689     1234
Road crashes on hot zones
  Total                      12,987   15,645    10,516   13,230     8,786   11,370
  Minimum                         7        2         8        3         9        4
  Maximum                       917     1091       893      803       641      775
  Mean                        40.08    32.12     40.29    33.32     42.44    35.42
  Standard deviation          72.32    78.19     73.54    64.82     61.87    63.01
  CV                           1.80     2.43      1.83     1.95      1.46     1.78
Length (km)
  Total                      121.70   196.68     88.51   149.07     67.35   116.79
  Minimum                      0.10     0.10      0.12     0.10      0.13     0.10
  Maximum                      6.78    11.82      6.33     6.41      4.45     6.07
  Mean                         0.38     0.40      0.34     0.37      0.33     0.36
  Standard deviation           0.57     0.78      0.55     0.57      0.43     0.51
  CV                           1.50     1.95      1.62     1.54      1.30     1.42

Figures 6.2 and 6.3 delineate the spatial distribution of CF-based and CR-based hot zones using the 99% percentile as the threshold value by time period. In both the 2002-2004 and 2005-2007 periods, most hot zones were located on Hong Kong Island and the Kowloon Peninsula regardless of whether the CF or CR approach was used. However, there were some differences between CF-based and CR-based hot zones. A typical example is Lantau Island, in the Islands district (I), which is a remote and sparsely populated place located in the southwest of Hong Kong. No hot zones were identified on the island in either period when crash frequency was used as the measurement of crash intensity. However, 3 hot zones were detected in each of the 2002-2004 and 2005-2007 periods when traffic volume was taken into consideration.
To further investigate the similarities and differences, the two types of hot zones are overlaid, and the hot zones identified by both approaches and by one approach only are summarized in Table 6.6 by district. The table records the total number and shares of the three types of hot zones. Summing up the numbers of CF-cum-CR, CF-only and CR-only hot zones, there were altogether 451 and 426 hot zones in the whole territory in the two periods. More than half of the hot zones were located in the urban core area, of which 55.02% and 57.33% were identified by both the CF-based and CR-based methods in the periods of 2002-2004 and 2005-2007 respectively. The shares of hot zones identified only by the CF-based approach were small. Less than 10% of hot zones in the urban core area belonged to the CF-only type in both periods. The figures were even smaller for hot zones in the suburbs, where only 4.19% and 7.46% of hot zones were identified by the CF-based approach only. In other words, most of the hot zones identified by the CF-based approach could also be identified by the CR-based approach, especially in the suburbs. As running Monte Carlo simulations requires much computing time, one may employ only the CR-based approach to identify hot zones even if one would also like to identify hot zones with large numbers of road crashes. If individual districts are investigated, it is found that the hot zones identified by the CF-based approach and those identified by the CR-based approach largely coincided in the district of Yau Tsim Mong (YTM) in both periods. CF-cum-CR hot zones accounted for 86.5% and 83.9% of total hot zones there. Hence, one may choose either the CF-based or the CR-based approach to identify hazardous road locations in YTM. In addition, five out of eighteen districts (Tuen Mun, Yuen Long, Sai Kung, Northern and Islands) had no hot zones identified only by the CF-based approach.
For these districts, one may identify hot zones with the CR-based approach only and then investigate the crash frequency of the hot zones if one is interested in both crash frequency and crash risk.

Figure 6.2 Hot zones identified by 99% significance level for (a) crash frequency and (b) crash risk in the period from 2002 to 2004

Figure 6.3 Hot zones identified by 99% significance level for (a) crash frequency and (b) crash risk in the period from 2005 to 2007

Table 6.6 Hot zones by district and type based on statistical definition

                                                   2002-2004                                   2005-2007
District                  Region   CF-cum-CR   CF_only   CR_only    Total   CF-cum-CR   CF_only   CR_only    Total
                                         (%)       (%)       (%)    (No.)         (%)       (%)       (%)    (No.)
Urban Core                             55.02      9.64     35.34      249       57.33      9.05     33.62      232
  Central & Western (CW)  HKI          45.45      9.09     45.45       22       54.55      4.55     40.91       22
  Wan Chai (WCH)          HKI          35.29     17.65     47.06       17       50.00      9.09     40.91       22
  Eastern (E)             HKI          35.48     19.35     45.16       31       61.54     11.54     26.92       26
  Yau Tsim Mong (YTM)     KP           86.49      2.70     10.81       37       83.87      6.45      9.68       31
  Sham Shui Po (SSPO)     KP           65.22      4.35     30.43       23       80.00      4.00     16.00       25
  Kowloon City (KC)       KP           73.81      7.14     19.05       42       70.45     11.36     18.18       44
  Wong Tai Sin (WTS)      KP           48.28      6.90     44.83       29       35.71     14.29     50.00       28
  Kwun Tong (KT)          KP           60.00     11.43     28.57       35       37.50     15.63     46.88       32
  Kwai Tsing (KTS)        KP           37.50      9.38     53.13       32       36.36      9.09     54.55       22
Suburb                                 49.30      4.19     46.51      215       34.83      7.46     57.71      201
  Southern (S)            HKI          12.50      8.33     79.17       24       15.79      5.26     78.95       19
  Sha Tin (ST)            NT           63.89      2.78     33.33       36       60.61     12.12     27.27       33
  Tai Po (TP)             NT           68.18     22.73      9.09       22       32.26     22.58     45.16       31
  Tsuen Wan (TW)          NT           32.00      8.00     60.00       25       18.18     13.64     68.18       22
  Tuen Mun (TM)           NT           50.00      0.00     50.00       34       28.57      0.00     71.43       28
  Yuen Long (YL)          NT           51.61      0.00     48.39       31       33.33      0.00     66.67       27
  Sai Kung (SK)           NT           52.38      0.00     47.62       21       45.00      0.00     55.00       20
  Northern (N)            NT           61.90      0.00     38.10       21       40.00      0.00     60.00       20
  Islands (I)             NT            0.00      0.00    100.00        3        0.00      0.00    100.00        3
Total                                                                 451                                      426

Regions: HKI = Hong Kong Island; KP = Kowloon Peninsula; NT = The New Territories.

To test the stability of the two approaches, hot zones detected in the period of 2002-2004 are compared with those identified in the period of 2005-2007 in terms of the number of hot zones, the number of RPs, the number of road crashes and the total length of hot zones. The results in Table 6.7 show that the variation between the 2002-2004 and 2005-2007 hot zones increased with increasing significance levels regardless of whether the CF or CR approach was used, indicating that the performance of the two approaches became less stable with increasing threshold values. If the two approaches are compared, the CVs among CR-based hot zones were generally smaller than those among CF-based hot zones, suggesting that the CR-based approach is more stable than the CF-based approach.

Table 6.7 Variation of hot zones (CF and CR) between two periods

                                        CF                                      CR
                             Mean   Standard deviation    CV         Mean   Standard deviation    CV
95%
  No. of Hot Zones         343.50        27.58           0.08      489.50         3.54           0.01
  No. of RPs              1325.50        78.49           0.06     2077.50        45.96           0.02
  No. of Crashes         13477.00       692.96           0.05    15834.50       267.99           0.02
  Length of Hot Zones      126.74         7.13           0.06      193.59         4.37           0.02
99%
  No. of Hot Zones         278.50        24.75           0.09      410.00        18.38           0.04
  No. of RPs               956.50        58.69           0.06     1609.00        32.53           0.02
  No. of Crashes         10963.50       632.86           0.06    13558.50       464.57           0.03
  Length of Hot Zones       92.51         5.66           0.06      151.10         2.87           0.02
99.9%
  No. of Hot Zones         230.00        32.53           0.14      344.00        32.53           0.09
  No. of RPs               731.50        60.10           0.08     1280.50        65.76           0.05
  No. of Crashes          9170.00       543.06           0.06    11854.00       684.48           0.06
  Length of Hot Zones       71.28         5.56           0.08      121.03         6.00           0.05

6.2 Comparison with Link-attribute Approach

The empirical link-attribute and event-based hot zones for road crashes are compared in this section according to the way in which the threshold values are defined, including the arbitrary-number, Monte-Carlo-simulation and incorporation-of-road-environment definitions.

6.2.1 Numerical Definition: An Arbitrary Number

As discussed in the previous section, the event-based approach detected a large number of undesirable hot zones with crash intensity (the number of road crashes) less than the critical value. In order to identify more characteristics of the two arbitrary-number approaches, the following analysis excludes those undesirable hot zones and only compares hot zones with crash intensity equal to or more than the critical value. Table 6.8 summarizes the characteristics of the link-attribute and event-based arbitrary-number hot zones in the period from 2002 to 2004. Even though the undesirable hot zones are excluded, the number of hot zones identified by the event-based approach was still much higher than that identified by the link-attribute approach. Taking HZ18+ as an example, the link-attribute approach identified 147 hot zones and the event-based approach detected 83 more. The total length of event-based hot zones was more than double that of link-attribute hot zones. Focusing on the computing time, one may observe that the event-based approach was time-consuming.
It took the event-based approach 55 hours to identify the crash hot zones, which was ten hours more than the link-attribute approach.

Table 6.8 Characteristics of link-attribute and event-based arbitrary-number hot zones (2002-2004)

                                HZ18+               HZ24+               HZ30+
Hot Zone Type                 L        E          L        E          L        E
Number of Hot Zones         147      230         73      148         36       74
Number of BSUs/RPs          423      944        191      523         92      264
Crash                     6,482   10,600      3,579    7,044      2,084    4,299
Length (km)               41.98    89.80      19.02    50.96       9.15    26.49
Computing time (h)           45       55         45       55         45       55

When the link-attribute and event-based hot zones are overlaid, the hot zones can be classified into hot zones identified by both approaches (L-cum-E), hot zones identified by the link-attribute approach only (L-only) and hot zones detected by the event-based approach only (E-only). Table 6.9 summarizes the comparison results on HZ18+ hot zones in the period from 2002 to 2004. The numbers of L-only and E-only hot zones were 16 and 113 respectively. If one looks closely into the hot zones on the map, one may observe that most E-only hot zones were located around road junctions (see Figure 6.4 as an example). In addition, the crash intensity per hot zone was similar for the two types, but the crash intensity per km was greater for L-only hot zones. The minimum length of L-only hot zones was 0.19 km, but it was only 0.12 km for E-only hot zones, indicating that the event-based approach may identify more short hot zones with smaller crash intensity per km.
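The overlay classification described above can be illustrated with a short sketch. It assumes, purely for illustration, that each hot zone is stored as a set of road-segment identifiers and that two hot zones are matched whenever they share at least one segment; the function name, the zone names and the segment ids are hypothetical, and the thesis performs the overlay in GIS.

```python
def classify_overlay(link_zones, event_zones):
    """Classify hot zones after overlaying two sets of results.

    Both arguments map a hot-zone name to the set of road-segment ids
    covered by that hot zone.
    """
    matched_link = set()
    matched_event = set()
    for l_name, l_segments in link_zones.items():
        for e_name, e_segments in event_zones.items():
            if l_segments & e_segments:  # the two zones share a segment
                matched_link.add(l_name)
                matched_event.add(e_name)
    return {
        "L-cum-E": matched_link | matched_event,
        "L-only": set(link_zones) - matched_link,
        "E-only": set(event_zones) - matched_event,
    }


# Made-up example: segment ids are arbitrary integers.
link_zones = {"L1": {1, 2, 3}, "L2": {10, 11}}
event_zones = {"E1": {3, 4}, "E2": {20, 21}, "E3": {30}}
result = classify_overlay(link_zones, event_zones)
print(result)
```

How overlapping zones are counted after the overlay (for example, whether a matched pair counts as one L-cum-E hot zone) is a design choice that this sketch does not settle.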
Table 6.9 Comparison of link-attribute and event-based hot zones (HZ18+)

Hot Zone Type                     L-cum-E    L_only    E_only
Number of Hot Zones                   103        16       113
Crash
  Total                              7899       457      3203
  Minimum                              19        18        18
  Maximum                            1219        54        70
  Mean                              76.68     28.56     28.35
  Standard deviation               125.07      9.27     10.90
  CV                                 1.63      0.32      0.38
Length (km)
  Total                             55.37      3.37     29.07
  Minimum                            0.14      0.19      0.12
  Maximum                            6.77      0.30      0.94
  Mean                               0.54      0.21      0.26
  Standard deviation                 0.76      0.03      0.12
  CV                                 1.41      0.14      0.46
Number of road crashes per km       142.7     135.6     110.2

Figure 6.4 E-only hot zones

Table 6.10 shows the statistics of link-attribute and event-based hot zones for both periods. The performance of the link-attribute method is more stable when the threshold value is equal to 9 and 12 road crashes in three years, but less stable when the threshold value is defined as 15 road crashes, indicating that the relative performance of the two approaches is closely related to the threshold value.

Table 6.10 Statistics of link-attribute and event-based hot zones (HZ18+) for two periods

                                         L                                       E
                             Mean   Standard deviation    CV         Mean   Standard deviation    CV
HZ18+
  No. of Hot Zones         147.50         0.71           0.00      218.00        16.97           0.08
  No. of BSUs/RPs          420.00         4.24           0.01      912.00        45.25           0.05
  No. of Crashes          6373.50       153.44           0.02    10320.00       395.98           0.04
  Length of Hot Zones       41.55         0.61           0.01       86.86         4.16           0.05
HZ24+
  No. of Hot Zones          74.50         2.12           0.03      133.50        20.51           0.15
  No. of BSUs/RPs          184.00         9.90           0.05      501.00        31.11           0.06
  No. of Crashes          3469.00       155.56           0.04     6843.00       284.26           0.04
  Length of Hot Zones       18.32         1.00           0.05       48.77         3.10           0.06
HZ30+
  No. of Hot Zones          37.00         1.41           0.04       73.00         1.41           0.02
  No. of BSUs/RPs           90.00         2.83           0.03      272.50        12.02           0.04
  No. of Crashes          2032.00        73.54           0.04     4374.50       106.77           0.02
  Length of Hot Zones        8.96         0.28           0.03       26.86         0.52           0.02

In summary, the stability of the two approaches depends on the threshold value.
As the event-based approach can identify more hot zones no matter which threshold value is used, one may employ the event-based approach to obtain hazardous road locations in order to avoid “false negative” locations. However, one should be very careful in dealing with those hot zones with crash intensity less than the critical value.

6.2.2 Monte Carlo Simulation on Crash Frequency

Similar to the numerical-definition results, the number of hot zones identified by the event-based approach was also significantly higher than that identified by the link-attribute approach when the Monte Carlo method was used to define the threshold value. As shown in Table 6.11, the average number of 95% hot zones (HZ95%) identified by the event-based approach was 343.5 with a total length of 126.7 km, whereas the number of hot zones detected by the link-attribute approach was only 245.5 with a total length of 79.7 km. However, if one examines the variability of the hot zones, one may observe that the CV values of link-attribute hot zones were smaller than those of event-based hot zones regardless of the significance level. This suggests that the performance of the link-attribute approach is more stable than that of the event-based approach. In this sense, the link-attribute approach is preferable if one would like to identify hot zones by using the Monte Carlo approach to define the threshold value. Recall that the results on the hypothetical road network demonstrate that the link-attribute Monte Carlo simulation is more stable on a grid network. As a great number of road crashes were located in urban areas with roads resembling the grid structure, the findings on the empirical road network are consistent with those on the hypothetical road network.

Table 6.11 Statistics of link-attribute and event-based hot zones for two periods by significance level

                                         L                                       E
                             Mean   Standard deviation    CV         Mean   Standard deviation    CV
HZ95%
  No. of Hot Zones         245.50         2.12           0.01      343.50        27.58           0.08
  No. of BSUs/RPs          810.50        12.02           0.01     1325.50        78.49           0.06
  No. of Crashes         10120.00        56.57           0.01    13477.00       692.96           0.05
  Length of Hot Zones       79.65         0.86           0.01      126.74         7.13           0.06
HZ99%
  No. of Hot Zones         184.50         0.71           0.00      278.50        24.75           0.09
  No. of BSUs/RPs          564.00         1.41           0.00      956.50        58.69           0.06
  No. of Crashes          7905.50       133.64           0.02    10963.50       632.86           0.06
  Length of Hot Zones       55.62         0.45           0.01       92.51         5.66           0.06
HZ99.9%
  No. of Hot Zones         124.00         1.41           0.01      230.00        32.53           0.14
  No. of BSUs/RPs          359.00         5.66           0.02      731.50        60.10           0.08
  No. of Crashes          5694.00        11.31           0.00     9170.00       543.06           0.06
  Length of Hot Zones       35.55         0.43           0.01       71.28         5.56           0.08

6.2.3 Incorporation of Road Environmental Variables

As presented in Chapter 4, the link-attribute method introduces road environmental variables by using a “potential crash reduction” approach to define the threshold value. The event-based method, however, incorporates environmental factors by modifying the Monte Carlo procedure, as outlined in the previous section. This sub-section first compares the results of the link-attribute and event-based approaches for the identification of hot zones by taking traffic exposure into consideration, and then discusses the two methods in identifying hot zones by incorporating more environmental variables.

6.2.3.1 Incorporation of traffic exposure

As the results in Chapter 3 have shown that the performance of Model LA is more stable than that of Model LA’, the link-attribute hot zones in the comparison are identified with Model LA rather than Model LA’. Table 6.12 summarizes the hot zones identified by the two approaches in the period from 2002 to 2004. The link-attribute approach identified 483 hot zones, and the event-based approach detected 492, 423 and 367 hot zones with the 95%, 99% and 99.9% significance levels respectively.
The following analysis compares the link-attribute hot zones with the 95% event-based hot zones because of their similar number and length. Firstly, the two approaches are compared in terms of computing time. Table 6.12 records the computing time for generating hot zones. The computing time of the link-attribute approach includes the time for the geo-validation of road crashes, the calculation of crash intensity, running the negative binomial model to calculate the threshold value and modeling the spatial pattern of BSUs. The computing time of the event-based approach equals the sum of the time spent on the geo-validation of road crashes, running 1,000 replications to calculate the threshold values, and modeling the spatial patterns. It can be observed that the computing time of the event-based approach was nearly double that of the link-attribute approach. Hence, one may choose the link-attribute approach to identify hot zones in order to save time. In addition, the relationship between traffic exposure and the occurrence of road crashes may not be linear. With statistical models such as the negative binomial model, this relationship can be better examined. In this respect, the link-attribute approach is superior to the event-based approach.

Table 6.12 Statistics of link-attribute and event-based hot zones (incorporation of traffic exposure)

                                                  E
Hot Zone Type                  L        95%       99%     99.9%
Number of Hot Zones          483        492       423       367
Number of BSUs/RPs         1,859      2,045      1632      1327
Crash                     17,027     16,024    13,887    12,338
Length (km)               179.05      190.5    153.13    125.27
Computing time (h)            50         90        90        90

Next, the two approaches are compared in terms of stability. From Table 6.13, one may observe that the performance of the event-based approach is more stable than that of the link-attribute approach in terms of the number of BSUs/RPs and the length of hot zones, but is similar to that of the link-attribute approach in terms of the numbers of hot zones and crashes.
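The stability statistics used in these comparisons can be reproduced with a short calculation. The sketch below assumes that each entry is the mean, the sample standard deviation (n - 1 denominator) and the CV of the two period values; this reading is an inference from the reported figures rather than a stated definition, and the function name is hypothetical.

```python
import statistics


def period_variation(value_period_1, value_period_2):
    """Mean, sample standard deviation and CV of two period values."""
    values = [value_period_1, value_period_2]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD with n - 1 in the denominator
    return mean, sd, sd / mean


# Numbers of 95% CR-based event-based hot zones in 2002-2004 and 2005-2007.
mean, sd, cv = period_variation(492, 487)
print(f"mean={mean:.2f} sd={sd:.2f} cv={cv:.2f}")
```

With 492 and 487 hot zones, the calculation gives a mean of 489.50, a standard deviation of 3.54 and a CV of 0.01 after rounding.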
Table 6.13 Statistics of link-attribute and event-based hot zones for two periods

                                         L                                      E95%
                             Mean   Standard deviation    CV         Mean   Standard deviation    CV
No. of Hot Zones           480.50         3.54           0.01      489.50         3.54           0.01
No. of BSUs/RPs           1945.50       122.33           0.06     2077.50        45.96           0.02
No. of Crashes           17219.00       271.53           0.02    15834.50       267.99           0.02
Length of Hot Zones        187.74        12.29           0.07      193.60         4.38           0.02

6.2.3.2 Incorporation of other environmental variables

The link-attribute method can introduce many independent variables into statistical models so as to incorporate more environmental variables such as road type and road junctions. In this thesis, negative binomial regression is used to introduce these variables (see Chapter 4). For the event-based approach, however, neither negative binomial nor Poisson models are appropriate for incorporating environmental variables because of the double counting inherent in the approach. Even though the search distance of a reference point is defined as half of the interval of reference points, double counting still exists, especially around road junctions. Hence, it is not suitable to use the crash intensities of reference points as samples for modeling the spatial distribution of road crashes. Moreover, it is more difficult for the event-based approach to determine environmental attributes for reference points. For instance, reference point A, as shown in Figure 6.5, is located at the intersection of Road No. 1 (a main road) and Road No. 2 (a secondary road). An engineer who aims to take the contributory factor “road type” into consideration may have to assign an attribute value of the road type to the reference point before any further investigation. But which road is the reference point located on? Which would be more appropriate as an attribute value, secondary road or main road? It is quite difficult to tell.
Hence, as the link-attribute approach has significant strengths in incorporating environmental variables, one may choose the link-attribute approach to identify hazardous road locations if one would like to incorporate some environmental variables.

Figure 6.5 Illustration of determining an attribute value for a reference point

6.3 Summary

This chapter introduces the event-based method for the identification of hazardous road locations. The hot zones are identified with the threshold values defined by an arbitrary number and by Monte Carlo simulation. When an arbitrary number was used, there were a great number of hot zones with crash intensity less than the critical value. Hence, the arbitrary-number approach is not appropriate for identifying hazardous road locations if one is concerned with the critical value of a hot zone. When the Monte Carlo method was used, the CR-based method was found to be more stable than the CF-based method. In this light, the CR-based method is preferable if one is concerned with the stability of the performance of the approach. However, as some sectors such as hospitals are more likely to be interested in the absolute number of road crashes, the CF-based hot zones may also be important to them for addressing problems such as allocating medical resources. In addition, the results demonstrate that most CF-based hot zones could also be identified by the CR-based method in some districts. In order to save effort, one may use only the CR-based approach to identify hazardous road locations even if one is interested in both CF-based and CR-based hot zones. The event-based approach is also compared with the link-attribute approach in analyzing road crashes. The event-based approach is less likely to cause false negative problems, whereas it is more convenient for the link-attribute approach to incorporate a set of road environmental variables.
CHAPTER 7 EVENT-BASED ANALYSIS FOR ROAD CASUALTIES

Crash intensity of a reference point in event-based analysis can be determined not only by crash frequency but also by other methods such as the casualty-weighted method. The first section presents the way in which hot zones are identified in a casualty-weighted and event-based manner. The spatial distribution of crashes involving pedestrian casualties in Kwun Tong District is analyzed by using the simple ranking method to determine the threshold value. After discussing the event-based results in Section 7.1.3, a comparison with link-attribute analysis is made in Section 7.2.

7.1 District-Wide Identification of Hazardous Road Locations for Pedestrians

7.1.1 Data Description

The spatial distribution of pedestrian casualties occurring on the entire road network (147 km) of Kwun Tong District during the periods of 2002-2004 and 2005-2007 is analyzed. There were altogether 944 pedestrian casualties (22 fatalities, 213 serious injuries and 709 slight injuries) in the period from 2002 to 2004 and 940 pedestrian casualties (21 fatalities, 242 serious injuries and 677 slight injuries) in the period from 2005 to 2007. Details of the data on pedestrian casualties and the road network can be found in Chapter 5. The interval of reference points is defined as 100 meters. Using the dissolved road network, 1,890 reference points are finally created.

7.1.2 Data Analysis

The pedestrian casualties are analyzed by two casualty-weighted methods, namely the unweighted and cost-weighted approaches.

7.1.2.1 Unweighted

With unweighted denotation, the crash intensity $\mathrm{ECI}_{UW}$ is calculated by:

$$\mathrm{ECI}_{UW,i} = \sum_{j=1}^{m} f_{ij} \tag{7.1}$$

$$f_{ij} = \begin{cases} 1 & \text{if } d_{ij} \le h \\ 0 & \text{otherwise;} \end{cases} \tag{7.2}$$

where $m$ denotes the number of pedestrian casualties; $d_{ij}$ is the network distance between RP $i$ and casualty $j$; and $h$ is the search distance from RP $i$. The simple ranking method is used to define the threshold value.
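Equations (7.1) and (7.2) amount to counting, for each reference point, the casualties that lie within the search distance. A minimal sketch follows, assuming that the network distances have already been computed and that the comparison in Equation (7.2) is inclusive (a distance equal to h counts); the function names, the distance table and the 50 m search distance are illustrative values only.

```python
def unweighted_intensity(distances_to_casualties, search_distance):
    """ECI_UW of one reference point: casualties within the search distance."""
    return sum(1 for d in distances_to_casualties if search_distance >= d)


def intensities(distance_table, search_distance):
    """ECI_UW for every reference point in a {rp: [d_ij, ...]} table."""
    return {
        rp: unweighted_intensity(distances, search_distance)
        for rp, distances in distance_table.items()
    }


# Made-up network distances (in meters) from three reference points
# to the pedestrian casualties near them.
distance_table = {
    "RP1": [12.0, 48.5, 130.0],
    "RP2": [75.0, 260.0],
    "RP3": [50.0, 50.0, 20.0, 5.0],
}
eci = intensities(distance_table, search_distance=50.0)
print(eci)
```

In this example RP1 counts two casualties, RP2 none and RP3 four, because only distances of at most 50 m contribute to the sum.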
In this chapter, the 90%, 95% and 99% percentiles of the values of $\mathrm{ECI}_{UW}$ are used to determine the threshold values for sensitivity analysis.

7.1.2.2 Cost-weighted

As discussed in Chapter 5, the values of preventing a serious injury and a slight injury are around 16% and 1% of the value of preventing a fatality respectively. Hence, with cost-weighted denotation, the crash intensity $\mathrm{ECI}_{CW}$ is calculated by:

$$\mathrm{ECI}_{CW,i} = \sum_{j=1}^{m} f_{ij} \tag{7.3}$$

$$f_{ij} = \begin{cases} 100 & \text{if } d_{ij} \le h \text{ and casualty } j \text{ is fatal,} \\ 16 & \text{if } d_{ij} \le h \text{ and casualty } j \text{ is seriously injured,} \\ 1 & \text{if } d_{ij} \le h \text{ and casualty } j \text{ is slightly injured,} \\ 0 & \text{otherwise;} \end{cases} \tag{7.4}$$

where $m$ denotes the number of pedestrian casualties; $d_{ij}$ is the network distance between RP $i$ and casualty $j$; and $h$ is the search distance from RP $i$. Similar to the unweighted analysis, the 90%, 95% and 99% percentiles of the values of $\mathrm{ECI}_{CW}$ are used to determine the threshold values.

7.1.3 Results

Table 7.1 summarizes the results on event-based pedestrian hot zones. If the two casualty-based approaches are compared, the statistics in the table suggest that the unweighted method identified hot zones with greater numbers of casualties of all types, whereas the cost-weighted method identified locations with high concentrations of fatally or seriously injured victims. For instance, when the 90% percentile was used, the unweighted method detected fewer hot zones than the cost-weighted method in terms of both the number and the length of hot zones for both periods (31 unweighted hot zones with 16.59 km vs. 38 cost-weighted hot zones with 17.82 km in the period from 2002 to 2004; and 44 unweighted hot zones with 18.38 km vs. 55 cost-weighted hot zones with 20.99 km in the period from 2005 to 2007). However, the number of all pedestrian casualties or of slightly injured pedestrian casualties occurring on unweighted hot zones was larger than that on cost-weighted hot zones (576 pedestrian casualties and 437 slight pedestrian injuries on unweighted hot zones vs.
548 pedestrian casualties and 383 slight pedestrian injuries on cost-weighted hot zones in 2002-2004; and 542 pedestrian casualties and 390 slight pedestrian injuries on unweighted hot zones vs. 537 pedestrian casualties and 342 slight pedestrian injuries on cost-weighted hot zones in 2005-2007). The difference is more obvious with increasing threshold values. It demonstrates that the results of pedestrian hot zones are sensitive to the weighting method, in accordance with the findings in Chapter 5.

Table 7.1 Statistics on event-based pedestrian hot zones with threshold value defined by the simple ranking method

2002-2004
                                               90%              95%              99%
Hot Zone Type                               UW      CW       UW      CW       UW      CW
No. of HZs                                  31      38       20      24        6       5
No. of RPs                                 174     177       91      83       18      11
No. of Casualties                          576     548      387     289      114      57
No. of Fatalities & Serious Injuries       139     165       90     101       19      19
Length (km)                              16.59   17.82     8.88    8.22     1.77    1.16

2005-2007
                                               90%              95%              99%
Hot Zone Type                               UW      CW       UW      CW       UW      CW
No. of HZs                                  44      55       23      26        5       8
No. of RPs                                 187     220      106      83       15      16
No. of Casualties                          542     537      377     260       98      56
No. of Fatalities & Serious Injuries       152     195       99     107       27      25
Length (km)                              18.38   20.99    10.49    7.90     1.77    1.63

To further investigate the difference between the two types of hot zones, the unweighted and cost-weighted hot zones are overlaid in GIS and the hot zones are classified into three types, including hot zones identified by both approaches (UW-cum-CW), hot zones identified only by the unweighted approach (UW-only) and hot zones detected only by the cost-weighted approach (CW-only). The UW-cum-CW hot zones can be regarded as the most dangerous locations for pedestrians since they were identified as hazards with high numbers of fatalities, serious injuries and slight injuries. Figure 7.1 shows the locations of the three types of hot zones identified by using the 99% percentile to determine the threshold value. The hot zones identified by both the unweighted and cost-weighted approaches are colored in red.
From the map, one can observe that the UW-cum-CW hot zones identified in both periods were located around the Kwun Tong Town Center, which is an activity center of Kwun Tong. Great pedestrian volume and a lack of pedestrian protection facilities significantly increase the chance of being involved in a traffic crash. These locations were less likely to be false positives and could be treated with high priority. Apart from the UW-cum-CW hot zones, there were some hot zones identified by only one approach. Table 7.2 shows the statistics on hot zones identified by only one approach in the period from 2002 to 2004. Consistent with previous findings, compared with the CW-only hot zones, the UW-only hot zones were characterized by higher numbers of pedestrian casualties but lower numbers of fatal and seriously injured pedestrian casualties. Taking the 90% hot zone type as an example, the average number of pedestrian casualties on UW-only hot zones was 6.33, nearly double the average pedestrian casualty count of CW-only hot zones (3.22), whereas the mean number of fatalities and serious injuries on CW-only hot zones was nearly three-fold that on UW-only hot zones. The choice of the UW-based or CW-based approach depends on the motives of the examination.

Figure 7.1 Hot zones of three types using 99% percentile in (a) 2002-2004 and (b) 2005-2007

Table 7.2 Event-based hot zones identified only by unweighted or cost-weighted method (2002-2004)

                                           90%                  95%                  99%
                                    UW-only  CW-only     UW-only  CW-only     UW-only  CW-only
No. of Hot Zones                          3        9           7       10           4        3
No. of RPs                                7       25          18       26          10        6
Casualties
  Total                                  19       29          77       43          71       22
  Minimum                                 3        2           5        2          13        5
  Maximum                                11        7          19        6          30        9
  Mean                                 6.33     3.22          11      4.3       17.75     7.33
  Standard deviation                   3.40     1.55        4.57      1.1        7.08     1.70
  CV                                   0.54     0.48        0.42     0.26        0.40     0.23
Fatalities & Serious Injuries
  Total                                   2       16          14       22           9       12
  Minimum                                 0        1           1        1           0        2
  Maximum                                 2        5           5        4           5        7
  Mean                                 0.67     1.78           2      2.2        2.25        4
  Standard deviation                   0.94     1.23        1.31     0.88        1.92     2.16
  CV                                   1.40     0.69        0.66     0.40        0.85     0.54
Length (km)
  Total                                0.88     2.70        1.90     2.55        1.04     0.61
  Minimum                              0.27     0.18        0.20     0.16        0.18     0.18
  Maximum                              0.31     0.66        0.37     0.53        0.41     0.22
  Mean                                 0.29     0.30        0.27     0.25        0.26     0.20
  Standard deviation                   0.02     0.14        0.06     0.13        0.09     0.02
  CV                                   0.07     0.47        0.22     0.52        0.35     0.10

In order to test the stability of the performance of the two approaches, the hot zones of the two periods are compared in terms of the number and length of hot zones, the number of reference points and the crash intensity, represented as the number of pedestrian casualties for the unweighted approach and the total costs for the cost-weighted approach. Table 7.3 shows the mean, standard deviation and coefficient of variation of the 2002-2004 and 2005-2007 hot zones. When the 90% and 99% percentiles were used to define the threshold values, the CVs of unweighted hot zones were less than those of cost-weighted hot zones, but when the 95% percentile was used, the CVs of unweighted hot zones were larger except for the indicator of crash intensity. To further examine the variability, the hot zones of the two periods are overlaid. Table 7.4 shows the shares of hot zones identified in both periods after the overlay. It can be observed that the unweighted method is more stable than the cost-weighted method. Around 50% of unweighted hot zones detected in the period of 2002-2004 could also be detected in the period from 2005 to 2007 regardless of the definition of threshold values.
Although 59.1% of cost-weighted hot zones identified in the period of 2002-2004 could also be detected in the period of 2005-2007 when the 90% percentile was used, the shares of hot zones identified in both periods decreased sharply with increasing threshold values. For example, when the 99% percentile was used to define the threshold value, only 20% of cost-weighted hot zones identified in 2002-2004 could be detected in 2005-2007.

Table 7.3 Variation of hot zones between two periods

                                      UW                            CW
                              Mean  Std. dev.        CV      Mean  Std. dev.        CV
90%
  No. of Hot Zones           37.50      9.19      0.25     46.50     12.02      0.26
  No. of RPs                180.50      9.19      0.05    198.50     30.41      0.15
  Crash Intensity           559.00     24.04      0.04   4502.50    666.80      0.15
  Length of Hot Zones        17.49      1.27      0.07     19.41      2.24      0.12
95%
  No. of Hot Zones           21.50      2.12      0.10     25.00      1.41      0.06
  No. of RPs                 98.50     10.61      0.11     83.00      0.00      0.00
  Crash Intensity           382.00      7.07      0.02   2926.50    399.52      0.14
  Length of Hot Zones         9.69      1.14      0.12      8.06      0.23      0.03
99%
  No. of Hot Zones            5.50      0.71      0.13      6.50      2.12      0.33
  No. of RPs                 16.50      2.12      0.13     13.50      3.54      0.26
  Crash Intensity           106.00     11.31      0.11   1016.50    478.71      0.47
  Length of Hot Zones         1.77      0.00      0.00      1.40      0.33      0.24

Table 7.4 Numbers and percentages of hot zones identified in both periods

                                   Number   % of HZ24   % of HZ57
Unweighted
  HZ24(90%)-cum-HZ57(90%)              18       58.1%       50.0%
  HZ24(95%)-cum-HZ57(95%)              10       50.0%      47.83%
  HZ24(99%)-cum-HZ57(99%)               3       50.0%       60.0%
Cost-weighted
  HZ24(90%)-cum-HZ57(90%)              26       59.1%       54.5%
  HZ24(95%)-cum-HZ57(95%)               9       37.5%       50.0%
  HZ24(99%)-cum-HZ57(99%)               1       20.0%       12.5%

Notes: The "% of HZ24" and "% of HZ57" columns give the percentages of HZ24 and of HZ57, respectively, in the HZ24-cum-HZ57 category.

7.2 Comparison with Link-attribute Approach

In Chapter 5, hazardous road locations are identified for pedestrians with the threshold value defined by the simple ranking method and by a “potential crash reduction” approach that takes into consideration the surrounding environment of pedestrian road crashes. The previous section of this chapter applied the simple ranking method to the event-based identification of hazardous road locations for pedestrians. The following comparative analysis will focus on the hot zones identified with the threshold value defined by the simple ranking method.

7.2.1 Simple Ranking

Table 7.5 summarizes the pedestrian hot zones of both periods, with the threshold value defined under the unweighted and cost-weighted approaches. Consistent with previous findings, the event-based approach detected more hot zones than the link-attribute approach regardless of the percentile and the casualty-weighting approach. The difference between the link-attribute and the event-based approaches is more obvious for hot zones identified by the cost-weighted approach. Taking the 90% percentile as an example, the link-attribute approach detected on average 33.5 unweighted hot zones (14.01 km) and 32.5 cost-weighted hot zones (10.56 km), whereas the event-based approach identified 37.5 unweighted hot zones (17.49 km) and 46.5 cost-weighted hot zones (19.41 km). Focusing on unweighted hot zones, the variability of link-attribute hot zones was smaller than that of event-based hot zones when the 90% and 95% percentiles were used to define the threshold values. For the cost-weighted definition, most CV values of link-attribute hot zones were greater than those of event-based hot zones, indicating greater variability among link-attribute hot zones.
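The coefficient of variation used in these comparisons is the standard deviation of an indicator over the two periods divided by its mean. The following is a minimal sketch of that calculation, not the thesis's own code. It assumes the sample standard deviation (n - 1 denominator), and it uses the two period counts (31 and 44) that are implied by the mean of 37.50 and standard deviation of 9.19 reported in Table 7.3 for unweighted event-based hot zones at the 90% percentile. The function name is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return (mean, sample standard deviation, CV) for per-period values."""
    m = mean(values)
    s = stdev(values)  # assumption: sample SD with an n - 1 denominator
    return m, s, s / m


# Counts implied by the reported mean (37.50) and SD (9.19); which count
# belongs to which period does not matter for the calculation.
m, s, cv = coefficient_of_variation([31, 44])
print(f"mean={m:.2f} sd={s:.2f} cv={cv:.2f}")  # mean=37.50 sd=9.19 cv=0.25
```

With only two periods, the standard deviation reduces to the absolute difference between the two values divided by the square root of 2, so a small CV simply means the two periods produced similar totals.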
Comparatively speaking, the performance of the link-attribute hot zones was more stable when the unweighted method was used to define the threshold value, whereas the event-based approach performed more stably when the cost-weighted approach was employed.

Table 7.5 Statistics of link-attribute and event-based hot zones for two periods (unweighted and cost-weighted)

Unweighted
                              Link-attribute (L)              Event-based (E)
                              Mean  Std. dev.        CV      Mean  Std. dev.        CV
90%
  No. of Hot Zones           33.50      0.71      0.02     37.50      9.19      0.25
  No. of BSUs/RPs           143.50      2.12      0.01    180.50      9.19      0.05
  No. of Casualties         534.50     41.72      0.08    559.00     24.04      0.04
  Length of HZs (km)         14.01      0.20      0.01     17.49      1.27      0.07
95%
  No. of Hot Zones           18.50      0.71      0.04     21.50      2.12      0.10
  No. of BSUs/RPs            68.00      0.00      0.00     98.50     10.61      0.11
  No. of Casualties         347.00     28.28      0.08    382.00      7.07      0.02
  Length of HZs (km)          6.64      0.01      0.00      9.69      1.14      0.12
99%
  No. of Hot Zones            4.50      3.54      0.79      5.50      0.71      0.13
  No. of BSUs/RPs            12.00     11.31      0.94     16.50      2.12      0.13
  No. of Casualties          88.00     82.02      0.93    106.00     11.31      0.11
  Length of HZs (km)          1.19      1.11      0.94      1.77      0.00      0.00

Cost-weighted
                              Link-attribute (L)              Event-based (E)
                              Mean  Std. dev.        CV      Mean  Std. dev.        CV
90%
  No. of Hot Zones           32.50      6.36      0.20     46.50     12.02      0.26
  No. of BSUs/RPs           109.50      0.71      0.01    198.50     30.41      0.15
  No. of Casualties         402.00     45.25      0.11    542.50      7.78      0.01
  Cost                     3862.50    419.31      0.11   4502.50    666.80      0.15
  Length of HZs (km)         10.56      0.11      0.01     19.41      2.24      0.12
95%
  No. of Hot Zones           18.00      1.41      0.08     25.00      1.41      0.06
  No. of BSUs/RPs            55.50      4.95      0.09     83.00      0.00      0.00
  No. of Casualties         264.00      1.41      0.01    274.50     20.51      0.07
  Cost                     2475.00    540.23      0.22   2926.50    399.52      0.14
  Length of HZs (km)          5.43      0.49      0.09      8.06      0.23      0.03
99%
  No. of Hot Zones            1.00      1.41      1.41      6.50      2.12      0.33
  No. of BSUs/RPs             2.00      2.83      1.41     13.50      3.54      0.26
  No. of Casualties          15.00     21.21      1.41     56.50      0.71      0.01
  Cost                      300.00    424.26      1.41   1016.50    478.71      0.47
  Length of HZs (km)          0.19      0.26      1.41      1.40      0.33      0.24

If the link-attribute and event-based hot zones are overlaid, the pedestrian hot zones which were detected only by the link-attribute (L-only) or the event-based (E-only) approach can be identified. Table 7.6 summarizes L-only and E-only hot zones with the threshold value determined by the 99% percentile for the period from 2002 to 2004. There were only two small L-only hot zones (0.20 km each) with an average crash intensity (represented as the number of pedestrian casualties) equal to 15. The event-based approach detected 6 additional hot zones with lengths ranging from 0.18 km to 0.53 km and crash intensity ranging from 13 to 30, indicating that the event-based approach detected more hot zones with varied length and crash intensity. For the cost-weighted hot zone type, the link-attribute approach detected no extra hot zones but the event-based approach identified five. These hot zones are further plotted onto maps for closer examination. Figure 7.2 delineates the spatial locations of unweighted L-only and E-only hot zones and Figure 7.3 depicts the locations of cost-weighted E-only hot zones. Neither of the two unweighted L-only hot zones identified in 2002-2004 could be detected in 2005-2007, but three of the six unweighted and one of the five cost-weighted E-only hot zones could be identified in both the 2002-2004 and 2005-2007 periods. These four hot zones, which were unlikely to be “false positive”, are located around Kwun Tong Town Center. The area is a landmark activity center of Kwun Tong and a major drop-off point in the district (see the minibus terminal in Figure 7.4a as an example). It is not uncommon for pedestrians to share the same thoroughfares with on-road vehicular traffic in the area, and the area lacks facilities such as guardrails to protect pedestrians (see Figure 7.4b). These locations require immediate countermeasures to improve the safety of pedestrians. Failing to identify these locations may cause more deaths and serious injuries.
In this sense, the event-based approach might be chosen to identify hazardous road locations in order to avoid false negative locations. However, one may notice that three of the six unweighted and four of the five cost-weighted E-only hot zones could not be identified in the second period, which suggests that the event-based approach may cause more serious false positive problems.

Table 7.6 Summary on 99% L-only and E-only hot zones in 2002-2004

                               Unweighted          Cost-weighted
                             L-only   E-only      L-only   E-only
Number of Hot Zones               2        6           0        5
Crash intensity
  Total                          30      114           0      678
  Minimum                        14       13           0      114
  Maximum                        16       30           0      165
  Mean                           15       19           0    135.6
  Standard deviation           1.00      7.4           0    18.23
  CV                           0.07     0.39           0     0.13
Length (km)
  Total                        0.40     1.77           0     1.16
  Minimum                      0.20     0.18           0     0.18
  Maximum                      0.20     0.53           0     0.37
  Mean                         0.20     0.29           0     0.23
  Standard deviation           0.00     0.13           0     0.07
  CV                           0.00     0.45           0     0.30

Figure 7.2 Locations of unweighted pedestrian hot zones of (a) L-only and (b) E-only

Figure 7.3 Locations of cost-weighted E-only pedestrian hot zones

Figure 7.4 Site Review using Google Street View (From the crossing of Fu Yan Street and Wut Wah Street)

7.2.2 Incorporation of Surrounding Environmental Variables

The link-attribute analysis in Chapter 5 employed the negative binomial model to incorporate a range of surrounding environmental variables. It also applied the regression models to the estimation of the long-term mean of pedestrian casualty count (EB estimate). However, the event-based approach is not appropriate for such analysis because of its limitations, discussed in Section 6.2, in modeling the relationship between environmental variables and the number of pedestrian casualties.

7.3 Summary

This chapter analyzes pedestrian casualties in an event-based manner. When the simple ranking method is used to define the threshold value, the performance of the unweighted method is generally more stable than that of the cost-weighted method.
In this sense, one may choose the unweighted method to identify hazardous road locations for pedestrians. However, although most hot zones could be identified by both the unweighted and cost-weighted approaches, some hot zones could be identified by only one approach. While the unweighted approach targets locations with a high concentration of pedestrian casualties regardless of injury type, the cost-weighted approach identifies those with a large number of fatalities and serious injuries. The choice of the casualty-weighting method in identifying hazardous road locations for pedestrians depends on the targeted injury type of pedestrian casualties. The event-based approach is compared with the link-attribute approach in analyzing pedestrian casualties. While the link-attribute approach is less likely to cause false positive problems, the event-based approach is less likely to cause false negative problems. Compared with the link-attribute approach, it is difficult for the event-based approach to incorporate a set of road environmental variables.

CHAPTER 8 CONCLUSION

8.1 Summary of Findings

This research explores the general procedures of the link-attribute and the event-based approaches in identifying crash hot zones as hazardous road locations, and investigates the characteristics of the two approaches by conducting a range of sensitivity analyses on the definition of the threshold value and the determination of crash intensity with both simulated and empirical data. This section summarizes the main findings of the research.

8.1.1 Link-Attribute Approach

This research employed the link-attribute approach to identify hazardous road locations with both raw-link-node and dissolved road systems. Since using a dissolved road network detects more hazardous road locations and performs more stably than using the raw-link-node road network, it is better to dissolve the road network before taking any further steps.
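One reason for dissolving the network first is that segmenting every raw link separately leaves a short remainder segment on each link. The sketch below illustrates this with plain arithmetic. It is not the GIS-based dissolving algorithm developed in this research: the link lengths are hypothetical, the function names are invented for illustration, and it assumes that dissolving simply merges links that meet end to end at non-junction nodes.

```python
def segment(length_m, bsu_m=100.0):
    """Split one link into BSUs of bsu_m metres; the last BSU holds any remainder."""
    full, rest = divmod(length_m, bsu_m)
    pieces = [bsu_m] * int(full)
    if rest > 0:
        pieces.append(rest)
    return pieces


def count_short_bsus(links, bsu_m=100.0):
    """Count BSUs shorter than bsu_m when every link is segmented on its own."""
    return sum(1 for link in links for piece in segment(link, bsu_m) if piece < bsu_m)


# Hypothetical raw links joined end to end at non-junction nodes (metres).
raw_links = [130.0, 70.0, 250.0]
# Assumed effect of dissolving: the three links become one 450 m link.
dissolved_links = [sum(raw_links)]

print(count_short_bsus(raw_links))        # 3 short BSUs (30 m, 70 m, 50 m)
print(count_short_bsus(dissolved_links))  # 1 short BSU (50 m)
```

In this toy case the dissolved link yields one short BSU instead of three, which is the kind of reduction the dissolving step is intended to achieve before segmentation.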
The numerical definition and simple ranking approaches can be used to identify hazardous road locations with the link-attribute approach. Using a numerical definition to identify hot zones is a simple and effort-saving approach, but one should be careful in choosing an appropriate number as the threshold value. Monte Carlo simulation can be employed to avoid selection bias in choosing the threshold value, but it is rather time-consuming due to the large number of realizations. The link-attribute approach can employ statistical models to incorporate environmental variables. The hazardous road locations identified by different models may differ significantly; therefore, one should be very careful in choosing confounding variables. The EB technique can be employed to define the crash intensity instead of an observed-count (OC) approach. The difference between the EB-based and OC-based hot zones is significant, and the former approach has a great advantage in the stability of its performance when the threshold value is set relatively high. This implies that the use of an EB estimate may have a more profound impact on the ranking of hazardous road locations; this question is beyond the scope of this thesis but is worthy of investigation in future studies. The length of the BSU affects the results. As the variability is smaller among 100-meter hot zones, the segmentation length is better defined as 100 meters.

8.1.2 Event-Based Approach

Numerical and simple-ranking approaches can be used to identify hazardous road locations with the event-based approach, although the former can identify a great number of hot zones with crash intensity less than the critical value of a hot zone. Monte Carlo simulation can be used to identify hazardous road locations.
This method can also incorporate traffic volume by modifying the simulation approach, and its performance is more stable when the traffic volume is taken into consideration. The interval of reference points affects the results. As the variability is smaller among 100-meter hot zones, the interval of reference points is better defined as 100 meters.

8.1.3 Advantages and Drawbacks of the Two Approaches

Based on the comparative analysis, the advantages and drawbacks of the two approaches are as follows.

8.1.3.1 Advantages

Link-attribute approach

The link-attribute approach is less likely to cause false positive problems, especially around road junctions. A typical example is the territory-wide identification of hazardous road locations when the threshold value is determined by an arbitrary number. The event-based approach detected a large number of crash hot zones around road junctions with crash intensity less than the critical value, while the link-attribute approach did not. It is convenient for the link-attribute approach to incorporate a set of road environmental indicators. These variables can be easily incorporated by establishing statistical models such as negative binomial models, which are important in defining the crash intensity (EB estimate) and the threshold value (expected number of road crashes or pedestrian casualty counts). One may select the link-attribute approach without hesitation if environmental variables should be taken into consideration. Performing the link-attribute approach can also save time. No matter which method was used to define the threshold value, the link-attribute approach took less computing time than the event-based approach.

Event-based approach

The event-based approach is less likely to cause false negative problems.
As shown in the identification of hazardous road locations for pedestrians, the event-based approach detected a large number of hot zones around road junctions that could not be identified by the link-attribute approach. Although some of them were more likely to be false positives, some did require further investigation and countermeasures. Hence, one may consider the event-based approach if the main concern is to avoid false negative problems.

8.1.3.2 Drawbacks

Link-attribute approach

While the link-attribute approach is less likely to identify false positive locations, it is more likely to cause false negative problems. A typical example is the comparison of the empirical results on pedestrian casualties, in which the link-attribute approach failed to detect some hazardous road locations that could be identified by the event-based approach.

Event-based approach

The event-based approach is more likely to cause false positive problems, especially around road junctions. For instance, when an arbitrary number was used to determine the threshold value, a large share of hot zones, most of which were located around road junctions, were found with crash intensity less than the critical value. It is difficult for the event-based approach to incorporate a set of road environmental variables through a statistical regression model. First, it is difficult for the event-based approach to obtain appropriate samples; second, assigning a suitable environmental attribute to a reference point is not an easy task. Performing the event-based approach also requires much more effort. Regardless of the definition of the threshold value, the event-based approach took more time to complete the whole procedure of identifying crash hot zones.

8.2 Importance of the Study

The identification of hazardous road locations plays a key role in reducing road crashes.
This research explores the methodological issues of the link-attribute approach, such as the segmentation method and the definition of threshold values, in identifying crash hot zones as hazardous road locations. In particular, it employs a model-based approach to define the threshold value by taking into consideration a set of environmental variables, which has not been applied to hot zone identification before. It also develops an event-based network statistic which can be used for the identification of crash hot zones. By investigating the hot zone methodology, this study can enrich the theoretical knowledge of the identification of hazardous road locations and practically provide policy-makers with more information on identifying road hazards. In addition, the improvement of road safety requires joint efforts from different disciplines. This thesis can supplement the multi-disciplinary knowledge of road safety by combining both road safety and spatial analytical methods in the identification of hazardous road locations through a GIS platform. For instance, in order to reduce short BSUs, this research develops a GIS-based dissolving algorithm to dissolve the road network before segmentation; the location of the first reference point is determined by a newly developed GIS-based tool that can select the reference points in a random manner.

8.3 Limitations of the Study

The limitations of the study are listed as follows:

False positive and false negative

This research examines the stability of the performance of the two approaches through the comparison of the results for the 2002-2004 and 2005-2007 time periods. The logic behind the comparison is that if a site can be identified as hazardous in both periods, it is less likely to be a false positive hazard; however, whether the site is truly positive still requires further examination.
For instance, to find out why hot zones in earlier years (2002-2004) were not hot zones in later years (2005-2007), more investigations should be conducted to answer questions like “Were the hazards removed because roads were no longer in service or because additional infrastructure was installed?”

Simulations on a hypothetical road network

In Chapter 3, when simulating road crashes on a hypothetical road network, the characteristics of the spatial distribution of road crashes are roughly categorized into random, dispersed and concentrated patterns. For the concentrated pattern, this research only regards 200 meters as the length of a hot zone with 25 road crashes as the crash intensity. However, the concentration of road crashes may differ by clustering extent (such as the crash intensity on each hot zone) and scale (the length of a hot zone), which may affect the performance of the two approaches, such as the choice of BSU length and RP interval. In this research, the road structure is categorized into three types, namely a grid pattern with 24 road junctions, a limited access pattern with 12 road junctions and an organic pattern with 6 road junctions. Although the three instances can give some basic ideas on the impacts of road structures, more realizations, such as different junction densities within one particular type of road structure, may provide more insights into the performance of the two approaches.

Weighting method

This research implements the event-based methodology for the identification of hazardous road locations by modifying the local K-function approach, which treats road crashes equally within the search distance of a reference point. However, different weighting methods may have different impacts on the results. The hot zones identified by an unweighted method, such as the event-based approach introduced in this thesis, may be different from those identified by a distance-decay method, such as the kernel density estimation approach.
The characteristics of the event-based methodology for the identification of hot zones may not be fully explored if only the unweighted approach is investigated. This research assigns different weights to casualties of different injury types according to the records of other countries. It does not examine the sensitivity of hot zones to other weighting methods.

Incorporation of “time” dimension

This research focuses on the location of road crashes without considering the “time” dimension, which may affect the performance of the two approaches.

Ranking of hot zones

This research focuses on the identification of hot zones and treats every hot zone equally regardless of whether it has the minimum or the maximum value of crash intensity. However, as resources are limited, it is impractical to treat every hot zone. In this research, when the Monte Carlo simulation method was used to define the threshold value, the link-attribute approach detected 124 hot zones and the event-based approach identified 230 even though the 99.9% significance level was used. This raises an important question: which hot zone should be treated with top priority? This is very important to policy-makers, who may have to choose only 10 out of 230 hot zones for treatment. To answer this question, one should not only identify hot zones, but also rank the locations based on some criterion.

8.4 Further Research Directions

Investigation of false positive and false negative problems

More criteria for identifying false positive and false negative problems should be explored so as to further examine the robustness of the link-attribute and event-based hot zones in the identification of hazardous road locations. This may also require further investigation of the regression-to-the-mean problem. More importantly, field investigation of hot zones is necessary, as it can help correctly identify “false positive” and “false negative” hot zones.
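The two-period comparison that this research uses as its stability criterion can be sketched as a simple overlay of hot zone sets. The example below is an illustration under stated assumptions rather than the procedure implemented in the thesis: each hot zone is represented as a set of road-segment IDs, two hot zones are taken to overlap if they share at least one segment, and the IDs and the function name are hypothetical.

```python
def shares_in_both(period_a, period_b):
    """Return the share of hot zones in each period that overlap the other period.

    Each hot zone is a set of road-segment IDs (an assumed representation);
    two hot zones overlap if they have at least one segment in common.
    """
    def count_overlapping(zones, others):
        return sum(1 for zone in zones if any(zone & other for other in others))

    a_hits = count_overlapping(period_a, period_b)
    b_hits = count_overlapping(period_b, period_a)
    return a_hits / len(period_a), b_hits / len(period_b)


# Hypothetical hot zones for the earlier and later periods.
hz_earlier = [{1, 2, 3}, {10, 11}, {20}]
hz_later = [{2}, {3, 4}, {30, 31}, {11, 12}]

share_earlier, share_later = shares_in_both(hz_earlier, hz_later)
print(f"{share_earlier:.1%} of earlier zones, {share_later:.1%} of later zones")
```

Because one hot zone in the earlier period can overlap several in the later period, the two shares need not be based on the same count, which is consistent with the separate percentages reported for each period in Table 7.4. A hot zone that persists under this check is only less likely to be a false positive; it still needs the field investigation described above.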
More realizations of simulated crash patterns and road structures

As the performance of the two approaches may be influenced by the crash pattern and road structure, more realizations of simulated crash patterns on more instances of hypothetical road networks will be established for further exploration of the two approaches.

Sensitivity analysis on weighting methods

Distance-decay methods, such as kernel density estimation, may be employed to identify hazardous road locations. This effort can enrich the knowledge of event-based methodology in the identification of hazardous road locations. The sensitivity of hot zones for casualties to weighting methods will be further investigated by assigning different weights in cost-weighted analysis.

Incorporation of “time” dimension

The ways in which the “time” dimension is introduced into the two approaches will be explored. A 3D-GIS environment will be used for developing the methods.

Ranking of hot zones

The rank of hot zones may differ by criteria. To answer questions like “which is the most dangerous road location, the one with the highest value of crash intensity or the one with the longest length?”, a series of sensitivity analyses can be conducted to investigate the ranking method.

Development of a software package

A software package for both link-attribute and event-based approaches in identifying crash hot zones will be developed in order that other researchers and practitioners can conveniently apply these two approaches to the identification of hazardous road locations.
Factors associated with the relationship between motorcycle deaths and economic growth. Accident Analysis & Prevention, 41 (2), 234-240. Le, H., Geldermalsen, T. v., Lim, W. L., & Murphy, P. (2011). Deriving accident costs using Willingness-to-Pay Approaches - A case study for 243 Singapore Paper presented at the Australasian Transport Research Forum 2011 Adelaide, Australia Leden, L. (2002). Pedestrian risk decrease with pedestrian flow. A case study based on data from signalized intersections in Hamilton, Ontario. Accident Analysis & Prevention, 34 (4), 457-464. Lee, C., & Abdel-Aty, M. (2005). Comprehensive analysis of vehicle–pedestrian crashes at intersections in Florida. Accident Analysis & Prevention, 37 (4), 775-786. Lee, S. C. (1989). Road traffic monitoring in Hong Kong. Proceedings of the Second International Conference on Road Traffic Monitoring . Levine, N., Kim, K. E., & Nitz, L. H. (1995). Spatial analysis of Honolulu motor vehicle crashes: II. Zonal generators Accident Analysis and Prevention, 27(5), 675-685 Li, L., Zhu, L., & Sui, D. Z. (2007). A GIS-based Bayesian approach for analyzing spatial–temporal patterns of intra-city motor vehicle crashes. Journal of Transport Geography, 15 (4), 274-285. Ljung Aust, M., Fagerlind, H., & Sagberg, F. (2011). Fatal intersection crashes in Norway: Patterns in contributing factors and data collection challenges. Accident Analysis & Prevention. Loo, B. P. Y. (2006). Validating crash locations for quantitative spatial analysis: A GIS-based approach. Accident Analysis and Prevention, 38 (5), 879-886. Loo, B. P. Y. (2009). The identification of hazardous road locations: A comparison of the blacksite and hot zone methodologies in Hong Kong. International Journal of Sustainable Transportation, 3 (3), 187-202. Loo, B. P. Y., Cheung, W. S., & Yao, S. (2011). The Rural-Urban Divide in Road Safety: The Case of China. The Open Transportation Journal, 5 , 9-20. Loo, B. P. Y., & Tsui, M. K. (2005). 
Temporal and spatial patterns of vehicle-pedestrian crashes in busy commercial and shopping areas: A case study of Hong Kong. Asian Geographer, 24 (1-2), 113-128. 244 Loo, B. P. Y., Wong, S., Hung, W., & Lo, H. K. (2007). A review of the road safety strategy in Hong Kong. Journal of Advanced Transportation, 41 (1), 3-37. Loo, B. P. Y., & Yao, S. (2010). The impact of area deprivation on traffic casualties in Hong Kong. Paper presented at the The 5th HKSTS International Conference, Hong Kong. Loo, B. P. Y., & Yao, S. (2012). Geographic information systems. In G. Li & S. Baker (Eds.), Injury Research: Theories, Methods, and Approaches (pp. 447-463). New York: Springer. Loo, B. P. Y., Yao, S., & Wu, J. (2011). Spatial point analysis of road crashes in Shanghai: a GIS-based network kernel density method. Proceedings of the International Conference on GeoInformatics, 2011 . Loo, B. P. Y., Yao, S., Wu, J., Yu, B., & Zhong, H. (2011). Identification method of road hot zone based on GIS. Journal of Traffic and Transportation Engineering, 11 (4), 97-103. Lovegrove, G. R., & Sayed, T. (2006). Macro-level collision prediction models for evaluating neighbourhood traffic safety. Canadian Journal of Civil Engineering, 33 (5), 609-621 Lyon, C., & Persaud, B. (2002). Pedestrian collision prediction models for urban intersections. Transportation Research Record: Journal of the Transportation Research Board, 1818 (-1), 102-107. Marshall, W. E., & Garrick, N. W. (2011). Does street network design affect traffic safety? Accident Analysis & Prevention, 43 (3), 769-781. Miranda-Moreno, L. E., & Fu, L. (2007). Traffic safety study: Empirical Bayes or Full Bayes? Paper presented at the 86th Annual Meeting of the Transportation Research Board. McGuigan, D. (1981). The use of relationships between road accidents and traffic flow in" black-spot" identification. Traffic Engineering and Control, 22(HS-032 669). 245 McMahon, P. J., Duncan, C., Stewart, J. R., Zegeer, C. V., & Khattak, A. J. 
(1999). Analysis of factors contributing to “walking along roadway” crashes. Journal of the Transportation Research Board, 1999 , 41-48. Miaou, S. P. (1994). The relationship between truck accidents and geometric design of road sections: Poisson versus negative binomial regressions. Accident Analysis & Prevention, 26 (4), 471-482. Miaou, S. P., & Lord, D. (2003). Modeling traffic crash flow relationships for intersections - Dispersion parameter, functional form, and Bayes versus empirical Bayes methods. Statistical Methods and Modeling and Safety Data, Analysis, and Evaluation(1840), 31-40. Miller, T. R. (2000). Variations between countries in values of statistical life. Journal of Transport Economics and Policy, 34 , 169-188. Ministry of Transport. (2010). The Social Cost of Road Crashes and Injuries June 2010 update: Ministry of Transport. Montella, A. (2010). A comparative analysis of hotspot identification methods. Accident Analysis and Prevention, 42 (2), 571-581. Moons, E., Brijs, T., & Wets, G. (2009a). Identifying hazardous road locations: Hot spots versus hot zones. In M. L. Gavrilova & C. J. K. Tan (Eds.), Transactions on Computational Science Vi (pp. 288-300). Berlin, Heidelberg: Springer-Verlag Berlin Heidelberg. Moons, E., Brijs, T., & Wets, G. (2009b). Improving Moran's Index to identify hot spots in traffic safety In B. Murgante, G. Borruso & A. Lapucci (Eds.), Geocomputation and Urban Planning (pp. 117–132). Heidelberg: Springer. Moran, P. (1948). The interpretation of statistical maps. Journal of the Royal Statistical Society, 10b, 243-251. Mueller, B. A., Rivara, F. P., & Bergman, A. B. (1988). Urban-rural location and the risk of dying in a pedestrian-vehicle collision. The Journal of Trauma, 28(1), 91-94. 246 O'Sullivan, D., & Unwin, D. J. (2003). Geographic Information Analysis. Hoboken,New Jersey: John Wiley & Sons, Inc. Okabe, A., Okunuki, K., & Shiode, S. (2006a). The SANET Toolbox: New method for network spatial analysis. 
Transactions in GIS, 10 (4), 535-550. Okabe, A., Okunuki, K., & Shiode, S. (2006b). SANET: A toolbox for spatial analysis on a network. Geographical Analysis, 38(1), 57-66. Okabe, A., Satoh, T., & Sugihara, K. (2009). A kernel density estimation method for networks, its computational method and a GIS-based tool. International Journal of Geographical Information Science, 23 (1), 7-32. Okabe, A., & Yamada, I. (2001). The K-function method on a network and its computational implementation. Geographical Analysis, 33 (3), 271-290. Okabe, A., Yomono, H., & Kitamura, M. (1995). Statistical analysis of the distribution of points on a Network. Geographical Analysis, 27(2), 152-175. Openshaw, S., Charlton, M., Wymer, C., & Craft, A. (1987). Developing a mark 1 geographical analysis machine for the automated analysis of point data sets. International Journal of Geographical Information Systems, 1 , 335-358. Payne, G., Payne, J., & Hyde, M. (1996). 'Refuse of All Classes'? Social Indicators and Social Deprivation. Sociological Research Online, 1 . Pei, X., Wong, S., & Sze, N. (2011). A joint-probability approach to crash prediction models. Accident Analysis & Prevention, 43 (3), 1160-1166. Pei, X., Wong, S., & Sze, N. (2012). The roles of exposure and speed in road safety analysis. Accident Analysis & Prevention. Peden, M., Scurfield, R., Sleet, D., Mohan, D., Hyder, A. A., Jarawan, E., et al. (2004). World report on road traffic injury prevention . Geneva: World Health Organization (WHO). Peeters, D., & Thomas, I. (2009). Network Autocorrelation. Geographical Analysis, 41(4), 436-443. 247 Persaud, B. (1991). Estimating accident potential of Ontario road sections. transport Research Record 1327 , 47-54. Persaud, B., Lan, B., Lyon, C., & Bhim, R. (2010). Comparison of empirical Bayes and full Bayes approaches for before–after road safety evaluations. Accident Analysis & Prevention, 42 (1), 38-43. Persaud, B., Lyon, C., & Nguyen, T. (1999). 
Empirical Bayes procedure for ranking sites for safety investigation by potential for safety improvement. Transportation Research Record, 1665 . Planning Department (2001). 2001 Tertiary Planning Unit and Street Block/Village Cluster(TPU&SB/VC) Boundaries. HKSAR. Planning Department (2006). 2006 Tertiary Planning Unit and Street Block/Village Cluster (TPU&SB/VC) Boundaries. HKSAR. Planning Department (2007). Land Utilization in Hong Kong 2006. HKSAR. Pulugurtha, S. S., & Sambhara, V. R. (2011). Pedestrian crash estimation models for signalized intersections. Accident Analysis & Prevention, 43 (1), 439-446. Plug, C., Xia, J. C., & Caulfield, C. (2011). Spatial and temporal visualisation techniques for crash analysis. Accident Analysis & Prevention, 43 (6), 1937-1946. Pulugurtha, S. S., Krishnakumar, V. K., & Nambisan, S. S. (2007). New methods to identify and rank high pedestrian crash zones: An illustration. Accident Analysis and Prevention, 39 (4). Qin, X., Ivan, J. N., & Ravishanker, N. (2004). Selecting exposure measures in crash rate prediction for two-lane highway segments. Accident Analysis and Prevention, 36 (2), 183-191. Qin, X., Ivan, J. N., Ravishanker, N., Liu, J., & Tepas, D. (2006). Bayesian estimation of hourly exposure functions by crash type and time of day. Accident Analysis & Prevention, 38 (6), 1071-1080. Quddus, M. A. (2008). Modelling area-wide count outcomes with spatial correlation and heterogeneity: An analysis of London crash data. Accident 248 Analysis & Prevention, 40 (4), 1486-1497. Rasouli, M. R., Nouri, M., Zarei, M. R., Saadat, S., & Rahimi-Movaghar, V. (2008). Comparison of road traffic fatalities and injuries in Iran with other countries. Chinese Journal of Traumatology (English Edition), 11 (3), 131-134. Ripley, B. D. (1987).Stochastic Simulation .Wiley & Sons. Saccomanno, F. F., & Buyco, C. (1988). Generalized loglinear models of truck accident rates. Sawilowsky, Shlomo S.; Fahoome, Gail C. (2003). 
Statistics via Monte Carlo Simulation with Fortran. Rochester Hills, MI: JMASM. Schneider, R. J., Ryznar, R. M., & Khattak, A. J. (2004). An accident waiting to happen: a spatial approach to proactive pedestrian planning. Accident Analysis and Prevention, 36 (2), 193-211. Shiode, S. (2011). Street‐level Spatial Scan Statistic and STAC for Analysing Street Crime Concentrations. Transactions in GIS, 15 (3), 365-383. Song, J. J., Ghosh, A., Miaou, S., & Mallick, B. (2006). Bayesian multivariate spatial models for roadway traffic crash mapping. Journal of Multivariate Analysis, 97(1), 246-273. Spoerri, A., Egger, M., & von Elm, E. (2011). Mortality from road traffic accidents in Switzerland: Longitudinal and spatial analyses. Accident Analysis & Prevention, 43 (1), 40-48. Steenberghen, T., Aerts, K., & Thomas, I. (2010). Spatial clustering of events on a network. Journal of Transport Geography, 18 (3), 411-418. Steenberghen, T., Dufays, T., Thomas, I., & Flahaut, B. (2004). Intra-urban location and clustering of road accidents using GIS: a Belgian example. International Journal of Geographical Information Science, 18 (2), 169-181. Szabat, J., & Knapp, L. (2009). Treatment of the Economic Value of a Statistical Life in Departmental Analyses – 2009 Annual Revision. Washington, D.C.: 249 Office of the Secretary of Transportation. Transport Department (2001-2010). Road Traffic Accident Statistics. HKSAR: Transport Department. Transport Department (2002-2007). Hong Kong Annual Traffic Census. HKSAR: Transport Department. Tsui, M. K. (2006). Pedestrian Crashes in Commercial and Business Areas: A Case Study of Hong Kong. Unpublished MPhil Thesis. Tsui, K., So, F., Sze, N., Wong, S., & Leung, T. (2009). Misclassification of injury severity among road casualties in police reports. Accident Analysis & Prevention, 41 (1), 84-89. Tunaru, R. (1999). Hierarchical Bayesian models for road accident data. Traffic engineering & control, 40 (6), 318-324. 
Van den Bossche, F., Wets, G., & Brijs, T. (2005). Role of exposure in analysis of road accidents: a Belgian case study. Transportation Research Record: Journal of the Transportation Research Board, 1908 (-1), 96-103. Waller, L., & Gotway, C. (2004). Applied spatial statistics for public health data. New York: Wiley. Wang, C., Quddus, M. A., & Ison, S. G. (2011). Predicting accident frequency at their severity levels and its application in site ranking using a two-stage mixed multivariate model. Accident Analysis & Prevention, 43 (6), 1979-1990. Wier, M., Weintraub, J., Humphreys, E. H., Seto, E., & Bhatia, R. (2009). An area-level model of vehicle-pedestrian injury collisions with implications for land use and transportation planning. Accident Analysis & Prevention, 41(1), 137-145. Wong, S. C., Sze, N. N., & Li, Y. C. (2007). Contributory factors to traffic crashes at signalized intersections in Hong Kong. Accident Analysis and Prevention, 39 (6), 1107-1113. World Health Organization (2004). World Health Day: Road safety is no 250 accident! , from http://www.who.int/mediacentre/news/releases/2004/pr24/en/index.html World Health Organization (2008a). The Global Burden of Disease: 2004 update. World Health Organization (2008b). World health statistics 2008 . World Health Organization (2009). Global Status Report on Road Safety . Xie, Z. X., & Yan, J. (2008). Kernel Density Estimation of traffic accidents in a network space. Computers Environment and Urban Systems, 32 (5), 396-406. Yang, J., & Otte, D. (2007). A comparison study on vehicle traffic accident and injuries of vulnerable road users in China and Germany. Paper presented at the Proceedings 20th International Technical conference on the Enhanced Safety of Vehicles. Lyon, France. Paper. Yamada, I., & Thill, J.-C. (2004). Comparison of planar and network K-functions in traffic accident analysis. Journal of Transport Geography, 12, 149-158. Yamada, I., & Thill, J. C. (2007). 
Local indicators of network-constrained clusters in spatial point patterns. Geographical Analysis, 39 (3), 268-292. Yamada, I., & Thill, J. C. (2010). Local indicators of network-constrained clusters in spatial patterns represented by a link attribute. Annals of the Association of American Geographers, 100 (2), 269-285. 251