Title: Advances in spatial analysis of traffic crashes: the identification of hazardous road locations
Advisor(s): Loo, BPY
Author(s): Yao, Shenjun; 姚申君
Issued Date: 2013
URL: http://hdl.handle.net/10722/184257
Rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
ADVANCES IN SPATIAL ANALYSIS OF TRAFFIC
CRASHES: THE IDENTIFICATION OF
HAZARDOUS ROAD LOCATIONS
YAO SHENJUN
PhD Thesis
THE UNIVERSITY OF HONG KONG
2013
Abstract of thesis entitled
Advances in Spatial Analysis of Traffic Crashes: The
Identification of Hazardous Road Locations
Submitted by
Yao Shenjun
For the degree of Doctor of Philosophy
at The University of Hong Kong
in April 2013
The identification of hazardous road locations is important to the
improvement of road safety. However, there is still no consensus on the best
method of identifying them. While traditional methods, such as the hot spot
methodology, focus only on the physical distances separating road crashes,
the hot zone methodology takes network contiguity into consideration and
treats contiguous road segments as hazardous road locations. Compared with
the hot spot method, the hot zone methodology is a relatively new direction,
and a number of methodological issues remain in applying it to the
identification of hazardous road locations. Hence, this study aims to provide
a GIS-based study on the identification of crash hot zones as hazardous road
locations with both link-attribute and event-based approaches. It first
explores the general procedures of the two approaches in identifying traffic
crash hot zones, and then investigates the characteristics of the two
approaches by conducting a range of sensitivity analyses on the definition of
threshold values and crash intensity with both simulated and empirical data.
The results suggest that it is better to use a dissolved road network instead
of a raw-link-node road network. The segmentation length and the interval of
reference points have great impacts on the identification of hot zones, and
both are better defined as 100 meters for the sake of stable performance.
While employing a numerical definition to identify hot zones is a simple and
effort-saving approach, using the Monte Carlo method can avoid selection bias
in choosing an appropriate number as the threshold value. When the two
approaches are compared, the link-attribute approach is more likely to cause
false negative problems, and the event-based approach is prone to false
positive problems around road junctions. No matter which method is used, the
link-attribute approach requires less computer time in identifying crash hot
zones. When a range of environmental variables has to be taken into
consideration, the link-attribute approach is superior to the event-based
approach because it can more easily incorporate environmental variables with
statistical models.
By investigating the hot zone methodology, this research is expected to
enrich the theoretical knowledge of the identification of hazardous road
locations and, in practice, to provide policy-makers with more information on
identifying road hazards. Further research should be dedicated to the
ranking of hot zones and the investigation of false positive and false negative
problems.
(399 words)
Advances in Spatial Analysis of Traffic Crashes: The
Identification of Hazardous Road Locations
by
YAO Shenjun
(姚申君)
A thesis submitted in partial fulfillment of the requirements for
the Degree of Doctor of Philosophy
at the University of Hong Kong
April, 2013
Declaration
I declare that this thesis represents my own work, except where due
acknowledgement is made, and that it has not been previously included in a thesis,
dissertation or report submitted to this University or to any other institution for a
degree, diploma or other qualifications.
Signed ……………………………………………………………………………………………………………………………………
YAO Shenjun
Acknowledgements
I owe my sincere gratitude to my supervisor, Professor Becky P.Y. Loo, for
her inspiring suggestions on research arenas and continuous support in my
personal life. She is a role model for me, not only as an outstanding teacher or
an innovative scholar, but also as a caring mother who has three lovely
children.
I would like to say thank you to my friends, Mr. Aijun Yang, Ms. Cheng
Wang, Ms. Daisy Ho, Ms. Fangxin Yi, Ms. Huijuan Deng, Ms. Lihua Peng, Mr.
James H. Lenzer, Ms. Qian Liu, Mr. Qing Pei, Ms. Shuwen Liu, Mr. Tao Liu,
Ms. Xu Xu, Ms. Yujing Xie and Mr. Yu Wang. Without their smiles and support,
the past four years would have been more difficult. Special thanks go to my
dear fellows, Ms. Alice Chow, Ms. Linna Li, Ms. Winnie Lam and Mr. Yuhao
Wu. Great gratitude goes to the staff of the Geography Department for their
support and patience. I would also like to say thank you to Dr. P. C. Lai,
Professor S. C. Wong and Professor Guohua Li for their review of this thesis.
I especially thank my parents and my husband for their full support and
love throughout the years.
Shenjun Yao
April, 2013
Publications
Loo, B. P. Y., & Yao, S. (2010). The impact of area deprivation on traffic
casualties in Hong Kong. Paper presented at the 15th HKSTS International
Conference.
Loo, B. P. Y., Yao, S., Wu, J., Yu, B., & Zhong, H. (2011). Identification method
of road hot zone based on GIS. Journal of Traffic and Transportation
Engineering (in Chinese), 11(4), 97-103.
Loo, B. P. Y., Yao, S., & Wu, J. (2011). Spatial point analysis of road crashes in
Shanghai: a GIS-based network kernel density method. Paper presented at
the International Conference on GeoInformatics, 2011.
Loo, B. P. Y., & Yao, S. (2012). Geographic information systems. In G. Li & S.
Baker (Eds.), Injury Research: Theories, Methods, and Approaches (pp.
447-463). New York: Springer.
Yao, S., & Loo, B. P. Y. (2012). Identification of hazardous road locations for
pedestrians. Paper presented at the 2012 International Symposium on Safety
Science and Technology.
Table of Contents
DECLARATION
ACKNOWLEDGEMENTS
PUBLICATIONS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
1.1 Research Background
1.1.1 Identification of Hazardous Road Locations
1.1.2 Detection of Spatial Concentration of Road Crashes
1.1.3 Road Safety in Hong Kong
1.2 Aim and Objectives
1.3 Research Questions
1.4 Significance of Study
1.5 Definition of Terms
1.6 Organization of Thesis
CHAPTER 2 LITERATURE REVIEW
2.1 Link-Attribute Methods in Identifying Hazardous Road Locations
2.1.1 Hot Spot Methodology
2.1.2 Hot Zone Methodology
2.2 Event-Based Approaches for the Identification of Road Crash Clusters
2.3 Environmental Factors Contributing to Road Crashes
2.3.1 Road Environment
2.3.2 Spatial Environment
2.3.3 Demographic and Socio-economic Environment
2.3.4 Natural Environment
2.4 Summary
CHAPTER 3 METHODOLOGY
3.1 Methodological Framework
3.2 Network-constrained Methods for Hot Zone Identification
3.2.1 Link-attribute Method
3.2.2 Event-based Method
3.3 Data Analysis with Simulated Data
3.3.1 Data Description
3.3.2 Data Analysis
3.3.3 Results
3.4 Data Analysis with Empirical Data
3.4.1 Data Collection and Editing
3.4.2 Data Analysis
3.5 Summary
CHAPTER 4 LINK-ATTRIBUTE ANALYSIS FOR ROAD CRASHES
4.1 Territory-Wide Identification of Hazardous Road Locations
4.1.1 Data Description
4.1.2 Data Analysis
4.2 Results
4.2.1 Numerical Definition
4.2.2 Monte Carlo Simulation
4.2.3 Incorporation of Road Environmental Variables
4.3 Summary
CHAPTER 5 LINK-ATTRIBUTE ANALYSIS FOR ROAD CASUALTIES
5.1 Importance of Analysis based on Casualties
5.1.1 Importance of Casualty-Weighted Analysis
5.1.2 Targeted Casualties: Pedestrians
5.2 Casualty-Weighted Analysis
5.2.1 Unweighted
5.2.2 Cost-Weighted
5.3 District-Wide Identification of Hazardous Road Locations for Pedestrians
5.3.1 Data Description
5.3.2 Data Analysis
5.3.3 Empirical Bayes
5.4 Results
5.4.1 Simple Ranking
5.4.2 Incorporation of the Surrounding Environment of Crashes Involving Pedestrians
5.4.3 Empirical Bayes
5.5 Summary
CHAPTER 6 EVENT-BASED ANALYSIS FOR ROAD CRASHES
6.1 Territory-Wide Identification of Hazardous Road Locations
6.1.1 Data Description
6.1.2 Data Analysis
6.1.3 Results
6.2 Comparison with Link-attribute Approach
6.2.1 Numerical Definition-An Arbitrary Number
6.2.2 Monte Carlo Simulation on Crash Frequency
6.2.3 Incorporation of Road Environmental Variables
6.3 Summary
CHAPTER 7 EVENT-BASED ANALYSIS FOR ROAD CASUALTIES
7.1 District-Wide Identification of Hazardous Road Locations for Pedestrians
7.1.1 Data Description
7.1.2 Data Analysis
7.1.3 Results
7.2 Comparison with Link-attribute Approach
7.2.1 Simple Ranking
7.2.2 Incorporation of Surrounding Environmental Variables
7.3 Summary
CHAPTER 8 CONCLUSION
8.1 Summary of Findings
8.1.1 Link-Attribute Approach
8.1.2 Event-Based Approach
8.1.3 Advantages and Drawbacks of the Two Approaches
8.2 Importance of the Study
8.3 Limitations of the Study
8.4 Further Research Directions
REFERENCES
List of Figures
Figure 1.1 Illustration on the primary focus of the thesis
Figure 1.2 Illustration of hot spots and hot zones on a hypothetical road network
Figure 1.3 Illustration of measuring spatial proximity in planar and network 2D spaces
Figure 1.4 Number of road traffic deaths (2001-2010)
Figure 1.5 Number of serious injuries (2001-2010)
Figure 1.6 Annual average number of deaths per million population (2006-2010)
Figure 1.7 Annual average number of deaths per million vehicle-km traveled (2006-2010)
Figure 1.8 Number of road crashes and casualties (2001-2010)
Figure 3.1 Schematic diagram of the methodological framework
Figure 3.2 Illustration of methodological framework
Figure 3.3 Road crashes plotted onto a map (Loo & Yao, 2012)
Figure 3.4 Illustration of raw-link-node and dissolved-road systems
Figure 3.5 Flow chart of the dissolving algorithm
Figure 3.6 A hypothetical road network structure (Loo & Yao, 2012)
Figure 3.7 A hypothetical structure of seven BSUs (Loo & Yao, 2012)
Figure 3.8 Link-attribute hot zones
Figure 3.9 Work flow for generating reference points
Figure 3.10 Reference points on a hypothetical road network
Figure 3.11 Mapping of hot zones
Figure 3.12 Three hypothetical road networks
Figure 3.13 Different crash patterns on three hypothetical roads
Figure 3.14 ATC Network
Figure 3.15 Land Use in 2006
Figure 3.16 Socio-economic deprivation index by TPU in 2006
Figure 4.1 Histogram for road crashes in 2002-2004
Figure 4.2 Hot zones (HZ 18+) identified with (a) raw-link-node road system only and (b) dissolved road system only in the period from 2002 to 2004
Figure 4.3 Part of hot zones identified with dissolved road system only
Figure 5.1 Numbers of road crashes and casualties during 2001 to 2010
Figure 5.2 Percentages of fatalities by road user type, 2001-2010
Figure 5.3 Percentages of fatal and seriously injured casualties by road user type, 2001-2010
Figure 5.4 Population Density in 2006
Figure 5.5 Road Network in Kwun Tong District
Figure 5.6 Hot zones identified only by (a) equal-weighing and (b) cost-weighted method
Figure 5.7 FM-only and BM-only cost-weighted hot zones in (a) 2002-2004 and (b) 2005-2007
Figure 6.1 Illustration of locations of undesirable hot zones
Figure 6.2 Hot zones identified by 99% significance level for (a) crash frequency and (b) crash risk in the period from 2002 to 2004
Figure 6.3 Hot zones identified by 99% significance level for (a) crash frequency and (b) crash risk in the period from 2005 to 2007
Figure 6.4 E-only hot zones
Figure 6.5 Illustration of determining an attribute value for a reference point
Figure 7.1 Hot zones of three types using 99% percentile in (a) 2002-2004 and (b) 2005-2007
Figure 7.2 Locations of unweighted pedestrian hot zones of (a) L-only and (b) E-only
Figure 7.3 Locations of cost-weighted E-only pedestrian hot zones
Figure 7.4 Site Review using Google Street View (From the crossing of Fu Yan Street and Wut Wah Street)
List of Tables
Table 3.1 Statistics on length of BSUs based on the raw-link-node road network
Table 3.2 Statistics on length of BSUs based on the dissolved road network
Table 3.3 Summary on hot zones for random and concentrated crash patterns on the three hypothetical road networks by using 100 m as the BSU length and RP interval
Table 3.4 Summary on hot zones for random and concentrated crash patterns on the three hypothetical road networks by length
Table 3.5 Shares of road crashes on road centerlines from 2002 to 2007
Table 4.1 Comparison of negative binomial models by predictor and segmentation length
Table 4.2 Coefficients of predictors in negative binomial models
Table 4.3 Negative binomial models for crash rate
Table 4.4 Hot zones by type of road system (2002-2004 and 2005-2007)
Table 4.5 Hot zones identified by both raw-link-node and dissolved road systems
Table 4.6 Variation of hot zones between two periods
Table 4.7 Hot zones identified in both periods
Table 4.8 Statistics on hot zones based on Monte Carlo Simulations
Table 4.9 Hot zones belonged to both HZ18+ and HZ99.9%
Table 4.10 Variation of hot zones between two periods
Table 4.11 Hot zones identified by both periods
Table 4.12 Hot zones by predictor and segmentation length in 2002-2004 and 2005-2007
Table 4.13 Variation of hot zones by segmentation length
Table 4.14 Summary of 100-meter hot zones by model
Table 4.15 Hot zones identified by both periods
Table 4.16 Comparison of Model-L and Model-LATJ hot zones by district
Table 5.1 Road Traffic Casualty Statistics in Hong Kong by Road User Type, 2001-2010
Table 5.2 Casualty cost in developed countries using willingness-to-pay approach
Table 5.3 Statistics on length of BSU before and after dissolving performance
Table 5.4 Numbers and Percentages of fatalities, serious and slight pedestrian injuries
Table 5.5 Descriptive statistics on pedestrian casualties
Table 5.6 Percentiles of crash intensity for the unweighted analysis
Table 5.7 Percentiles of crash intensity for the cost-weighted analysis
Table 5.8 Link-attribute results on base negative binomial models for pedestrian casualties with length as independent variable
Table 5.9 Link-attribute results on full negative binomial models for pedestrian casualties with six independent variables
Table 5.10 Link-attribute results on full negative binomial models for pedestrian casualties with four independent variables
Table 5.11 Link-attribute results on base negative binomial models for pedestrian casualties by injury type
Table 5.12 Link-attribute results on full negative binomial models for pedestrian casualties by injury type
Table 5.13 Percentiles of EB estimates for unweighted analysis
Table 5.14 Statistics on link-attribute pedestrian hot zones with threshold value defined by the simple ranking method
Table 5.15 Link-attribute hot zones identified only by the unweighted or cost-weighted method
Table 5.16 Variation of hot zones between two periods
Table 5.17 Characteristics of hot zones identified by incorporating surrounding environment
Table 5.18 Link-attribute hot zones identified only by the unweighted or the cost-weighted method
Table 5.19 Link-attribute hot zones identified only by the base-mode or the full-model approach
Table 5.20 Variation of BM and FM hot zones between two periods
Table 5.21 Numbers and percentages of BM and FM hot zones identified in both periods
Table 5.22 Statistics on link-attribute pedestrian hot zones with crash intensity defined by the EB estimate and observed counts (simple ranking)
Table 5.23 Link-attribute hot zones identified only by the EB or the OC approach (simple ranking)
Table 5.24 Variation of EB and OC hot zones between two periods (simple ranking)
Table 5.25 Statistics on link-attribute pedestrian hot zones with threshold value defined by EB estimate and observed pedestrian casualty count (safety potential)
Table 5.26 Link-attribute hot zones identified only by the EB or the OC approach (safety potential)
Table 6.1 Illustration of reference points and AADT information
Table 6.2 Illustration of “Interval” variable
Table 6.3 Event-based hot zones with threshold values defined by an arbitrary number
Table 6.4 Statistics on event-based hot zones with crash intensity less than the critical value
Table 6.5 Statistics on hot zones based on statistical definition
Table 6.6 Hot zones by district and type based on statistical definition
Table 6.7 Variation of hot zones (CF and CR) between two periods
Table 6.8 Characteristics of link-attribute and event-based arbitrary-number hot zones (2002-2004)
Table 6.9 Comparison of link-attribute and event-based hot zones (HZ18+)
Table 6.10 Statistics of link-attribute and event-based hot zones (HZ18+) for two periods
Table 6.11 Statistics of link-attribute and event-based hot zones for two periods by significance level
Table 6.12 Statistics of link-attribute and event-based hot zones (incorporation of traffic exposure)
Table 6.13 Statistics of link-attribute and event-based hot zones for two periods
Table 7.1 Statistics on event-based pedestrian hot zones with threshold value defined by the simple ranking method
Table 7.2 Event-based hot zones identified only by unweighted or cost-weighted method (2002-2004)
Table 7.3 Variation of hot zones between two periods
Table 7.4 Numbers and percentages of hot zones identified in both periods
Table 7.5 Statistics of link-attribute and event-based hot zones for two periods (unweighted and cost-weighted)
Table 7.6 Summary on 99% L-only and E-only hot zones in 2002-2004
List of Abbreviations
2D: Two Dimensional
AADT: Annual Average Daily Traffic
ATC: Annual Traffic Census
BM: Base Model
BSU: Basic Spatial Unit
CF: Crash Frequency
CR: Crash Risk
CV: Coefficient of Variation
CW: Cost-weighted
EB: Empirical Bayes
FM: Full Model
GIS: Geographic Information Science
GPS: Global Positioning System
HS: Hot Spot
HSID: Hot Spot Identification
HZ: Hot Zone
HZID: Hot Zone Identification
IDHRL: Identification of Hazardous Road Locations
ISS: Injury Severity Scale
KDE: Kernel Density Estimation
KLINCS: K-Function Local Indicators of Network-Constrained Cluster
NKDE: Network Kernel Density Estimation
NNA: Nearest Neighbor Analysis
OC: Observed Counts
RP: Reference Point
SDI: Socio-Economic Deprivation Index
TPM: True Poisson Mean
TPU: Tertiary Planning Unit
TRADS: Traffic Road Accident Database
UW: Unweighted
VSL: Value of a Statistical Life
WHO: World Health Organization
WTP: Willingness to Pay
YTM: Yau Tsim Mong
CHAPTER 1
INTRODUCTION
The world’s first road traffic injury dates back to 1896 in New York,
and the first death caused by a car occurred just a few months later in London
(Peden et al., 2004). Since then, the threat of road traffic crashes has
spread at tremendous speed and traffic injuries have become a global
health problem (World Health Organization, 2004). According to the World
Health Organization (WHO) Global Status Report on Road Safety (2009), the
number of road traffic deaths was estimated at about 1.2 million each year
and the number of people injured reached as high as 50 million. While many
high-income countries have shown a stable or declining trend in the traffic
fatality rate, the number of traffic injuries in most regions of the world has
continued to rise in recent decades (WHO, 2008a, 2008b, 2009). It has been
estimated that traffic injuries will become the fifth leading cause of
death (WHO, 2008b, 2009). It has long been acknowledged that traffic deaths
and injuries do not happen by chance. Traffic events are not “accidents” but
“crashes”, which means that the risk can be understood and safety can thus be
improved (WHO, 2004). Although many administrations have carried out a
number of road safety measures in recent years, the statistics above
demonstrate that traffic injuries are still a serious threat to people’s
health. Hence, more joint efforts from multiple disciplines are necessary to
enhance safety on the roads.
In dealing with road safety problems, one may ask questions such as
“where are hazardous road locations?”, “which road locations need treatments?”
or “which places are more likely to be improved?”. The answers to these
questions rely much on one important research field on road safety, the
identification of hazardous road locations (IDHRL). Hot spot identification
(HSID) and hot zone identification (HZID) are two major types of IDHRL.
HSID, which is also known as blacksite or blackspot identification, takes
junctions or individual road segments as spatially independent units. The road
hazards identified by HSID approaches are a set of independent junctions or
road segments (hot spots). HZID, also named black zone identification, takes
the network contiguity of spatial units into consideration when detecting
spatial concentration. The hazardous road locations identified by HZID
approaches are a set of contiguous road spatial units (hot zones). To
geographers, both hot spots and hot zones are spatial clusters of road crashes.
Conceptually, there are two ways to represent road crashes. One refers to the
attribute-based type (“area-attribute approach” or “link-attribute approach”).
The road crashes are represented as an attribute value assigned to an area
(Aguero-Valverde & Jovanis, 2006; Blazquez & Celis, 2012; Chen, Lin & Loo,
2012; Erdogan, 2009; LaScala, Gerber & Gruenewald, 2000) or road links
(Black, 1991; Loo, 2009; Moons, Brijs & Wets, 2009b; Persaud et al., 2010;
Yamada & Thill, 2010). The other approach, often termed “event-based
approach”, is to consider the location of individual road crashes (events)
directly (Anderson, 2009; Eckley & Curtin, 2012; Plug, Xia & Caulfield, 2011;
Steenberghen, Aerts & Thomas, 2010; Yamada & Thill, 2007). Both
link-attribute and event-based approaches can be employed to identify
hazardous road locations. The primary focus of this thesis, as shown in Figure
1.1, is on the methodological challenges and issues related to these two
approaches for the identification of hot zones as hazardous road locations.
Figure 1.1 Illustration of the primary focus of the thesis
The first section of this chapter further explains the contextual
background with prior knowledge on the identification of hazardous road
locations and the detection of spatial concentrations of road crashes. The section
also explains why Hong Kong is chosen as the study area. The
research aim and objectives are introduced in Section 1.2. The main research
questions are presented in Section 1.3 and the significance of this research is
discussed in Section 1.4.
1.1 Research Background
1.1.1 Identification of Hazardous Road Locations
Both traffic crash hot spots and hot zones can be regarded as hazardous
road locations. Figure 1.2 delineates hot spots and hot zones on a hypothetical
road network. Hot spots are individual road junctions, such as Hot Spot A or
segments like Hot Spot B in Figure 1.2(a). A hot zone consists of at
least two contiguous road spatial units; in Figure 1.2(b), for example, Hot
Zone B comprises two spatial units and Hot Zone A five. While
both hot spot and hot zone methods have been employed by researchers to
identify hazardous road locations, the research focuses of these two methods
are quite different. This subsection gives a brief introduction to the
methodological focus of each type of approach.
Figure 1.2 Illustration of hot spots and hot zones on a hypothetical road network
1.1.1.1 Hot spot identification
A road junction or a road segment is identified as a “hot” or “hazardous”
or “dangerous” road location when its actual crash intensity is greater than or
equal to the critical crash intensity. Most previous studies on hot spot
identification focus more on the ways to define critical crash intensity.
According to Cheng and Washington (2005) and Elvik (2006), there are three
definitions commonly used for the identification of crash hot spots, that is,
numerical, statistical and model-based definitions. Numerical definitions are
based on the number of road crashes, or the level of crash risks, or the score of
casualty severity. Although this kind of definition has little appeal to the
scientific community, it is preferred by most road safety administrations in the
world (Elvik, 2007; Loo, 2009), mainly because it is practical for
administrations to execute and monitor. Statistical definitions rely on the
deviation of the observations from a “normal” or “expected” number at
comparison (similar) locations. Different crash intensity measures, notably
simple crash counts, weighted crash counts and crash rates (i.e. crashes per
population, road length, vehicles and traffic volume) are commonly used. A
road segment whose actual crash intensity is statistically
significantly higher than the “expected” value is considered dangerous. Finally,
model-based definitions are based on more sophisticated crash prediction
models which take into account “confounding” factors such as traffic
volume. Compared with simple numerical definitions, model-based definitions
appeal more to the scientific community, but they are data-demanding and there is
no consensus on which variables are most suitable to be treated as
“confounding” factors. Nevertheless, researchers have identified several
variables which are commonly included in the model as “prior information”.
Such factors are reviewed further in the second chapter.
1.1.1.2 Hot zone identification
Compared with the hot spot method, hot zone methodology is a relatively
new direction for the identification of hazardous road locations. Unlike hot
spot methodology which directly uses critical value to define hazardous road
locations, hot zone methodology identifies road hazards by comparing the
actual crash intensity with the threshold value in a spatial unit (such as a very
small road segment) as well as examining the spatial relationships among these
units. If there are at least two contiguous spatial units and each unit has actual
crash intensity greater than the threshold value, a hot zone is identified as a
hazardous road location. As this method stems from the idea of spatial
autocorrelation on a network (Black, 1991, 1992; Black & Thomas, 1998;
Flahaut, 2004; Flahaut et al., 2003; Loo, 2009; Peeters & Thomas, 2009), most
previous research on hot zone identification concentrates on network
autocorrelation and the representation of spatial relationships among spatial
units, for instance, using different weighting matrices such as a 0-1 contiguity
matrix or a distance-decay matrix to quantify spatial proximity. However, for
the identification of hazardous road locations, the definition of the threshold
value and the determination of spatial units also play a very important role, yet
they have received much less attention than the “spatial” dimension of the
approach. For instance, most studies simply used the average actual crash
intensity or an arbitrary crash count as the threshold value. The
length of a spatial unit was generally defined as 1 hectometer, that is, 100
meters. Hence, there remain a number of methodological issues to be
investigated for the identification of crash hot zones, which will be the major
research focus of this thesis.
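The hot zone rule described above (at least two contiguous spatial units, each with crash intensity above the threshold) can be illustrated with a minimal Python sketch. It is not the procedure developed in this thesis: the function name, the adjacency dictionary and the toy intensities are hypothetical, and contiguity is reduced to a simple 0-1 neighbor list.

```python
from collections import deque


def find_hot_zones(intensity, adjacency, threshold):
    """Group contiguous spatial units whose intensity exceeds the threshold.

    intensity: dict mapping unit id -> crash intensity (hypothetical values)
    adjacency: dict mapping unit id -> list of contiguous unit ids (0-1 contiguity)
    Returns a list of hot zones, each a sorted list of at least two unit ids.
    """
    hot = {unit for unit, value in intensity.items() if value > threshold}
    seen = set()
    zones = []
    for start in sorted(hot):
        if start in seen:
            continue
        # Breadth-first search restricted to units above the threshold.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            unit = queue.popleft()
            component.append(unit)
            for neighbour in adjacency.get(unit, ()):
                if neighbour in hot and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        # A single isolated unit is a hot spot candidate, not a hot zone.
        if len(component) >= 2:
            zones.append(sorted(component))
    return zones


# Hypothetical network: units 1-2-3-4-5 form a chain and unit 6 branches off unit 3.
adjacency = {1: [2], 2: [1, 3], 3: [2, 4, 6], 4: [3, 5], 5: [4], 6: [3]}
intensity = {1: 0, 2: 3, 3: 4, 4: 1, 5: 5, 6: 2}
print(find_hot_zones(intensity, adjacency, threshold=1))
```

In this toy network, unit 5 exceeds the threshold but has no contiguous neighbor above it, so it is not reported as part of any hot zone.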
1.1.2 Detection of Spatial Concentration of Road Crashes
As mentioned before, both hot spots and hot zones are spatial
concentrations of road crashes. In a geographical context, there are two major
types of representations for road crashes, namely the attribute-based and the
event-based types. The former can be further classified into two sub-types,
that is, area-attribute and link-attribute approaches. For the area-attribute
approaches, traffic crashes are analyzed by areas (polygons) such
as traffic zones, census tracts, districts, provinces and regions (Erdogan, 2009;
Huang, Abdel-Aty & Darwiche, 2010; Levine, Kim & Nitz, 1995; Quddus,
2008). Generally, the focus is to visualize and explain the variability of crash
intensities. While area-attribute approaches are rarely used for the
identification of specific road hazards because of their coarse spatial resolution,
link-attribute and event-based approaches are commonly employed to identify
hazardous road locations at a local scale. Hence, this subsection will mainly
introduce link-attribute and event-based approaches for detecting spatial
concentrations of road crashes.
1.1.2.1 Link-attribute method
For link-attribute approaches, road crashes are represented as an attribute
value, often the crash intensity such as crash count, weighted crash count or
crash rate, assigned to a link or road segment. In the detection of spatial
concentrations of road crashes, the link-attribute approach regards links or road
segments with high attribute values as clusters. This method has been widely
adopted by researchers to identify both crash hot spots and crash hot zones. In
the identification of crash hot zones, the entire road network is segmented into
smaller road segments of equal length, which are usually termed “basic
spatial units (BSUs)”. A hot zone is identified when at least two contiguous
BSUs each have an attribute value greater than the threshold value
(Flahaut, 2004; Flahaut et al., 2003; Loo, 2009; Steenberghen et al., 2004).
However, most previous studies on the link-attribute approaches, as
mentioned in Section 1.1.1.2, focus on the investigation of spatial contiguity
among BSUs. The determination of the attribute, the definition of the
threshold and the segmentation of the road network have not been carefully
investigated in the academic literature.
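As a rough illustration of the segmentation step, the following Python sketch cuts a single link into BSUs of a nominal length and counts the crashes falling in each. It is only one naive way of doing this and is not the segmentation method examined in this thesis; the function names, the 100 m default and the treatment of the shorter remainder at the end of the link are assumptions.

```python
import math


def segment_link(link_length_m, bsu_length_m=100.0):
    """Split one link into consecutive BSU intervals measured along the link.

    Assumption: the last BSU simply keeps whatever length remains, so it may
    be shorter than bsu_length_m.
    """
    n_units = max(1, math.ceil(link_length_m / bsu_length_m))
    bounds = []
    for i in range(n_units):
        start = i * bsu_length_m
        end = min((i + 1) * bsu_length_m, link_length_m)
        bounds.append((start, end))
    return bounds


def count_crashes_per_bsu(crash_offsets_m, link_length_m, bsu_length_m=100.0):
    """Count crashes per BSU, given each crash's distance from the link start."""
    bounds = segment_link(link_length_m, bsu_length_m)
    counts = [0] * len(bounds)
    for offset in crash_offsets_m:
        if not 0 <= offset <= link_length_m:
            raise ValueError(f"crash offset {offset} lies outside the link")
        # A crash exactly at the end of the link is assigned to the last BSU.
        index = min(int(offset // bsu_length_m), len(bounds) - 1)
        counts[index] += 1
    return counts


# Hypothetical 350 m link with five crashes located by their offsets in meters.
print(segment_link(350.0))
print(count_crashes_per_bsu([10, 95, 120, 340, 350], 350.0))
```

The resulting counts are the simplest attribute value; a weighted count or a crash rate could be substituted without changing the structure of the sketch.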
1.1.2.2 Event-based method
The event-based method identifies crash clusters by measuring the physical
concentration among events (road crashes) through the examination of spatial
proximity (Bailey, 2004; Fotheringham, Brunsdon & Charlton, 2000;
O'Sullivan & Unwin, 2003; Okabe, Okunuki & Shiode, 2006a; Waller &
Gotway, 2004). There are two ways to measure the proximity of road crashes
in a two-dimensional (2D) space, namely planar 2D and network 2D measures.
The former treats road crashes as a planar 2D phenomenon which allows
events to be located at any place, and the latter regards crashes as a
network-constrained phenomenon which restricts events only to the network.
In a planar 2D space, the spatial separation of events is calculated by the
Euclidean distance, whereas in a network 2D space, the spatial separation is
measured by network distance. Figure 1.3 illustrates the way in which spatial
separation is measured in planar and network 2D spaces. In order to detect
clustering tendency around Crash A, one may need to look for neighboring
road crashes around it within a certain distance, h. In a planar 2D space, the
search space of Crash A is a circle with radius equal to h (see Figure 1.3a). As
Crash B is located within the search space, in other words, the distance
between Crash A and B is less than h, Crash B can be regarded as a neighbor of
Crash A. In a network 2D space, however, the search space of Crash A is like a
tree, as colored in grey in Figure 1.3b. The network distance between Crash A
and B is greater than h. As Crash B is located beyond the search space, Crash B
cannot be regarded as a neighboring crash. This thesis treats road crashes as a
network 2D phenomenon as they are primarily located on the road network
which is a network 2D space.
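The contrast between the two distance measures can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the procedure used in the thesis: crashes are placed exactly on network nodes, the coordinates and edge lengths are invented, and shortest paths are found with a plain Dijkstra search.

```python
import heapq
import math


def euclidean(p, q):
    """Planar 2D separation between two coordinate pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def network_distances(graph, source):
    """Shortest-path lengths from source; graph[u] maps neighbor -> edge length."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, length in graph[u].items():
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical layout: Crash A and Crash B sit on two roads meeting at junction J.
coords = {"A": (0.0, 0.0), "J": (80.0, 0.0), "B": (80.0, 60.0)}
graph = {
    "A": {"J": 80.0},
    "J": {"A": 80.0, "B": 60.0},
    "B": {"J": 60.0},
}
h = 120.0  # search distance

planar = euclidean(coords["A"], coords["B"])
network = network_distances(graph, "A")["B"]
print("planar neighbor:", planar <= h)
print("network neighbor:", network <= h)
```

With these invented values Crash B lies within h of Crash A in the planar sense but beyond h along the network, which mirrors the situation described for Figure 1.3.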
Figure 1.3 Illustration of measuring spatial proximity in planar and network 2D spaces
While most point pattern analytical tools such as kernel density
estimation and K function were initially designed for planar 2D space, recent
years have witnessed a growing awareness of transforming these methods from
planar 2D space to network 2D space (Okabe, Okunuki & Shiode, 2006b;
Okabe, Yomono & Kitamura, 1995; Shiode, 2011; Xie & Yan, 2008; Yamada &
Thill, 2004; Yamada & Thill, 2007). These studies argued convincingly that
spatial statistics for network-constrained phenomena need to be different
from spatial analysis methods designed for the planar space. While early
applications of these methods to traffic crashes generally aimed to describe the
global spatial pattern using “point process” tools (Okabe,
Okunuki & Shiode, 2006b; Yamada & Thill, 2004), more research effort has
been devoted to the identification of local crash hot spots in recent years. For
instance, Yamada and Thill (2007) applied the network-constrained local
K-function method to the detection of traffic crash hot spots with statistical
crash intensity as the cut-off value. While some attempts have been made to
apply the network-constrained event-based method to the detection of
crash hot spots, little research has been conducted on its application to the
identification of traffic crash hot zones. Taking the network-constrained local
K-function approach as an example, this thesis makes a novel attempt to
employ the network-constrained event-based method for the identification of
crash hot zones as hazardous road locations.
1.1.3 Road Safety in Hong Kong
This research chooses Hong Kong as a study area mainly because its road
safety is still not satisfactory, although it has improved over the past
ten years in terms of the numbers of traffic fatalities and serious injuries. In the
period from 2001 to 2010, Hong Kong witnessed a 32.4% reduction in the
number of road traffic deaths (see Figure 1.4), from 173 in 2001 to 117 in 2010,
and a significant 38.6% drop in the number of serious road traffic injuries (see
Figure 1.5), from 3,517 in 2001 to 2,160 in 2010.
Figure 1.4 Number of road traffic deaths (2001-2010)
Figure 1.5 Number of serious injuries (2001-2010)
Loo et al. (2007) reviewed the road safety strategy of Hong Kong by
comparing Hong Kong with some developed countries such as Sweden, which
has consistently had a favorable road safety record. Figure 1.6 and Figure 1.7
update the records of those countries. In comparison with these countries,
Hong Kong has a relatively low fatality rate per capita. As shown in Figure 1.6, the
annual average number of deaths per million people was 22.53 during the
period from 2006 to 2010, much lower than in Great Britain (41.80) and Sweden
(42.23), due partly to the fact that people in Hong Kong rely heavily on off-road
public transport, particularly the metro. However, focusing on the risk in Figure
1.7, it can be observed that Hong Kong has a poor record in the number of
fatalities per million vehicle-km traveled. During the period from 2006 to 2010,
the annual average number of deaths per million vehicle-km traveled reached
as high as 0.014, about three times that of Sweden and Great Britain. In
addition, the overall numbers of reported road crashes and traffic injuries have
stabilized at about 15,000 and 20,000 respectively (see Figure 1.8). These figures
show that Hong Kong still has a long way to go in improving road safety,
although the government departments, the Road Safety Council and
other stakeholders have already taken some effective countermeasures to
improve traffic safety in the recent decade.
Figure 1.6 Annual average number of deaths per million population (2006-2010)
Values shown: Hong Kong 22.53; Australia 70.05; California 94.51; Japan 47.92; New Zealand 90.89; Sweden 42.23; GB 41.80
Figure 1.7 Annual average number of deaths per million vehicle-km traveled (2006-2010)
Values shown: Hong Kong 0.0138445; Australia 0.0064664; California 0.0088948; Japan 0.0074545; New Zealand 0.0090993; Sweden 0.0049713; GB 0.0048807
Figure 1.8 Number of road crashes and casualties (2001-2010)
In addition, an empirical road network may reveal structural characteristics
that would be neglected on a hypothetical road network.
The road network in Hong Kong is complex, comprising 4.4%
expressways, 13.6% main roads, 76.9% secondary roads and 5.1% other
lower-order roads. In this light, Hong Kong is a suitable case
study for the identification of hazardous road locations.
1.2 Aim and Objectives
This study aims at providing a GIS-based study on the identification of
crash hot zones as hazardous road locations by using two network-constrained
methods derived from network analysis and local spatial analysis. The main
objectives of the study are:
1) to explore a methodological framework for hot zone identification;
2) to investigate the characteristics of the link-attribute approach for
detecting road hazards;
3) to examine the characteristics of the event-based approach for
identifying hazardous locations; and
4) to compare the link-attribute approach with the event-based approach
so as to get a better understanding of their advantages and disadvantages for
identifying crash hot zones as hazardous road locations.
1.3 Research Questions
Based on the objectives, this research centers upon the following five
major questions:
● How can we use the link-attribute method to identify hazardous road
locations?
● How sensitive are the results of hazardous road locations to changes in
the methods and threshold values when the link-attribute approach is applied?
● How can we apply the event-based approach to the identification of
hazardous road locations?
● How sensitive are the results of hazardous road locations to changes in
the methods and threshold values when the event-based approach is used?
● What are the strengths and weaknesses of the link-attribute and event-based
methods for the identification of crash hot zones as hazardous road locations?
Although research efforts have been dedicated to the methodological
framework for identifying hazardous road locations with the link-attribute
approach (Loo, 2009), some issues, such as the method of segmenting
the road network, are worthy of closer investigation. Hence, the
general steps of link-attribute crash hot zone identification will first be
presented. As crash intensity can be measured in different ways and
threshold values can be determined by various methods, varying the crash
intensity measure and the threshold value makes it possible to examine the
sensitivity and robustness of the approach.
As little research has focused on the event-based identification of crash
hot zones, the way in which the event-based approach is used to identify crash
hot zones should be proposed before any further investigation. Once the
key procedures for event-based identification have been established, sensitivity
analysis can then be conducted to examine the validity of the event-based approach.
With knowledge of the two approaches to identifying crash
hot zones, the last research question addresses the relative
strengths and weaknesses of the link-attribute and event-based approaches,
which could shed light on the identification of crash hot zones.
1.4 Significance of Study
Road safety is a public health problem. The costs of traffic fatalities and
injuries can be huge, not only in terms of money but also in physical and mental
suffering. Crash victims suffer greatly and their families may also
have to bear great burdens such as grief, medical and legal costs, and loss of
earnings. Road crashes also place an extra burden on society as a whole, through
medical emergency services, hospitalization, traffic congestion and, in the long
term, the tax, social welfare and insurance systems, which may negatively
influence the productivity and competitiveness of a society (Tsui, 2006). These
costs can be significantly reduced if crash risk is better understood and
countermeasures are properly carried out and strictly enforced. As the
identification of hazardous road locations plays a key role in addressing road
safety problems, this study can contribute to reducing the costs brought by
road traffic crashes.
Road safety research on the identification of hazardous road locations has
been performed for many decades, yet there is still no consensus on the best
method of identifying hazardous road locations. As mentioned earlier, hot
zone identification is a relatively new IDHRL method. There still remain a
number of methodological issues in applying this method to the identification
of hazardous road locations. By further investigating the hot zone method, this
study can enrich the theoretical knowledge of the identification of hazardous
road locations and practically provide policy-makers with more information
on identifying road hazards.
Most previous research on the identification of hazardous road
locations was conducted by engineers and geographers. While engineering
studies mainly focused on risk factors such as traffic volume and road junctions,
geographers specialized more in investigating the spatial pattern of road
crashes with spatial analytical tools such as the K-function and the Moran I indicator.
Few researchers have drawn on both areas in the identification of
crash hot zones. As advocated by the WHO (2004), the improvement of
road safety requires more concerted efforts from multiple disciplines. In this
light, this thesis is academically desirable since it enriches multidisciplinary
research on road safety by integrating knowledge of the identification of
hazardous road locations in road safety with spatial analysis in geography.
1.5 Definition of Terms
Road traffic crash
“Road safety is no accident!” (WHO, 2004). As the word “accident” has a
meaning of “happening by chance” which may imply that the traffic events
cannot be avoided, this study chooses to use the word “crash” instead of
“accident”. According to Transport Department of HKSAR (2001-2010), a road
traffic crash refers to an incident reported to the Police, involving personal
injury occurring on roads in the Territory, in which one or more vehicles are
involved. It can be a vehicle-vehicle collision, a vehicle-pedestrian collision
(pedestrian crash) or a vehicle-object/other collision. In this study, “road
crash”, “traffic crash” and “road traffic crash” are regarded as the same thing
and used interchangeably.
Casualty
A casualty is a person killed or injured in a road crash. There may be more
than one casualty in a traffic road crash (Transport Department, HKSAR).
Fatality
A fatality is a sustained injury causing death within 30 days of the road
crash (Transport Department, HKSAR).
Serious injury
Whether an injury is regarded as serious or slight differs by
administration. This research adopts the definition of Transport Department of
HKSAR that defines “serious injury” as an injury for which a person is
detained in hospital as an 'in-patient' for more than twelve hours. Injuries
causing death 30 or more days after the crash are also included in this category
(Transport Department, HKSAR).
Slight injury
An injury is regarded as slight if it is of a minor character, such as a sprain,
bruise or cut not judged to be severe, or slight shock requiring roadside
attention, and detention in hospital is either not required or lasts less than
12 hours (Transport Department, HKSAR).
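The three severity definitions above can be condensed into a small decision rule. The Python sketch below is only an illustration of how they fit together; the function and its arguments are hypothetical, and two boundary cases that the definitions leave open are resolved by assumption: a death at exactly 30 days is placed in the serious category, and a hospital stay of exactly 12 hours is treated as slight.

```python
from typing import Optional


def classify_casualty(days_to_death: Optional[float], hospital_hours: float) -> str:
    """Classify a casualty as 'fatal', 'serious' or 'slight'.

    days_to_death: days between the crash and death, or None if the person survived.
    hospital_hours: hours detained in hospital as an in-patient (0 if none).
    """
    if days_to_death is not None:
        # Assumption: "within 30 days" is read as strictly fewer than 30 days,
        # because deaths at 30 or more days are listed under serious injury.
        return "fatal" if days_to_death < 30 else "serious"
    if hospital_hours > 12:
        return "serious"
    # Assumption: a stay of exactly 12 hours is treated as slight.
    return "slight"


print(classify_casualty(None, 0))     # roadside attention only
print(classify_casualty(None, 24))    # in-patient for more than twelve hours
print(classify_casualty(5, 48))       # death within 30 days of the crash
print(classify_casualty(45, 400))     # death 30 or more days after the crash
```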
1.6 Organization of Thesis
This thesis consists of eight chapters. The introductory chapter first gives
a brief description on the research background. It also presents the aim and
objectives, research questions and the significance of the study. Chapter Two
covers the literature review. Chapter Three presents the methodological
framework. Chapters Four and Five focus on the link-attribute approach in
identifying hazardous road locations, and Chapters Six and Seven concentrate
on the event-based approach and comparisons of the link-attribute and
event-based results. The last chapter summarizes the findings of the entire study.
It also points out the limitations of the present study and provides insights for
further research.
CHAPTER 2
LITERATURE REVIEW
This chapter reviews two network-constrained methods in identifying
hazardous road locations and environmental factors contributing to road
crashes.
2.1 Link-Attribute Methods in Identifying Hazardous Road
Locations
In the literature, link-attribute methods in identifying road hazards can
be categorized into two types. One type is the hot spot methodology,
which has a long history of study. Focusing on individual road
segments, researchers have developed various hot spot identification (HSID)
methods, ranging from simple ranking of crash counts to complex model-based
definitions such as Empirical Bayes (EB) and Full Bayes (FB) models. Most
administrations in the world employ the hot spot methodology to identify
hazardous road locations. Compared with the hot spot family, the other
category, known as the hot zone methodology, is rather new. It was first
applied to the identification of road hazards by Black (1991), who examined both
the crash rate and the spatial relationship of road segments. Over the past two
decades, it has attracted growing attention due to its flexibility in many aspects
(Loo, 2009; Moons, Brijs & Wets, 2009a). This section reviews the two
link-attribute methods for the identification of dangerous road locations.
2.1.1 Hot Spot Methodology
A crash hot spot can be defined as any site (road segments, intersections,
interchanges, etc.) that has a higher number of road crashes than comparable
sites. While a large number of studies have focused on the development of
various hot spot identification methods (Aguero-Valverde & Jovanis, 2009; Hauer et al., 2002;
Hauer, Ng & Lovell, 1988; Li, Zhu & Sui, 2007; Miaou & Lord, 2003; Persaud,
Lyon & Nguyen, 1999; Song et al., 2006), recent efforts have been dedicated to
the evaluation of these hot spot identification approaches (Cheng &
Washington, 2005; Cheng & Washington, 2008; Montella, 2010). This
sub-section briefly reviews a set of commonly applied HSID methods and
quantitative evaluation criteria.
2.1.1.1 HSID methods
Elvik (2007) made a distinction between three definitions of road crash
hot spots: 1) numerical definitions; 2) statistical definitions; and 3)
model-based definitions.
Numerical definitions are preferred by many road safety administrations
in the world (Elvik, 2008; Elvik et al., 2009; Geurts, 2006; Guo & Sheng, 2009).
For instance, Norway defines a hazardous spot as any location with a length of
not more than 100 m where at least 4 injury crashes have been recorded in the
last 5 years (Elvik, 2008). By taking into consideration the traffic volume,
Australian criteria are based on both a crash count (3 or more similar injury
crashes within 3 years) and a crash risk (at least 0.8) (Elvik, 2008). In Flanders,
a site is defined as a hot spot when its score, calculated with severity of
casualty (fatal, seriously or slightly injured) considered, is no less than 15
(Geurts, 2006).
Statistical definitions rely on the deviation of the observed crash count
from a normal number of comparison (similar) locations. For instance, “a
location is identified as unsafe if the observed crash count exceeds the
observed average of counts of similar locations with a certain confidence level
(typically 0.90, 0.95, or 0.99 in practice)” (Cheng & Washington, 2005;
Laughlin et al., 1975). Statistical definitions may come close to model-based
definitions (Elvik, 2007) if the normal crash count of comparison sites is
estimated by a statistical model such as Poisson and negative binomial models
with a set of variables which may impact the incidence of road crashes.
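A statistical definition of this kind can be sketched in Python as a one-sided test. The sketch assumes, purely for illustration, that crash counts at a site follow a Poisson distribution whose mean equals the observed average of the comparison sites; the function names and the counts are hypothetical, and a negative binomial or regression-based expectation could be substituted.

```python
import math


def poisson_upper_tail(k, mean):
    """Return P(X >= k) for X ~ Poisson(mean)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mean / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_statistically_unsafe(observed, comparison_counts, confidence=0.95):
    """Flag a site whose count is improbably high given similar locations.

    Assumption: the "normal" level is the plain average of comparison_counts.
    """
    mean = sum(comparison_counts) / len(comparison_counts)
    return poisson_upper_tail(observed, mean) < 1.0 - confidence


# Hypothetical crash counts at five similar locations (average of 2.0).
similar_sites = [1, 2, 3, 2, 2]
print(is_statistically_unsafe(6, similar_sites, confidence=0.95))
print(is_statistically_unsafe(6, similar_sites, confidence=0.99))
print(is_statistically_unsafe(4, similar_sites, confidence=0.95))
```

The same site can be flagged at one confidence level and not at another, which is one reason the choice of level matters in practice.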
Model-based definitions are based on crash prediction models. A typical
example is the Empirical Bayes (EB) method, the essence of which is to smooth
out random fluctuation by combining the crash history of a specific
location with an estimate of the expected number of crashes based on the crash
history of similar locations (Elvik, 2007). Since its first application by Abbess,
Jarrett and Wright (1981), the EB approach has been well developed and widely
applied to road safety (Hauer, 1997; Hauer et al., 2002; Hauer, Ng & Lovell,
1988; Persaud, 1991; Persaud, Lyon & Nguyen, 1999). It has been widely
acknowledged that the EB model performs well in differentiating “true
positive” (high crash intensity due to real risk factors) from “false positive”
(high crash intensity due to randomness) locations. More recently, full Bayes
(FB) methods have been employed for hot spot identification due to their explicit
use of probability for quantifying uncertainty (Huang, Chin & Haque, 2009;
Lan & Persaud, 2011; Miranda-Moreno & Fu, 2007).
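The smoothing idea behind the EB method can be illustrated with the weighted-average form that is commonly used in the road safety literature. The Python sketch below is an assumption-laden illustration and not a formula taken from this thesis: it supposes a negative binomial crash prediction model with overdispersion parameter k (variance = mean + k × mean²), a single observation period, and hypothetical input values.

```python
def eb_estimate(observed, predicted, dispersion):
    """Empirical Bayes estimate of the expected crash count at one site.

    observed:   crash count recorded at the site
    predicted:  expected count for similar sites from a prediction model
    dispersion: overdispersion parameter k of an assumed negative binomial model

    The estimate is weight * predicted + (1 - weight) * observed, with
    weight = 1 / (1 + dispersion * predicted).
    """
    if predicted < 0 or dispersion < 0:
        raise ValueError("predicted and dispersion must be non-negative")
    weight = 1.0 / (1.0 + dispersion * predicted)
    return weight * predicted + (1.0 - weight) * observed


# Hypothetical site: 10 crashes observed where similar sites are expected to have 4.
print(eb_estimate(observed=10, predicted=4.0, dispersion=0.5))
```

The estimate lies between the model prediction and the raw count. With no overdispersion the weight is 1 and the site is judged entirely by the prediction for similar locations, whereas larger overdispersion shifts the weight towards the site's own crash history.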
2.1.1.2 HSID methods evaluation criteria
The performance of HSID methods have been assessed quantitatively with
various criteria. Persaud, Lyon and Nguyen (1999) measured the effectiveness
of EB methods by calculating the difference between the observed and
estimated crashes in a subsequent time period. By categorizing the sites into
correct positives, false positives, correct negatives and false negatives, Elvik
(2007, 2008) compared five HSID techniques in terms of two epidemiological
criteria, namely sensitivity by calculating the percentage of correct positives,
and specificity by computing the percentage of correct negatives. Higle and
28
Hecht (1989) evaluated HSID methods with the false identification statistic
which relies on the percentages of false positives and false negatives. Apart
from the false identification test, Cheng and Washington (2008) introduced
another four evaluation tests, including the site consistency test which
measures the ability of an HSID method to consistently identify a high-risk
sites over repeated observational periods, the method consistency test which
evaluates the performance of a HSID by calculating the number of the same
hot spots identified in both periods, the total rank difference test which
quantifies the reliability of a method by summing the rank differences of the
hazardous road sections identified across the two periods, and the Poisson
mean difference test which assesses the performance by calculating the sum of
the absolute difference of true Poisson means (TPMs) associated with the
falsely identified sites and critical TPMs. Montella (2010) gave an effectiveness
measure by means of a synthetic index which combines the site consistency
test, the method consistency test, and the total rank difference with the same
weight.
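As a rough illustration of how some of these criteria can be computed, the sketch below implements sensitivity and specificity as described for Elvik (2007, 2008) and simplified readings of the method consistency and total rank difference tests. The function names, the site labels and all data are hypothetical, and the definitions are paraphrased from the descriptions above rather than taken from the cited papers.

```python
def sensitivity_specificity(flagged, truly_hazardous, all_sites):
    """Return (sensitivity, specificity) of one HSID method."""
    flagged = set(flagged)
    truly_hazardous = set(truly_hazardous)
    truly_safe = set(all_sites) - truly_hazardous
    correct_positives = flagged & truly_hazardous
    correct_negatives = truly_safe - flagged
    sensitivity = len(correct_positives) / len(truly_hazardous)
    specificity = len(correct_negatives) / len(truly_safe)
    return sensitivity, specificity


def method_consistency(flagged_period1, flagged_period2):
    """Number of sites flagged as hot spots in both periods."""
    return len(set(flagged_period1) & set(flagged_period2))


def total_rank_difference(flagged_period1, rank_period1, rank_period2):
    """Sum of absolute rank changes of the period-1 hot spots."""
    return sum(abs(rank_period1[s] - rank_period2[s]) for s in flagged_period1)


# Hypothetical data: eight sites, three of which are truly hazardous.
sites = list("ABCDEFGH")
sens, spec = sensitivity_specificity({"A", "B", "D"}, {"A", "B", "C"}, sites)
shared = method_consistency({"A", "B", "D"}, {"A", "D", "E"})
trd = total_rank_difference(
    ["A", "B", "D"],
    {"A": 1, "B": 2, "D": 3},   # ranks in period 1
    {"A": 1, "B": 5, "D": 2},   # ranks in period 2
)
```

In this toy case two of the three truly hazardous sites are flagged and four of the five safe sites are correctly left unflagged; a smaller total rank difference would indicate a more stable ranking across periods.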
2.1.2 Hot Zone Methodology
The hot zone methodology stems from the idea of spatial autocorrelation
on a network. A well-known spatial statistic for measuring spatial
autocorrelation is Moran I (Moran, 1948), which refers to the degree that the
value of a variable at a location covaries with values of that variable at nearby
locations. While the Moran I statistic was initially designed for planar 2D
space, Black (1991) first applied it to attributes on network segments. Black
(1992) further developed this approach and presented the concept of network
autocorrelation, which examines the extent to which a variable value on each
network segment correlates with those on other network segments that
connect to each segment. Black and Thomas (1998) illustrated the utility of
network autocorrelation by using network Moran I statistic to assess the
highway crash hotspot clustering tendency. While these pioneering studies
were concerned with network autocorrelation at a global spatial context,
Flahaut et al. (2003) introduced local network autocorrelation indices,
which decompose the global Moran I index. The approach
analyzes road networks by small segments known as basic spatial units (BSUs). Based
upon the crash rates and spatial proximity of BSUs, crash hotspot clusters (that
is, the crash hot zones) are detected through analyzing the spatial association
between BSUs. The method was applied to the detection of crash hot zones in
Belgium. However, the study area of their case only contained a single
highway, which could hardly portray the characteristics of the road network
in the real world. This issue was later addressed by Flahaut (2004), who
analyzed the crash pattern of Walloon Brabant, a province of Wallonia in
Belgium. A recent study by Loo (2009) targeted the entire city of Hong Kong
and performed the hot zone analysis by using a modified local Moran I statistic,
hereafter referred to as the Loo statistic. The study also solved the double-counting
problem by following a preset rule. In comparison with the blacksite methodology
used by the Hong Kong government, Loo (2009) found that the hot
zone methodology is superior and more flexible in many respects, especially in the
identification of road hazards on expressways and in the rural areas. Based on
the Loo statistic, Moons, Brijs and Wets (2009a) analyzed road crashes on
highways in a province of Belgium and in an urban environment. The results
indicated that incorporating the hot zone methodology in crash analysis could
provide more information on the underlying hazardous road locations.
Yamada and Thill (2010) identified crash hot zones by means of local Moran I
statistic and the local Getis and Ord G statistic. They also analyzed the spatial
pattern of crashes by taking into consideration the traffic exposure. Their
research could avoid the detection of hot zones that merely reflect the traffic
exposure.
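For readers unfamiliar with the statistic underlying this line of work, the following minimal sketch computes the standard global Moran I using a 0-1 contiguity structure between network segments. It is the textbook formula, not the specific network formulation of Black (1991, 1992) or the local indices of Flahaut et al. (2003); the function name and the four-segment chain are hypothetical.

```python
def morans_i(values, neighbours):
    """Global Moran I with binary contiguity weights.

    values: list of attribute values, one per segment (index = segment id)
    neighbours: dict mapping segment id -> list of contiguous segment ids
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all weights (each contiguous pair is counted in both directions).
    s0 = sum(len(nbs) for nbs in neighbours.values())
    cross = sum(dev[i] * dev[j] for i, nbs in neighbours.items() for j in nbs)
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical chain of four contiguous segments: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clustered = morans_i([5, 4, 1, 0], chain)     # similar values are adjacent
alternating = morans_i([5, 0, 5, 0], chain)   # dissimilar values are adjacent
```

Positive values indicate that similar crash intensities tend to sit on contiguous segments, whereas negative values indicate an alternating pattern.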
The link-attribute hot zone methodology is based on BSUs, which are derived
by segmenting roads at equal intervals. Aggregating road crashes by
BSU raises one of the most important issues in geography: the scale of
the spatial unit (the length of a BSU) may influence the results (the hot zones).
However, researchers in the literature have generally chosen a single constant value
to define the length of a BSU, such as 100 m (Black, 1991; Black &
Thomas, 1998; Flahaut, 2004; Flahaut et al., 2003; Loo, 2009; Moons, Brijs &
Wets, 2009a) or 0.1 mile (Yamada & Thill, 2010). Little research has been
directed towards the impact of BSU length on the hot zone identification. In
addition, most previous studies were based on crash counts of road links and
focused more on spatial statistical methods that are used to assess the spatial
association of links. They conducted crash analysis without giving much consideration
to the choice of attributes that are attached to links. However, given the same
hot zone identification index such as Loo statistic, the hot zones identified on
the basis of crash counts may differ from those based on the number of
casualties involved. The selection of variables of links hence also plays an
important role in identifying hazardous road locations. Moreover, most of the
researchers only concentrated on observed crash pattern. However, from the
perspective of crash potential (observed minus expected), the factors that can
explain the spatial distribution of crashes need to be closely examined.
2.2 Event-Based Approaches for the Identification of Road
Crash Clusters
Event-based methods that are widely employed in crash pattern analysis
were initially developed based upon the assumption of a planar 2D space.
These measures can be classified into distance-based methods which examine
distances between events, and density-based methods that examine the crude
density or overall intensity of a point pattern (O'Sullivan & Unwin, 2003).
Frequently used distance-based methods include nearest-neighbor distance
and distance functions like the G, F, and K (O'Sullivan & Unwin, 2003). Of
these, Ripley's K-function has been utilized for crash analysis in many studies
(Jones, Langford & Bentham, 1996; Schneider, Ryznar & Khattak, 2004). Loo
and Tsui (2005) used the nearest neighbor analysis (NNA) to explore the crash
clustering tendency. Pythagoras's theorem was used to calculate the distance
of each crash to its nearest neighbour. The alternative to distance-based
methods is density-based measures such as quadrat count methods and density
estimation. Among these, the kernel-density estimation (KDE) is the most
widely used approach in analyzing crash patterns (Anderson, 2009; Delmelle &
Thill, 2008; Erdogan et al., 2008; Pulugurtha, Krishnakumar & Nambisan,
2007).
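The planar nearest-neighbour distance mentioned above can be sketched in a few lines. This is only an illustration of the straight-line (Pythagorean) distance calculation on hypothetical coordinates; it is not the full NNA procedure of Loo and Tsui (2005), and the function name is an assumption.

```python
import math


def nearest_neighbour_distances(points):
    """Planar distance from each point to its nearest other point."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        distances.append(
            min(
                math.hypot(xi - xj, yi - yj)   # Pythagoras's theorem
                for j, (xj, yj) in enumerate(points)
                if j != i
            )
        )
    return distances


# Hypothetical crash coordinates in metres.
crashes = [(0, 0), (3, 4), (3, 5), (10, 0)]
nn = nearest_neighbour_distances(crashes)
```

Because the distances are measured in a straight line, two crashes on opposite sides of a barrier or on unconnected roads may appear closer than they are along the network, which is the motivation for the network-constrained methods discussed next.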
Although conventional event-based methods were originally developed
for a planar 2D space, recent attempts were made to extend these methods
from a planar 2D space to a network 2D space (Loo, Yao & Wu, 2011b; Okabe,
Okunuki & Shiode, 2006b; Okabe, Satoh & Sugihara, 2009; Xie & Yan, 2008;
Yamada & Thill, 2004; Yamada & Thill, 2007). Taking K-function as an
example of distance-based approaches, Okabe and Yamada (2001) developed a
K-function statistic on a network for measuring clustering tendency. Yamada
and Thill (2004) further compared the network K-function and the traditional
planar K-function methods to illustrate the risk of false positive detection
associated with the use of a statistic designed for a planar space to analyze a
network-constrained phenomenon. Their results clearly demonstrated that the
planar K-function analysis is problematic since it entails a significant chance of
over-detecting (Yamada and Thill, 2004). In a follow-up study, Yamada and
Thill (2007) designed a method named
K-function local indicators of network-constrained clusters (KLINCS) to identify
crash clusters and applied it to highway vehicle crash hot spots in Buffalo, USA. For
density-based measures, Xie and Yan (2008) presented a novel network KDE
approach for estimating the density of network-constrained events and tested
this method with traffic crashes on a road network. The results indicated that
the network KDE is more appropriate than the standard planar KDE for traffic
crash “hot spot” analysis.
Previous network-constrained event-based methods for spatial analysis of
road crashes focused on either the measurement of clustering tendency at a
global level (Okabe & Yamada, 2001; Yamada & Thill, 2004) or the detection
of crash clusters (that is, crash hot spots) (Loo, Yao & Wu, 2011; Xie & Yan,
2008; Yamada & Thill, 2007). However, little research has been concentrated
upon the identification of clusters of road crash clusters (that is, crash hot
zones). In addition, like link-attribute methods, environmental factors need to
be taken into consideration while identifying hot zones with event-based
approaches. This study will attempt to bridge these gaps.
2.3 Environmental Factors Contributing to Road Crashes
Contributory factors to road crashes can be classified into three categories,
namely vehicular, human and environmental factors. This subsection briefly
reviews previous studies which examined the relationship between road
crashes and the environmental factors. Environmental factors are always
regarded as exogenous forces on road traffic crashes, including road
environment, spatial environment, demographic and socio-economic
environment and natural environment.
2.3.1 Road Environment
Road environment refers to the characteristics of roadways and sidewalks.
Many road safety engineers paid attention to exposure, such as vehicle
exposure (Abbas, 2004; Miaou, 1994; Pei, Wong & Sze, 2012; Qin et al., 2004,
2006; Van den Bossche, 2005) and pedestrian volume (Brüde & Larsson, 1993;
Gårder, 2004; Harwood, 2008; Leden, 2002; Lyon & Persaud, 2002). Some
researchers were also concerned with road types or road functions such as
expressway, arterial routes, collector roads or access roads (Hadayeghi, Shalaby
& Persaud, 2007; Ladron de Guevara, Washington & Oh, 2004), while some
were more interested in cross-section elements, such as the number of lanes,
lane width, shoulder type, presence of a median and median width
(Aguero-Valverde & Jovanis, 2009; Lee & Abdel-Aty, 2005; Pande &
Abdel-Aty, 2009; Pei, Wong & Sze, 2011). There were also a great number of
studies examining the relationship between traffic controls (e.g. speed limit,
access control, type of traffic control at intersections, etc.) and road crashes
(Abdel-Aty et al., 2008; Afukaar & Damsere-Derry 2010; Elvik, 2012; Pei,
Wong & Sze, 2011). Moreover, as the majority of road crashes occurred at or
within a short distance from road junctions, many scholars focused on
collisions around road intersections (Lee & Abdel-Aty, 2005; Ljung Aust,
Fagerlind & Sagber, 2011; Wong, Sze & Li, 2007).
2.3.2 Spatial Environment
Spatial variations in road crashes have been pinpointed by many
researchers in the literature. Examples of such analyses include developing vs.
developed countries (Peden et al., 2004; Rasouli et al., 2008), national
comparison (Yang & Otte, 2007), provincial comparison (Hu et al., 2008), and
rural vs. urban areas (Kmet & Macarthur, 2006; Loo, Cheung & Yao, 2011).
Moreover, some other indicators such as signal density, population density,
junction density, road density, street network structure and size of built-up
areas are also regarded as spatial variables (Lovegrove & Sayed, 2006; Marshall
& Garrick, 2011; Spoerri, Egger & Elm, 2011). The impacts of land use
variables on road crashes have long been examined (Dissanayake, Aryaija &
Wedagama, 2009; Graham and Glaister, 2003; Jegede, 1988; Kim, Brunner &
Yamashita, 2006; Wier et al., 2009). For instance, Jegede (1988) found a
positive association between road collisions and economic interaction
measured by the number of industrial establishments. Wier et al. (2009)
reported that neighborhood commercial and residential-neighborhood
commercial areas positively affected the occurrence of vehicle-pedestrian
injury collisions. The reason, as discussed by Ben-Akiva and Bowman (1995), is
that land use is one of the major factors in the generation or attraction of traffic.
Certain types of land uses are associated with particular human activities that
might increase the likelihood of road crashes (Tsui, 2006).
2.3.3 Demographic and Socio-economic Environment
The relationship between demographic characteristics and road crashes
(or casualties) has been investigated by many scholars such as LaScala, Gerber
and Gruenewald (2000), Law, Noland and Evans (2009), Pulugurtha and
Sambhara (2011), and Spoerri, Egger and Elm (2011). A great number of
studies have also examined the relationship between socio-economic factors
and the incidence of crashes. Most of the studies focused on vehicle-pedestrian
collisions rather than vehicle-vehicle crashes. They pointed out that the
socio-economic characteristics of the neighborhoods were significant
predictors of pedestrian crashes (Cottrill & Thakuriah, 2010; LaScala, Gerber &
Gruenewald, 2000; LaScala, Gruenewald & Johnson, 2004; McMahon et al.,
1999). In general, crashes involving pedestrians are strongly associated with
communities with lower levels of employment, more low-income households,
more old houses and fewer highly educated residents. It was also found that area
deprivation had a significant impact on the number of pedestrian casualties
and the severity of pedestrian injuries (Graham, Glaister & Anderson, 2005;
Graham & Stephens, 2008; Green, Muir & Mahe, 2011).
2.3.4 Natural Environment
Road collisions are more likely to occur in times of bad weather (e.g.
raining, snowing and low visibility) (Brijs, Karlis & Wets 2008; Keay &
Simmonds, 2005; Wang, Quddus & Ison, 2011; Yau, 2004), probably due to the
increase of careless road users and vehicle disorder.
This study will examine the effects of road, spatial, and demographic and
socio-economic environments on the spatial distribution of road crashes and
identify hazardous road locations by incorporating these factors.
2.4 Summary
This chapter reviews the link-attribute and event-based literature relating
to the identification of hazardous road locations. For link-attribute approaches,
although the focus of previous studies was on hot spot identification, some
researchers have devoted effort to hot zone detection in recent years. However,
some methodological issues remain for link-attribute hot zone
methods, such as environmental exposure. For event-based crash pattern
analyses, none of the existing studies is concerned with hot zone
identification. Event-based hot zone detection is hence worth
examining. Environmental variables that were studied in the literature to
explain the spatial distribution of road crashes are also reviewed in this chapter.
These environmental factors will be incorporated in the sensitivity analyses of
both link-attribute and event-based approaches.
CHAPTER 3
METHODOLOGY
This chapter provides a broad framework showing the methodology in
the identification of crash hot zones. The methodological framework is
presented in the first section. The two network-constrained methods for
analyzing road crashes for hot zone identification are then introduced in
Section 3.2. The ways in which simulated and empirical data are
analyzed will be discussed in Section 3.3 and Section 3.4 respectively.
3.1 Methodological Framework
Figure 3.1 is a schematic diagram summarizing the methodological
framework. With the overriding goal of improving road safety, this research
intends to explore efficient and flexible approaches to the detection of
hazardous road locations. The two major datasets are simulated and empirical
data. The former are a set of hypothetical road networks with simulated road
crashes located on them. The latter describe road crash, road network system
and environmental characteristics of Hong Kong. The environmental database
includes road environmental data such as road type, number of road junctions,
traffic volume; socio-economic environmental data such as household income,
employment rate, education level, and owner occupancy; and other
environmental data such as population density, junction density, road density,
and land use characteristic in Hong Kong. The hot zone methodology is used
as the basis for both link-attribute and event-based methods. Loo’s statistic will
be employed to perform the link-attribute hot zone analysis, and a newly
developed index based on the Local K-function will be used for the event-based
detection of hot zones. The two methods are applied to the analysis of both
simulated and empirical data. A series of sensitivity analyses will be conducted
in order to investigate the characteristics of the two approaches in identifying
crash hot zones, such as the examination of the influence of the segmentation
method, type of crash patterns and definition of the threshold value on the hot
zones. In particular, by incorporating different environmental factors in
empirical data analysis, crash-based and casualty-based hot zones will be
identified in a crash-potential manner. The former will focus on road
environment such as traffic exposure, road type and number of road junctions
and the latter will be more concentrated on surrounding environment such as
characteristics of land use type and socio-economic status. The outputs of the
link-attribute and event-based hot zones, including maps and statistics will be
compared in order to get a better understanding of their strengths and
weaknesses for detecting hazardous road locations.
Figure 3.1 Schematic diagram of the methodological framework
Figure 3.2 is a diagram showing the detailed implementation of the
methodological framework for the identification of crash hot zones. The
following section will introduce general steps of link-attribute analysis based
on Loo’s statistic and event-based analysis based on a newly developed index
derived from the Local K-function. Section 3.3 presents the way in which the
two methods are applied to the analysis of hypothetical road network
structures with simulated road crashes. The analysis of the simulated data
focuses on the impacts of segmentation method (definition of reference points),
crash pattern and road network structure. Section 3.4 presents the application
of the two methods to the empirical data of Hong Kong. Both crashes and
casualties are analyzed by the two approaches. The sensitivity analysis focuses
on the segmentation method (definition of reference points), definition of
threshold value and casualty-weighted method. Details of the data collection
and approaches in analyzing the empirical data can be found in Section 3.4.1
and 3.4.2 respectively.
Figure 3.2 Illustration of methodological framework
3.2 Network-constrained Methods for Hot Zone
Identification
This section introduces general steps of link-attribute and event-based
methods in analyzing road crashes for the identification of crash hot zones.
3.2.1 Link-attribute Method
As reviewed in Section 2.1.2, some spatial models such as Local Moran I,
Local Getis and Ord G statistic, and Loo’s statistic have been used for analyzing
road crashes. Compared with other models, Loo’s statistic provides a simple
and quick way of quantitatively measuring the degree of clustering. Hence,
this research chooses Loo’s statistic as the link-attribute approach for the
identification of crash hot zones. The general steps involve geo-validation of
road crashes, segmenting road network into BSUs, calculating actual crash
intensity, defining threshold value, modeling spatial pattern and mapping of
hot zones. The major steps were also reported in Loo (2009), Loo et al. (2011),
and Loo and Yao (2012).
3.2.1.1 Geo-validation of road crashes
High precision of locations of road crashes is vital to this research. As a
network 2D phenomenon, crashes should be constrained to road network.
However, for both technical and non-technical reasons, they are unlikely to
intersect with the centerlines of the road network (Loo, 2006). Figure 3.3
delineates a tiny part of the crash map in Hong Kong in the year 2007 (Loo and
Yao, 2012). It is obvious that most crashes recorded by the Hong Kong Police
were not located on the road network. Hence, these crashes need to be
snapped to the appropriate junctions or centerline of the road network. For
instance, one may use geo-validation procedures introduced in Loo (2006) to
move the crashes to appropriate locations.
Figure 3.3 Road crashes plotted onto a map (Loo & Yao, 2012)
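The geometric core of snapping a crash to the road network can be sketched as a projection onto the nearest centerline segment. This is a purely geometric simplification under assumed inputs (straight segments given as coordinate pairs) and is not the geo-validation procedure of Loo (2006), which is not reproduced here; the function names and coordinates are hypothetical.

```python
import math


def snap_to_segment(p, a, b):
    """Closest point to p on the straight segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:          # degenerate segment: a and b coincide
        return (float(ax), float(ay))
    # Relative position of the projection along the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap_to_network(p, segments):
    """Return (snapped point, offset distance) for the nearest segment."""
    best_point, best_dist = None, math.inf
    for a, b in segments:
        q = snap_to_segment(p, a, b)
        d = math.hypot(p[0] - q[0], p[1] - q[1])
        if d < best_dist:
            best_point, best_dist = q, d
    return best_point, best_dist


# Hypothetical centerlines: an east-west road meeting a north-south road.
centerlines = [((0, 0), (10, 0)), ((10, 0), (10, 10))]
snapped, offset = snap_to_network((4, 3), centerlines)
```

In practice the offset distance can serve as a check: crashes that lie unusually far from any centerline may need manual validation rather than automatic snapping.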
3.2.1.2 Segmentation of the road network
Although there has been no clear indication of an optimal length for
defining a hazardous BSU, the researchers have suggested using a fixed value
which should be long enough to allow the identification of crash clusters but
short enough to reflect the variation in road environment (Loo, 2009). As
mentioned in Section 2.1.2, previous studies usually preferred 100 m or 0.1
mile as the segmentation interval. They chose one constant value only,
without examining the extent to which the length of the BSU influences the
hot zone identification. This study will conduct sensitivity analysis of hot
zones with respect to segmentation length. Another issue is the violation of the “fixed”
condition. For an empirical link-node road system, the length of each BSU
cannot be fully standardized after segmentation. The main
reason is that the “fixed length” condition is likely to be violated near end
nodes, resulting in some BSUs shorter than the predefined length. The situation is
even worse in an urban area with a dense road structure. Closely-located road
intersections, which are topologically represented by nodes, may lead to a great
number of short links and thus contribute to more units much shorter than a
standard BSU. Taking Hong Kong as an example, there are altogether 6,445
links in the Annual Traffic Census (ATC) road database, with about 1,090 km
in total length. The average length of the link is around 170 m. Table 3.1
shows the statistics on lengths of BSUs after segmentation with 100 m interval.
In total, there are 14,245 BSUs, of which 44.9% are shorter than 100 m. The
shares of tiny BSUs shorter than 25 m and 50 m reach 11.2% and
24.1% respectively.
Table 3.1 Statistics on length of BSUs based on the raw-link-node road network
Length (m)      Number of BSUs      Share (%)
0<L<25                   1,591          11.2%
25<=L<50                 1,839          12.9%
50<=L<75                 1,711          12.0%
75<=L<100                1,257           8.8%
L=100                    7,847          55.1%
Total                   14,245           100%
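The origin of the short BSUs can be seen in a minimal sketch of fixed-interval segmentation: every link is cut into pieces of the chosen interval and whatever remains at the end node becomes a shorter BSU. The function name and the four link lengths are hypothetical and are not taken from the ATC database.

```python
def segment_link(length_m, interval_m=100.0):
    """Cut one link into BSUs of interval_m; the remainder forms a short BSU."""
    full, remainder = divmod(length_m, interval_m)
    pieces = [interval_m] * int(full)
    if remainder > 0:
        pieces.append(remainder)
    return pieces


# Hypothetical raw links (lengths in metres).
links = [170.0, 40.0, 300.0, 230.0]
bsus = [piece for link in links for piece in segment_link(link)]
short_share = sum(1 for b in bsus if b < 100.0) / len(bsus)
```

Here three of the nine resulting BSUs are shorter than the 100 m interval, because each link that is not an exact multiple of the interval contributes one short unit.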
In order to reduce the number of short BSUs, road links are first
dissolved before segmentation. Figure 3.4 (a) describes a hypothetical
raw-link-node system with 19 links and 25 nodes. Each link connects to, that
is, has an end node in common with, at least one other link. Figure 3.4 (b) delineates
the road structure after dissolving. In this dissolved-road system,
there exist only 6 links and 12 nodes, which may result in a sharp decline in
the number of short BSUs.
Figure 3.4 Illustration of (a) raw-link-node and (b) dissolved-road systems
Transforming from a raw-link-node to a dissolved road system involves a
dissolving algorithm (Loo & Yao, 2012). A key issue is that a link, as shown
in Figure 3.4 (a), often has more than one neighbour, but is only allowed to
dissolve one of them. To cope with this problem, a link may dissolve one of the
contiguous links by random selection or by following a preset rule. A priority
sequence is designed in this study for determining the link to be dissolved. The
work flow of the algorithm is shown in Figure 3.5. For a link with two or more
contiguous sections, the algorithm first dissolves the neighbor with the same
road name. If there are no sections sharing the same name, the program will
choose a neighboring link according to a “tangent” rule.
Figure 3.5 Flow chart of the dissolving algorithm. For each link i: if there is
no contiguous link, move on to the next link; if there is only one contiguous
link, dissolve it; otherwise dissolve the contiguous link with the same road
name; if no such link exists, apply the “tangent” rule (create tangent lines
for each link at the end nodes, calculate the angle between the tangent line of
each contiguous link and that of link i, obtain the link with the smallest
angle) and dissolve it.
The hypothetical raw link-node road system in Figure 3.6(a), for example,
consists of eight links, namely links AB, BC, CD, DE, BG, BF, CH
and DI. If one starts from link AB, link BC will be dissolved by link AB because
they share the same road name, although links BG and BF also connect to link AB at
the common end node B. Next, the GIS algorithm looks for contiguous segments at
end node C. Following the same logic, link CD will be dissolved with link AC.
Then, a new round of dissolving begins at point D. As neither link DE
nor link DI has the same road name as link CD, tangent lines a, b and c are
created at end node D for links CD, DE and DI, respectively. The angle between
a and b is then compared with that between a and c. As the former is smaller
than the latter, link DE is selected as the segment to be merged. Figure 3.6(b)
depicts the road structure after dissolving, which consists of only
four links, namely links AE, FG, CH and DI.
Figure 3.6 A hypothetical road network structure: (a) raw link-node road
system; (b) dissolved road system (Loo & Yao, 2012)
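The priority sequence described above (a single neighbour first, then the same road name, then the “tangent” rule) can be sketched as a selection function. The data layout, road names and tangent vectors are hypothetical, and the angle is interpreted here as the deflection between the direction in which link i arrives at the shared node and the direction in which a candidate leaves it; this interpretation is an assumption, not a detail confirmed by the text.

```python
import math


def deflection_angle(incoming, outgoing):
    """Angle in degrees between two direction vectors (0 = straight on)."""
    dot = incoming[0] * outgoing[0] + incoming[1] * outgoing[1]
    norm = math.hypot(*incoming) * math.hypot(*outgoing)
    cos_angle = max(-1.0, min(1.0, dot / norm))   # guard against rounding
    return math.degrees(math.acos(cos_angle))


def pick_link_to_dissolve(link, candidates):
    """Choose which contiguous link should be merged with `link`."""
    if not candidates:
        return None                       # no contiguous link: move on
    if len(candidates) == 1:
        return candidates[0]              # only one neighbour: take it
    same_name = [c for c in candidates if c["name"] == link["name"]]
    if same_name:
        return same_name[0]               # prefer the same road name
    # "Tangent" rule: smallest angle with the tangent of the current link.
    return min(candidates,
               key=lambda c: deflection_angle(link["tangent"], c["tangent"]))


# Node B of Figure 3.6(a): BC shares the (hypothetical) road name of AB.
ab = {"id": "AB", "name": "Road 1", "tangent": (1.0, 0.0)}
at_b = [{"id": "BG", "name": "Road 2", "tangent": (0.0, 1.0)},
        {"id": "BC", "name": "Road 1", "tangent": (1.0, 0.0)},
        {"id": "BF", "name": "Road 2", "tangent": (0.0, -1.0)}]

# Node D: no candidate shares the name, so the tangent rule decides.
cd = {"id": "CD", "name": "Road 1", "tangent": (1.0, 0.0)}
at_d = [{"id": "DI", "name": "Road 4", "tangent": (0.3, -1.0)},
        {"id": "DE", "name": "Road 3", "tangent": (1.0, 0.2)}]

chosen_at_b = pick_link_to_dissolve(ab, at_b)["id"]
chosen_at_d = pick_link_to_dissolve(cd, at_d)["id"]
```

With these assumed names and directions the function reproduces the choices in the worked example: BC is merged at node B and DE at node D.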
After performing the dissolving algorithm on the ATC road network of Hong
Kong, there are only 871 sections in the dissolved road database, with an
average length of about 1,250 m. Table 3.2 shows statistics on length of BSUs based on the
dissolved road network. If the road network is still segmented with a 100 m
interval, 11,398 BSUs are obtained. Among them, only 4.4% are less than 50 m
long. By using the improved network segmentation method, the shares of
BSUs with length less than 50 m and 100 m are dramatically reduced by 81.7%
(24.1% with raw road database and 4.4% with dissolved road network) and
49.0% (44.9% with raw road database and 22.9% with dissolved road network),
respectively. The extent to which the hot zones are sensitive to a road system
will be examined in the following analysis of this chapter.
Table 3.2 Statistics on length of BSUs based on the dissolved road network
Length (m)      Number of BSUs      Share (%)
0<L<25                     230           2.0%
25<=L<50                   275           2.4%
50<=L<75                   252           2.2%
75<=L<100                1,853          16.3%
L=100                    8,788          77.1%
Total                   11,398           100%
3.2.1.3 Calculation of the actual crash intensity
The crash intensity can be defined by various approaches such as crash
frequency, crash risk or casualty-weighted scores. Taking crash frequency as
an example, the actual crash intensity on a BSU is defined by the number of
road crashes. However, as a large number of road crashes happen at junctions,
a double-counting problem occurs frequently. To deal with the issue, a preset
rule is suggested to ensure a “1:1” relation between crashes and BSUs. For
instance, the spatial database
records the location of each BSU by storing a set of coordinate pairs. With the
assistance of GIS, one can obtain the minimum and the maximum x and y
coordinates of each BSU. Based on these values, Loo (2009) allocated a
road-junction crash to the BSU with a smaller maximum x coordinate. This
study will follow this rule to tackle the double-counting problem.
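A minimal sketch of this allocation rule is given below. The BSU identifiers and coordinates are hypothetical, and the tie-break on the BSU identifier is an added assumption, since the rule as described does not say what happens when two BSUs share the same maximum x coordinate.

```python
def assign_junction_crash(candidate_bsus):
    """Pick the single BSU that receives a crash located at a junction.

    candidate_bsus: dict mapping BSU id -> list of (x, y) vertices of
    the BSUs that meet at the junction.
    """
    def sort_key(bsu_id):
        max_x = max(x for x, _ in candidate_bsus[bsu_id])
        # Smaller maximum x wins; the id is only an assumed tie-breaker.
        return (max_x, bsu_id)

    return min(candidate_bsus, key=sort_key)


# Hypothetical junction at (100, 50) shared by two BSUs.
meeting = {
    "east": [(100, 50), (200, 50)],   # maximum x = 200
    "west": [(0, 50), (100, 50)],     # maximum x = 100
}
owner = assign_junction_crash(meeting)
```

Whatever rule is used, the important property is that it is deterministic, so that each junction crash contributes to the intensity of exactly one BSU.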
3.2.1.4 Definition of the threshold value
There has been no obvious evidence of the best way to define the
threshold value of a BSU in hot zone identification. Widely used definitions in
previous research include numerical and statistical definitions. This study also
defines the threshold value from crash prediction models which take into
account more environmental factors.
Numerical definition
Numerical definitions such as an arbitrary crash count are widely used in
the identification of crash hot zones. Definitions of this type rely on
overall crash frequency rather than crash potential. For instance, Loo (2009) chose 3, 4,
and 5 crashes in one year as the threshold values for the identification of hot
zones in Hong Kong so as to be comparable with the blacksite methodology
adopted by the HKSAR Government.
Statistical definition
A BSU is identified as hot if the observed road crash count exceeds the
expected number of road crashes on similar BSUs. Typical examples include
Loo et al. (2011), Moons, Brijs and Wets (2009a) and Yamada and Thill (2010),
who defined the threshold value by Monte Carlo simulation of the crash
distribution with an arbitrary significance level such as 95% or 99%. Black
and Thomas (1998), and Flahaut et al. (2003) used the mean value of crash
intensity as the threshold value. Statistical definitions could be similar to
model-based definitions if the expected crash count is estimated by a statistical
model such as negative binomial models.
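One simple way to obtain a simulation-based threshold is sketched below, assuming equal-length BSUs and complete spatial randomness: crashes are repeatedly scattered over the BSUs and the chosen percentile of the simulated per-BSU counts is taken as the threshold. This is an illustrative simplification, not the simulation design of any of the cited studies; the function name, parameters and defaults are all assumptions.

```python
import random


def monte_carlo_threshold(n_crashes, n_bsus, n_sims=999, level=0.95, seed=0):
    """Percentile of per-BSU crash counts under random allocation."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    simulated_counts = []
    for _ in range(n_sims):
        counts = [0] * n_bsus
        for _ in range(n_crashes):
            counts[rng.randrange(n_bsus)] += 1   # each BSU equally likely
        simulated_counts.extend(counts)
    simulated_counts.sort()
    index = min(len(simulated_counts) - 1, int(level * len(simulated_counts)))
    return simulated_counts[index]


# Hypothetical network: 200 crashes on 100 equal-length BSUs.
t95 = monte_carlo_threshold(200, 100, n_sims=200, level=0.95, seed=1)
t99 = monte_carlo_threshold(200, 100, n_sims=200, level=0.99, seed=1)
```

A stricter level yields a threshold that is at least as high, so fewer BSUs qualify as hot; unequal BSU lengths or traffic exposure would require weighting the allocation probabilities accordingly.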
Model-based definition
No previous research has been dedicated to model-based definition
of threshold values. This study will apply this method to the identification of
crash hot zones. The threshold value will be determined by a crash prediction
model with a set of environmental indicators as independent variables. In this
way, the crash hot zones are identified in a “potential crash reduction”
manner.
3.2.1.5 Modeling of the spatial pattern
This study builds on the Loo statistic (Loo, 2009), I(HZ), which can be
defined as:
\[
I(HZ)_i = z_i \sum_{j=1,\, j \neq i}^{n} W_{ij} z_j \tag{3.1}
\]
where n is the number of BSUs; i, j = 1, 2, …, n; $W_{ij}$ is an element of a 0-1
contiguity matrix; and $z_i$ is a 0-1 indicator showing whether or not the BSU is
hot. Here, “hot” means that the BSU has actual crash intensity no less than the
threshold value. $z_i$ can be denoted by:
\[
z_i = \begin{cases} 1 & \text{if } LCI_i \ge t_i, \\ 0 & \text{otherwise;} \end{cases} \tag{3.2}
\]
where $LCI_i$ is the crash intensity at BSU i; and $t_i$ is the threshold value of
BSU i, which can be assigned any positive value.
Matrices are widely used in spatial analysis for representing spatial
concepts. This research concentrates on those contiguous BSUs with relatively
high risks. Thus, W is defined as a 0-1 contiguity matrix whose elements $W_{ij}$ are
only ones or zeros. For instance, following the hypothetical structure of seven
BSUs in Figure 3.7, the weight matrix, W, is given by Equation 3.3.
Six pairs, including BSU 1(AD) and 5 (BC), BSU 1(AD) and 2 (DG), BSU 2 (DG)
and 3 (GJ), BSU 3 (GJ) and 4 (JK), BSU 2 (DG) and 6 (EF), and BSU 3 (GJ) and
7(IH), are considered as contiguous. Hence, twelve elements, namely element
(1,5), (5,1), (1,2), (2,1), (2,3), (3,2), (3,4), (4,3), (2,6), (6,2), (3,7) and (7,3), are
assigned as “1” and others are set as “0”.
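Figure 3.7 A hypothetical structure of seven BSUs (Loo & Yao, 2012)
\[
W = \begin{pmatrix}
* & 1 & 0 & 0 & 1 & 0 & 0 \\
1 & * & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & * & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & * & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & * & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & * & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & *
\end{pmatrix} \tag{3.3}
\]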
The value of $I(HZ)_i$ can be zero or a positive integer between 1 and
n-1. A zero value may indicate that the actual crash intensity is less than the
threshold value at BSU i, or that there are no hot BSUs in the vicinity even if
BSU i itself has a particularly high crash intensity. A positive value
means that BSU i and at least one of its contiguous BSUs have crash
intensities no less than their threshold values.
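The calculation can be illustrated with a short sketch that uses the contiguity structure of the seven BSUs in Figure 3.7. The crash intensities and the uniform threshold of 3 are hypothetical values chosen only to show the behaviour of the statistic, and the function name is an assumption.

```python
def loo_statistic(intensity, thresholds, neighbours):
    """I(HZ)_i = z_i * sum of z_j over the BSUs contiguous to BSU i."""
    # z_i = 1 if the crash intensity is no less than the threshold.
    z = {i: 1 if intensity[i] >= thresholds[i] else 0 for i in intensity}
    return {i: z[i] * sum(z[j] for j in neighbours[i]) for i in intensity}


# Contiguity of the seven BSUs in Figure 3.7 (non-zero entries of W).
neighbours = {1: [2, 5], 2: [1, 3, 6], 3: [2, 4, 7],
              4: [3], 5: [1], 6: [2], 7: [3]}

# Hypothetical crash counts and a uniform threshold of 3 crashes.
intensity = {1: 4, 2: 5, 3: 1, 4: 6, 5: 0, 6: 3, 7: 2}
thresholds = {i: 3 for i in intensity}

i_hz = loo_statistic(intensity, thresholds, neighbours)
```

BSU 4 has the highest count in this toy data set but still receives a zero, because its only neighbour (BSU 3) is not hot; BSUs 1, 2 and 6 receive positive values and together form a candidate hot zone.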
3.2.1.6 Mapping of hot zones
The identification of hot zones only focuses on BSUs with positive I(HZ).
These BSUs will be picked and plotted on the map for further analysis. Figure
3.8 is part of a map describing two hot zones. As shown in the figure, the
length of hot zones is not fixed. While hot zone I is comprised of two
contiguous BSUs, hot zone II consists of four. The scale of the hot zone not
only depends on the concentration of road crashes on one BSU, but also relies
on the clustering tendency of hot BSUs.
Figure 3.8 Link-attribute hot zones
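Grouping the BSUs with positive I(HZ) into separate hot zones amounts to finding connected components among them. The sketch below does this with a simple depth-first search on a hypothetical chain of ten BSUs; the function name and the I(HZ) values are assumptions made for illustration.

```python
def hot_zones(i_hz, neighbours):
    """Group contiguous BSUs with positive I(HZ) into hot zones."""
    hot = {b for b, value in i_hz.items() if value > 0}
    zones, seen = [], set()
    for start in sorted(hot):
        if start in seen:
            continue
        # Depth-first search restricted to hot BSUs.
        stack, zone = [start], []
        seen.add(start)
        while stack:
            bsu = stack.pop()
            zone.append(bsu)
            for nb in neighbours[bsu]:
                if nb in hot and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        zones.append(sorted(zone))
    return zones


# Hypothetical road of ten BSUs in a row: 1 - 2 - ... - 10.
chain = {b: [n for n in (b - 1, b + 1) if 1 <= n <= 10] for b in range(1, 11)}
values = {1: 0, 2: 1, 3: 1, 4: 0, 5: 0, 6: 1, 7: 2, 8: 2, 9: 1, 10: 0}
zones = hot_zones(values, chain)
```

The result contains one zone of two BSUs and one of four, mirroring the situation described for hot zones I and II: the length of a zone is an outcome of the clustering, not a preset parameter.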
3.2.2 Event-based Method
As mentioned in Chapter 2, no research has been conducted on the
identification of hot zones with the event-based method. Nonetheless,
researchers have employed some network event-based approaches to detect
crash hot spots. Kernel density estimation is commonly used for hot spot
detection. Recent years have witnessed growing interest in Network Kernel
Density Estimation (NKDE) for identifying hot spots of network-constrained
phenomena (Loo, Yao & Wu, 2011b; Okabe, Satoh & Sugihara, 2009; Xie & Yan,
2008). The density at location i can be calculated by:
\[
f(i) = \frac{1}{Nb} \sum_{j=1}^{N} \mathrm{Kern}\!\left(\frac{d_{ij}}{b}\right) \tag{3.4}
\]
where f(i) is the density at location i; b is the bandwidth (searching distance) of
the NKDE (only events within b are used to estimate f(i)); $d_{ij}$ is the network
distance between location i and event j; N is the total number of events; Kern(.)
is a kernel function weighting the ratio between $d_{ij}$ and b. Using the kernel
function, the “distance decay effect” is considered, that is, the longer the
distance between an event and location i, the less that event is weighted for
calculating the overall density. Commonly used kernel functions include
Triangle, Quartic (biweight), Triweight and Gaussian (Waller & Gotway,
2004).
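As a minimal sketch of Equation 3.4, the following Python snippet evaluates the density at one location from a list of precomputed network distances. The function names, the choice of the quartic kernel with its usual 15/16 normalizing constant, and the example distances are illustrative assumptions; computing the network distances themselves (shortest paths along the road network) is outside the sketch.

```python
def quartic_kernel(u):
    """Quartic (biweight) kernel; zero outside |u| <= 1 (assumed standard form)."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def nkde_density(distances, bandwidth, kernel=quartic_kernel):
    """Density f(i) at one location, following Equation 3.4.

    distances -- network distances d_ij from location i to every event j
    bandwidth -- search distance b, in the same unit as the distances
    """
    n_events = len(distances)  # N counts all events, not only those within b
    weighted = sum(kernel(d / bandwidth) for d in distances)
    return weighted / (n_events * bandwidth)


# Hypothetical example: three events at 0 m, 50 m and 200 m from location i.
density = nkde_density([0.0, 50.0, 200.0], bandwidth=100.0)
```

The event at 200 m lies beyond the bandwidth and contributes nothing, while the event at 50 m is weighted less than the one at the location itself, which is the distance decay effect described above.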
Unlike NKDE which uses a kernel function to measure the “distance
decay effects”, Network Local K-function method which was developed by
Yamada and Thill (2007) chooses a uniform function and assigns equal weights
to events within the searching distance. Traditional K-function investigates
clustering tendency of events by examining the extent to which events occur
within a distance of other events. However, for the identification of hazardous
road locations, as claimed by Yamada and Thill (2007), one is not interested in
crashes around which other crashes are concentrated, but is more concerned
with those road locations where crashes are clustered. In this light, Yamada
and Thill (2007) used reference points and defined the Network Local
K-function value at reference point i as:
$$LK_i = \sum_{j=1}^{n} f_{ij} \qquad (3.5)$$
$$f_{ij} = \begin{cases} 1 & \text{if } d_{ij} \le h \\ 0 & \text{otherwise} \end{cases} \qquad (3.6)$$
where LK_i is the Network Local K-function value; n denotes the number of
crashes; d_ij is the network distance between location i and event j; and h is the
search distance (bandwidth) from location i.
Although NKDE is more widely employed in the literature
on the identification of crash hot spots, the event-based analysis of this research
is built upon a network scan statistic similar to Yamada and Thill's
(2007) Network Local K-function. The unweighted approach is chosen in
this thesis because in the link-attribute analysis, the road crashes are treated
equally when aggregated by road segments. In order to make a fair comparison,
the unweighted analysis is more appropriate than assigning distance-decay
weights.
The hot zone methodology for the event-based analysis of road crashes
involves geo-validating road crashes, generating reference points, calculating
actual crash intensity, defining threshold value and modeling spatial patterns.
As geo-validation has been introduced in Section 3.2.1, this
subsection gives a brief introduction to the remaining four processes.
3.2.2.1 Determination of reference points
As mentioned earlier, Local K-function methods detect clustering
tendency around road locations rather than events. As it is neither feasible nor
practical to actually investigate the clustering tendency for every possible
location along the road network, researchers typically choose a set of points,
termed “reference points (RPs)” hereafter, and only calculate the clustering
tendency around these RPs (Xie & Yan, 2008; Yamada & Thill, 2007). In
Yamada and Thill's (2007) work, the reference points were selected in a manner
similar to the GAM analysis (Openshaw et al., 1987), which lays a fine grid
over the study region. Xie and Yan (2008) also discussed this issue. Firstly, they
created a segment-based linear reference system out of the original road
network, with each segment being a line segment between two neighboring
road intersections. Next they divided each segment into basic linear units of a
defined network length and the center points of these linear units were
regarded as reference points.
This research determines the reference points following Yamada and Thill
(2007) and Xie and Yan (2008), but in a more “random” manner since the first
reference point of a link is not the “From” node of the link, but is randomly
determined along the link. Moreover, their research derived RPs from the
raw link-node database, which might result in a great number of RPs spaced at
intervals shorter than the pre-set value. To reduce this undesirable effect, the road
network is dissolved before RP generation by using the same dissolving
algorithm as in the link-attribute approach. The workflow is presented in
Figure 3.9.
Figure 3.9 Work flow for generating reference points
The general steps for creating reference points are as follows:
1) Dissolve the raw road network to avoid too many short links by
following the dissolving algorithm in Section 3.2.1.2;
2) Generate a random value r;
3) Select a link;
4) Randomly select a node as the “From” node;
5) Starting at distance r from the “From” node, place RPs at equal interval Int. The
length of Int needs to be considered carefully, as it greatly influences the spatial
scale of hot zones;
6) Calculate the distance d between the last RP and the “To” node. If d is
more than the search distance, treat the “To” node as the last RP; and
7) Repeat Steps 3 to 6 until all the links are examined.
Note that the value of r should be no more than half of Int. The reason
will be presented in the following subsection. Figure 3.10 (a) shows a
hypothetical road network after performing the dissolving algorithm.
Following Step (1) to (7), RPs can be generated and located, as shown in Figure
3.10 (b), along the road network. As shown in the figure, the distance between
each “starting” reference point (reference points 1, 2, 3, 4, and 5) and the “From”
end node of its link (nodes B, D, F, J, and G) equals r. On each link (links BA,
DC, FE, JI, and GH), the interval between reference points equals Int.
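The placement of RPs on a single dissolved link (Steps 2, 5 and 6) can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the function name `reference_point_offsets` and its parameters are hypothetical, offsets are measured in meters from whichever end node was chosen as the “From” node, and the search distance is assumed to default to half of Int.

```python
import random


def reference_point_offsets(link_length, interval, r=None, search_distance=None, rng=None):
    """Offsets of reference points (RPs) along one link, measured from its "From" node.

    Choosing which end node acts as the "From" node (Step 4) is left to the caller.
    """
    h = interval / 2 if search_distance is None else search_distance
    if r is None:
        rng = rng or random.Random()
        r = rng.uniform(0, h)  # Step 2: random start, no more than h
    if not 0 <= r <= h:
        raise ValueError("r should lie between 0 and the search distance")

    # Step 5: place RPs at r, r + Int, r + 2*Int, ... while they fit on the link.
    offsets = []
    k = 0
    while r + k * interval <= link_length:
        offsets.append(r + k * interval)
        k += 1

    # Step 6: if the stretch beyond the last RP exceeds the search distance,
    # treat the "To" node as the last RP.
    last = offsets[-1] if offsets else 0.0
    if link_length - last > h:
        offsets.append(link_length)
    return offsets


# Hypothetical 500 m link with Int = 100 m and a fixed start r = 30 m.
rps = reference_point_offsets(500, 100, r=30)
```

With these inputs the last regular RP falls at 430 m, leaving 70 m to the “To” node, which exceeds the 50 m search distance, so the “To” node itself becomes the final RP.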
Figure 3.10 Reference points on a hypothetical road network
3.2.2.2 Calculation of crash intensity
The crash intensity can be defined by various methods such as crash
frequency, crash risk or casualty weighted approaches. Taking crash frequency
as an example, crash intensity ECI is calculated by:
$$ECI_i = \sum_{j=1}^{n} f_{ij} \qquad (3.7)$$
$$f_{ij} = \begin{cases} 1 & \text{if } d_{ij} \le h \\ 0 & \text{otherwise} \end{cases} \qquad (3.8)$$
where n denotes the number of crashes; d_ij is the network distance between RP
i and event j; and h is the search distance from RP i. Being built upon Yamada and
Thill's (2007) KLINCS approach, Equations 3.7 and 3.8 appear identical to
Equations 3.5 and 3.6. However, the KLINCS approach was originally designed
for investigating local crash clustering at various spatial scales with h no less
than the interval of reference points. When a whole empirical network is
considered, a large h can result in a large overlap between neighboring search
spaces. As h can be conceptualized as the search “radius” anchored at RP i,
h in Equation 3.8 is set equal to half of Int (the interval of reference points), so
that the search windows of neighboring RPs on the same link meet without overlapping. As
mentioned in the previous subsection, when deriving the first reference point
of a road link, a random value r is generated first and the value should be set
no more than half of Int. Strictly speaking, it should be no more than h
in Equation 3.8 so that every location on the road is covered by a search window.
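A minimal sketch of Equations 3.7 and 3.8 with crash frequency as the intensity measure is given below. The function name and the example positions are hypothetical, the comparison is assumed to be “d_ij no more than h”, and the example uses a single unbranched link so that the network distance reduces to the difference between offsets; on a real network the distances would come from a shortest-path computation.

```python
def crash_intensity(rp_to_crash_distances, h):
    """ECI_i for every RP: the number of crashes whose network distance d_ij is within h."""
    return [sum(1 for d in row if d <= h) for row in rp_to_crash_distances]


# Hypothetical single link: positions in meters from the "From" node, Int = 100 m.
rp_positions = [50, 150, 250]
crash_positions = [10, 60, 95, 100, 240]
interval = 100
h = interval / 2  # search distance set to half of Int

# On an unbranched link the network distance is the absolute offset difference.
distances = [[abs(rp - crash) for crash in crash_positions] for rp in rp_positions]
eci = crash_intensity(distances, h)
```

Under the assumed convention, the crash at 100 m lies exactly h from both the first and the second RP and is therefore counted at both.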
3.2.2.3 Definition of threshold values
Although no research has been dedicated to the definition of threshold
values for the event-based hot zone identification, the way in which the
cut-off value is determined for event-based hot spot detection has been
discussed among point pattern analysts. The definitions can be categorized into
three types, namely numerical, simple ranking, and statistical definitions such
as a Monte Carlo method.
Numerical definition
By numerical definition, an arbitrary number is used to determine the
threshold value.
Simple ranking
A simple ranking method takes a high percentile (e.g. the 90th, 95th or
99th) of the observed values as the cut-off value. For instance, Loo
and Yao (2011) identified hot spots of vehicle-pedestrian and vehicle-vehicle
crashes in Shanghai by using the 99th percentile of observed kernel density
estimates as the cut-off value.
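The simple ranking definition can be illustrated with a short sketch. The nearest-rank percentile rule used here is only one of several common percentile conventions and is an assumption, as are the function name and the example values.

```python
import math


def percentile_cutoff(values, pct):
    """Nearest-rank percentile of the observed values, used as a cut-off value."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical density estimates at 100 locations (here simply 1, 2, ..., 100).
observed = list(range(1, 101))
cutoff_99 = percentile_cutoff(observed, 99)
cutoff_90 = percentile_cutoff(observed, 90)
```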
Monte Carlo Simulation
More generally, an event-based hot spot is defined by statistical methods,
among which Monte Carlo method is the most widely used approach by
researchers. In the literature, there is no consensus on the definition of Monte
Carlo. For example, Ripley (1987) defines most probabilistic modeling as
stochastic simulation, with Monte Carlo being reserved for Monte Carlo
integration and Monte Carlo statistical tests. Sawilowsky et al (2003)
distinguishes between a simulation, a Monte Carlo method, and a Monte Carlo
simulation: a simulation is a fictitious representation of reality, a Monte Carlo
method is a technique that can be used to solve a mathematical or statistical
problem, and a Monte Carlo simulation uses repeated sampling to determine
the properties of some phenomenon (or behavior). In this research, “Monte
Carlo simulation” and “Monte Carlo method” are regarded as the same and
used interchangeably. Both refer to computational algorithms that rely
on repeated random sampling to obtain numerical results. For instance,
Yamada and Thill (2007) used a Monte Carlo approach to define the
threshold values of crash hot spots by randomly allocating the same number of
road crashes to the reference points and calculating the local K-function values
for the simulated crashes. The simulation was repeated 1,000 times and the 95%
significance level, that is, the 50th largest simulated local K-function value, was
used as the threshold value.
3.2.2.4 Modeling of spatial pattern
An event-based indicator EI(HZ) is introduced to model the spatial pattern.
The hot zones are defined as:
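$$EI(HZ)_i = z_i \sum_{j=1,\, j \ne i}^{m} f(HZ)_{ij}\, z_j \qquad (3.9)$$
$$z_i = \begin{cases} 1 & \text{if } ECI_i \ge t_i \\ 0 & \text{otherwise} \end{cases} \qquad (3.10)$$
$$f(HZ)_{ij} = \begin{cases} 1 & \text{if } d(HZ)_{ij} \le Int \\ 0 & \text{otherwise} \end{cases} \qquad (3.11)$$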
where z_i indicates whether RP i can be regarded as “hot” (z_i = 1); ECI_i is the
crash intensity for RP i; and t_i is the threshold value at RP i. ECI_i can be
calculated following Equations 3.7 and 3.8; d(HZ)_ij is the network distance
between RPs i and j; and Int is the interval of reference points. The value of EI(HZ) is
also either positive or equal to zero. The identification of hot zones only
focuses on RPs with positive EI(HZ) (termed “positive RPs” hereafter).
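The indicator can be sketched in a few lines. The sketch assumes that an RP is hot when its crash intensity is no less than its threshold and that two RPs are neighbors when their network distance is no more than Int; the function name and the five-RP example are hypothetical.

```python
def hot_zone_indicator(eci, thresholds, rp_distances, interval):
    """EI(HZ)_i for every RP, given crash intensities, thresholds and RP-to-RP distances."""
    m = len(eci)
    # z_i = 1 if the RP is "hot" (assumed: intensity no less than its threshold).
    z = [1 if eci[i] >= thresholds[i] else 0 for i in range(m)]
    indicator = []
    for i in range(m):
        # Count hot RPs j (j != i) within one RP interval of RP i.
        hot_neighbors = sum(
            z[j] for j in range(m) if j != i and rp_distances[i][j] <= interval
        )
        indicator.append(z[i] * hot_neighbors)
    return indicator


# Hypothetical link with five RPs spaced 100 m apart.
interval = 100
rp_distances = [[abs(i - j) * interval for j in range(5)] for i in range(5)]
eci = [5, 6, 1, 7, 2]
thresholds = [4, 4, 4, 4, 4]
ei_hz = hot_zone_indicator(eci, thresholds, rp_distances, interval)
```

The fourth RP is hot (7 crashes against a threshold of 4) but has no hot neighbor, so its indicator is zero and it does not form part of a hot zone.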
3.2.2.5 Mapping of hot zones
The search windows of positive RPs are generated in order to obtain hot
zones. As shown in Figure 3.11 (a), reference point A, B, C, D and E have
positive EI(HZ) values. The dark lines in Figure 3.11 (b) are search windows of
these positive RPs. Note that search windows may overlap among neighboring
positive RPs, especially when positive RPs are located around road junctions
and near end nodes of a road. A hot zone comprises at least two
neighboring search windows. In Figure 3.11 (b), one hot zone consists of
search windows A, B and C, and the other is composed of two search windows.
As with link-attribute hot zones, the length of event-based hot zones is not fixed,
but rather depends on the clustering tendency of hot RPs.
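For a single link, the merging of search windows into hot zones can be sketched as an interval-merging step. The function name and offsets are hypothetical, windows are taken as [p - h, p + h] around each positive RP at offset p, and clipping at link ends or handling of junctions is deliberately left out.

```python
def merge_search_windows(positive_rp_offsets, h):
    """Merge the search windows of positive RPs on one link into hot zones."""
    zones = []
    for p in sorted(positive_rp_offsets):
        start, end = p - h, p + h
        if zones and start <= zones[-1][1]:
            # The window touches or overlaps the previous one: extend that zone.
            zones[-1][1] = max(zones[-1][1], end)
        else:
            zones.append([start, end])
    return [(start, end) for start, end in zones]


# Hypothetical positive RPs (offsets in meters) with h = 50 m.
zones = merge_search_windows([50, 150, 450, 550, 650], h=50)
lengths = [end - start for start, end in zones]
```

The two resulting zones have different lengths (200 m and 300 m), which reflects the point that the length of event-based hot zones is not fixed.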
Figure 3.11 Mapping of hot zones
3.3 Data Analysis with Simulated Data
Although the primary goal of this research is to investigate the
link-attribute and event-based approaches for identifying crash hot zones in a
complex and empirical road environment, examining the results on
simplified hypothetical networks can offer basic insights into the
characteristics of the two approaches. Both link-attribute and event-based
approaches are applied to the analysis of simulated crash and road network
data. Sensitivity analysis is conducted on road structure, crash pattern
and BSU length (RP interval).
3.3.1 Data Description
Figure 3.12 shows three hypothetical road networks, which could be
described as grid (with 24 road junctions), limited access (with 12 road
junctions) and organic (with 6 road junctions) network structures (Hummel,
2001).
Figure 3.12 Three hypothetical road networks
Each hypothetical road network is 10 km long with 250 simulated
crashes. The total number of crashes is hypothetical but roughly based on the
situation of Hong Kong. Three spatial patterns of crashes are then generated.
The dispersed pattern is generated by spacing the crashes at equal intervals
throughout the network. A random pattern is generated by a random process.
A concentrated pattern is generated by following the concept of hot zones:
crashes are first grouped into 10 crash hot zones, each spanning two BSUs, and these
underlying clusters (each with 25 crashes) are then placed on the network with a
random start. Owing to random variability, it is not appropriate to
compare the two approaches based on only one random or concentrated
pattern. To mitigate this problem, random and concentrated
spatial patterns are generated ten times each for the three different
hypothetical road networks. Figure 3.13 shows the dispersed pattern and
one of the ten random and concentrated spatial patterns on each of the three
hypothetical road networks.
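The three crash patterns can be sketched as follows. This is a simplified illustration under stated assumptions: the network is linearized into one 10 km line (network topology is ignored), clusters are allowed to overlap, crashes are spread uniformly inside each 200 m cluster, and all function names are hypothetical. The source does not specify how crashes are distributed within a cluster.

```python
import random


def dispersed_pattern(n_crashes, network_length):
    """Crashes spaced at equal intervals along the network."""
    step = network_length / n_crashes
    return [step * (k + 0.5) for k in range(n_crashes)]


def random_pattern(n_crashes, network_length, rng):
    """Crashes located independently and uniformly along the network."""
    return [rng.uniform(0, network_length) for _ in range(n_crashes)]


def concentrated_pattern(n_clusters, crashes_per_cluster, cluster_length, network_length, rng):
    """Clusters of crashes, each confined to a stretch of cluster_length with a random start."""
    crashes = []
    for _ in range(n_clusters):
        start = rng.uniform(0, network_length - cluster_length)
        crashes.extend(
            rng.uniform(start, start + cluster_length) for _ in range(crashes_per_cluster)
        )
    return crashes


rng = random.Random(42)  # fixed seed only for reproducibility
dispersed = dispersed_pattern(250, 10_000)  # 250 crashes on 10 km
scattered = random_pattern(250, 10_000, rng)
clustered = concentrated_pattern(10, 25, 200, 10_000, rng)  # two 100 m BSUs per cluster
```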
Figure 3.13 Different crash patterns on three hypothetical roads
3.3.2 Data Analysis
The Monte Carlo simulation approach is applied to the definition of
threshold values. The general steps of using the Monte Carlo approach to identify
hot zones by the link-attribute and event-based approaches are outlined in
Subsections 3.3.2.1 and 3.3.2.2 respectively.
3.3.2.1 Link-attribute analysis
The three hypothetical road networks are first segmented into k BSUs
of 100 meters by following the general rule. Each BSU is assigned an integer (1, 2,
3, …, k) as its unique ID. With each instance, the key steps for hot
zone identification are as follows:
(1) For the crash instance, calculate the number of road crashes on each
BSU as the actual crash intensity;
(2) Randomly allocate 250 points to the k BSUs and obtain the number of
points on each BSU, which is denoted as z(sim);
(3) Repeat Step 2 1000 times;
(4) For each BSU, use the 10th largest (99% significance level) z(sim) as
the threshold value; and
(5) Identify hot zones by following Equations 3.1 and 3.2.
In Step 3, the Monte Carlo simulation is repeated 1000 times.
Theoretically, the Monte Carlo results are more stable if a larger number of
replications are conducted. However, performing more simulations can
significantly increase the computational burden. Considering both precision
requirements and computing time, 1000 realizations are used in the Monte
Carlo procedure for a one-sided test at the 99% significance level.
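Steps 2 to 4 of the link-attribute procedure can be sketched as below. The sketch assumes that every BSU is equally likely to receive a simulated crash (that is, all BSUs have the same length), and the function name, parameters and seed are hypothetical. With 1000 replications, the 10th largest simulated count corresponds to the 99% level in the same way that the 50th largest corresponds to the 95% level.

```python
import random


def link_attribute_thresholds(n_units, n_crashes, n_sims=1000, rank=10, seed=0):
    """Monte Carlo threshold for each BSU: the rank-th largest simulated crash count."""
    rng = random.Random(seed)
    simulated = [[] for _ in range(n_units)]
    for _ in range(n_sims):
        counts = [0] * n_units
        for _ in range(n_crashes):
            counts[rng.randrange(n_units)] += 1  # allocate one crash to a random BSU
        for unit, count in enumerate(counts):
            simulated[unit].append(count)
    # For each BSU, sort its simulated counts and pick the rank-th largest.
    return [sorted(values, reverse=True)[rank - 1] for values in simulated]


# 10 km network cut into 100 m BSUs (k = 100) with 250 simulated crashes.
thresholds = link_attribute_thresholds(n_units=100, n_crashes=250)
```

A BSU whose actual crash count is no less than its threshold would then be treated as hot when applying the link-attribute indicator.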
In addition to 100 m, the BSU length is also set to 50 m, 150 m
and 200 m in order to examine the sensitivity of hot zones to the length of
BSU.
3.3.2.2 Event-based analysis
For each hypothetical road network structure, k reference points are
derived at 100 m intervals. With each instance, the key steps for hot zone
identification are as follows:
(1) For the crash instance, calculate ECI (Equations 3.7 and 3.8) at each RP
as the actual crash intensity;
(2) Randomly select 250 out of k reference points as one simulated road
crash pattern and calculate the crash intensity, denoted as ECI(sim), at each RP
according to Equations 3.7 and 3.8;
(3) Repeat Step 2 1000 times;
(4) For each reference point, use the 10th largest (99% significance level)
ECI(sim) as the threshold value; and
(5) Identify hot zones by following Equations 3.9 to 3.11.
In addition to 100 m, the RP interval is also set to 50 m, 150 m
and 200 m in order to examine the sensitivity of hot zones to the interval of
reference points.
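A corresponding sketch for Steps 2 to 4 of the event-based procedure is given below. It assumes that the 250 simulated crashes are drawn from the k reference points with replacement (necessary whenever k is smaller than 250), that RP-to-RP network distances are available as a matrix, and that a simulated crash counts towards an RP when it lies within h of it. The function name and the small example are hypothetical.

```python
import random


def event_based_thresholds(rp_distances, n_crashes, h, n_sims=1000, rank=10, seed=0):
    """Monte Carlo threshold for each RP: the rank-th largest simulated ECI."""
    rng = random.Random(seed)
    k = len(rp_distances)
    # RPs whose location falls inside the search window of RP i.
    within = [[j for j in range(k) if rp_distances[i][j] <= h] for i in range(k)]
    simulated = [[] for _ in range(k)]
    for _ in range(n_sims):
        hits = [0] * k
        for j in rng.choices(range(k), k=n_crashes):  # simulated crashes placed on RPs
            hits[j] += 1
        for i in range(k):
            simulated[i].append(sum(hits[j] for j in within[i]))
    return [sorted(values, reverse=True)[rank - 1] for values in simulated]


# Small hypothetical case: 20 RPs at 100 m spacing on one link, 50 crashes,
# 200 replications; the 2nd largest of 200 corresponds to the 99% level.
spacing = 100
distances = [[abs(i - j) * spacing for j in range(20)] for i in range(20)]
thresholds = event_based_thresholds(distances, n_crashes=50, h=spacing / 2, n_sims=200, rank=2, seed=1)
```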
3.3.2.3 Comparison of two approaches
The hot zones identified by the two approaches are compared. Descriptive
statistics on hot zones such as total number and length of hot zones are
calculated for each instance. Sensitivity is measured by computing the mean
and variation of these statistics.
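The summary statistics can be sketched with the standard library. The use of the sample standard deviation (n - 1 in the denominator) is an assumption, although it reproduces the values reported in Table 3.3 for the example below; the function name is hypothetical, and the counts are the numbers of link-attribute hot zones on the grid network for the ten random patterns in Table 3.3.

```python
import statistics


def summarize(values):
    """Mean, sample standard deviation and coefficient of variation (CV)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumed)
    cv = sd / mean if mean else 0.0  # CV reported as 0 when no hot zone is found
    return mean, sd, cv


# Number of link-attribute hot zones on the grid network, random pattern (Table 3.3).
counts = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
mean, sd, cv = summarize(counts)
```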
3.3.3 Results
Results for the dispersed pattern show that both the link-attribute and
event-based approaches perform very well, with no hot zones detected
regardless of the BSU length and RP interval. Table 3.3 shows the number
(length) of hot zones identified for each simulation with random and
concentrated crash patterns when 100 m is used as the BSU length and interval
of reference points. The differences between the two approaches are more obvious
with the random spatial patterns, which should result in no hot zones,
especially with more repeated iterations. It can be observed from Table 3.3
that the link-attribute approach does not identify any hot zone on the limited
access and organic road networks in any instance, whereas the event-based
approach identifies one hot zone on each. The link-attribute approach detects one hot
zone on the grid road network whereas the event-based approach identifies
three. The results suggest that the “false positive” problem can be more serious
with the event-based approach, especially on the grid road network, which is
characterized by more road intersections. Moreover, nearly all of the “hidden”
hot zones are identified for the concentrated patterns. The difference between
the two approaches is more pronounced on the grid road network than on the
limited access or the organic road network. On average, on the grid road network,
the link-attribute approach identifies 1.2 more hot zones than the event-based
approach, but the length of hot zones detected by the latter is 321 m longer than that
detected by the former. In addition, the
link-attribute approach's performance is more stable, with smaller coefficients
of variation (CV) for the grid road network in terms of both number and length
of hot zones. For the limited access road network, the link-attribute and
event-based approaches identify similar numbers of hot zones. For the organic
road network, the performance of the link-attribute approach is about the same
as that of the event-based approach, but its
variability (as demonstrated by the CV of the number and the CV of the length of
hot zones) is higher.
Table 3.3 Summary on hot zones for random and concentrated crash patterns on the
three hypothetical road networks by using 100 m as the BSU length and RP interval
(each cell shows the number of hot zones, with their length in meters in parentheses)

Random pattern
                          Grid                            Limited Access                  Organic
                          Link-attribute  Event-based     Link-attribute  Event-based     Link-attribute  Event-based
Simulation 1              0 (0)           0 (0)           0 (0)           0 (0)           0 (0)           1 (200)
Simulation 2              0 (0)           0 (0)           0 (0)           0 (0)           0 (0)           0 (0)
Simulation 3              0 (0)           0 (0)           0 (0)           0 (0)           0 (0)           0 (0)
Simulation 4              0 (0)           0 (0)           0 (0)           0 (0)           0 (0)           0 (0)
Simulation 5              0 (0)           0 (0)           0 (0)           0 (0)           0 (0)           0 (0)
Simulation 6              0 (0)           0 (0)           0 (0)           0 (0)           0 (0)           0 (0)
Simulation 7              1 (200)         1 (224)         0 (0)           1 (200)         0 (0)           0 (0)
Simulation 8              0 (0)           1 (275)         0 (0)           0 (0)           0 (0)           0 (0)
Simulation 9              0 (0)           1 (226)         0 (0)           0 (0)           0 (0)           0 (0)
Simulation 10             0 (0)           0 (0)           0 (0)           0 (0)           0 (0)           0 (0)
Mean                      0.1 (20.0)      0.3 (72.5)      0 (0)           0.1 (20.0)      0 (0)           0.1 (20.0)
Standard deviation        0.3 (63.3)      0.5 (117.5)     0 (0)           0.3 (63.3)      0 (0)           0.3 (63.3)
Coefficient of variation  3.16 (3.16)     1.61 (1.62)     0 (0)           3.16 (3.16)     0 (0)           3.16 (3.16)

Concentrated pattern
                          Grid                            Limited Access                  Organic
                          Link-attribute  Event-based     Link-attribute  Event-based     Link-attribute  Event-based
Simulation 1              7 (1900)        5 (1372)        9 (1800)        9 (2290)        8 (1800)        10 (2060)
Simulation 2              9 (2100)        8 (2522)        9 (2000)        8 (2260)        8 (1700)        9 (1920)
Simulation 3              9 (1900)        6 (1945)        9 (2100)        9 (2350)        9 (2000)        9 (2020)
Simulation 4              7 (1700)        8 (2412)        9 (2000)        8 (2430)        6 (1300)        9 (2130)
Simulation 5              7 (2000)        7 (2545)        8 (1900)        6 (2359)        10 (2000)       9 (2080)
Simulation 6              8 (2100)        6 (2242)        9 (1800)        9 (2218)        8 (1800)        9 (2040)
Simulation 7              8 (1700)        7 (1999)        9 (1900)        9 (2180)        8 (1600)        10 (2000)
Simulation 8              8 (1900)        6 (2441)        9 (2000)        10 (2320)       10 (2100)       9 (2020)
Simulation 9              9 (2000)        7 (2115)        9 (2100)        8 (2450)        9 (1900)        9 (2160)
Simulation 10             7 (1500)        7 (2422)        9 (1900)        9 (2110)        10 (2000)       9 (2220)
Mean                      7.9 (1880.0)    6.7 (2201.5)    8.8 (1950.0)    8.5 (2296.7)    8.6 (1820.0)    9.2 (2065.0)
Standard deviation        0.88 (193.2)    0.95 (361.3)    0.4 (108.1)     1.08 (107.8)    1.26 (239.4)    0.42 (86.6)
Coefficient of variation  0.11 (0.10)     0.14 (0.16)     0.05 (0.06)     0.13 (0.05)     0.15 (0.13)     0.05 (0.04)
If one further examines the results of the sensitivity analysis on BSU length
and interval of reference points, one can gain more insights into the two
approaches for the identification of hazardous road locations. Table 3.4
summarizes hot zones for random and concentrated crash patterns on the
three hypothetical road networks by interval. The results of the sensitivity analysis
provide more evidence that the event-based approach is more prone to the
“false positive” problem, especially on the grid road network. For the random crash
pattern, when the BSU length (interval of reference points) is defined as 50
meters, the link-attribute approach identifies 0.20 hot zones on average on the grid road
network, but the event-based approach detects 0.40. If the BSU length and
interval of reference points are equal to 150 m or 200 m, the link-attribute
approach detects no hot zones on the grid road network, but the event-based
approach identifies some. On the organic road network, both link-attribute
and event-based approaches identify no hot zones when 150 m and 200 m are
used as the segmentation length or reference point interval. If 50 m is applied,
both approaches identify some hot zones, but the numbers and lengths are
almost the same. These findings suggest that the performances of the two
approaches are similar on the organic road structure.
Table 3.4 Summary on hot zones for random and concentrated crash patterns on the
three hypothetical road networks by length
(L = link-attribute, E = event-based; the number after L or E is the BSU length or RP
interval in meters; each cell shows the number of hot zones, with their length in
meters in parentheses)

Random pattern
                      Grid                            Limited Access                  Organic
                      L50             E50             L50             E50             L50             E50
Mean                  0.20 (20.0)     0.40 (45.0)     0.10 (10.0)     0.20 (23.30)    0.20 (20.0)     0.20 (23.50)
Standard deviation    0.42 (42.16)    0.70 (78.50)    0.32 (31.62)    0.42 (49.15)    0.42 (42.16)    0.42 (49.56)
CV                    2.11 (2.11)     1.75 (1.74)     3.16 (3.16)     2.11 (2.11)     2.11 (2.11)     2.11 (2.11)
                      L100            E100            L100            E100            L100            E100
Mean                  0.10 (20.00)    0.30 (72.50)    0.00 (0.00)     0.10 (20.00)    0.00 (0.00)     0.10 (20.00)
Standard deviation    0.32 (63.25)    0.48 (117.53)   0.00 (0.00)     0.32 (63.25)    0.00 (0.00)     0.32 (63.25)
CV                    3.16 (3.16)     1.61 (1.62)     0 (0)           3.16 (3.16)     0 (0)           3.16 (3.16)
                      L150            E150            L150            E150            L150            E150
Mean                  0.00 (0.00)     0.80 (304.8)    0.00 (0.00)     0.00 (0.00)     0.00 (0.00)     0.00 (0.00)
Standard deviation    0.00 (0.00)     0.79 (312.8)    0.00 (0.00)     0.00 (0.00)     0.00 (0.00)     0.00 (0.00)
CV                    0 (0)           0.99 (1.03)     0 (0)           0 (0)           0 (0)           0 (0)
                      L200            E200            L200            E200            L200            E200
Mean                  0.00 (0.00)     0.20 (99.8)     0.00 (0.00)     0.10 (53.8)     0.00 (0.00)     0.00 (0.00)
Standard deviation    0.00 (0.00)     0.42 (210.4)    0.00 (0.00)     0.32 (170.1)    0.00 (0.00)     0.00 (0.00)
CV                    0 (0)           2.11 (2.11)     0 (0)           3.16 (3.16)     0 (0)           0 (0)

Concentrated pattern
                      Grid                            Limited Access                  Organic
                      L50             E50             L50             E50             L50             E50
Mean                  8.8 (1775.0)    8.6 (1908.7)    9.2 (1765.0)    8.8 (1878.2)    9.5 (1885.0)    9.2 (2020.9)
Standard deviation    0.63 (67.7)     0.52 (72.76)    0.42 (97.3)     0.42 (88.1)     0.53 (81.8)     0.42 (70.63)
CV                    0.07 (0.04)     0.06 (0.04)     0.05 (0.06)     0.05 (0.05)     0.06 (0.04)     0.05 (0.03)
                      L100            E100            L100            E100            L100            E100
Mean                  7.9 (1880.0)    6.7 (2201.5)    8.8 (1950.0)    8.5 (2296.7)    8.6 (1820.0)    9.2 (2065.0)
Standard deviation    0.88 (193.2)    0.95 (361.3)    0.4 (108.1)     1.08 (107.8)    1.26 (239.4)    0.42 (86.6)
CV                    0.11 (0.10)     0.14 (0.16)     0.05 (0.06)     0.13 (0.05)     0.15 (0.13)     0.05 (0.04)
                      L150            E150            L150            E150            L150            E150
Mean                  4.0 (1585.0)    4.9 (2771.0)    6.0 (2025.0)    5.0 (1916.1)    4.1 (1290.0)    4.6 (1489.3)
Standard deviation    0.8 (413.0)     1.0 (567.7)     1.3 (511.1)     1.1 (391.2)     2.3 (686.2)     1.3 (337.3)
CV                    0.20 (0.26)     0.20 (0.20)     0.22 (0.25)     0.21 (0.20)     0.56 (0.53)     0.27 (0.23)
                      L200            E200            L200            E200            L200            E200
Mean                  2.1 (1100.0)    1.6 (1755.9)    1.6 (720.0)     3.0 (1657.5)    2.3 (1000.0)    3.2 (1393.6)
Standard deviation    1.1 (543.7)     0.7 (832.4)     0.7 (379.5)     0.9 (700.4)     1.6 (666.7)     1.9 (808.9)
CV                    0.52 (0.49)     0.44 (0.47)     0.44 (0.53)     0.31 (0.42)     0.68 (0.67)     0.60 (0.58)
When the hot zones are compared by BSU length (or RP interval), one
may observe (see Table 3.4) that the hot zones identified by either the
link-attribute or the event-based approach are sensitive to the BSU length or
the RP interval with the random crash pattern. For instance, on average, the
link-attribute approach detects 0.1 hot zones on the grid road network when the
BSU length is defined as 100 meters, but identifies no hot zones when it is
defined as 150 meters or 200 meters. The average length of event-based hot
zones on the grid road network is 45.0 m with a 50-meter RP interval, but reaches
304.8 m with a 150-meter interval.
For the concentrated crash pattern, it is clear that the 50-meter setting
is the most stable no matter which approach is used and where the simulated
road crashes are located. The variability increases with increasing
segmentation length (reference point interval) on the three types of road
system. Taking the link-attribute hot zones on the grid road network as an
example, the CV of the number (length) of 50-meter hot zones is 0.07 (0.04),
whereas the figures are 0.11 (0.10) with 100-meter, 0.20 (0.26) with 150-meter,
and 0.52 (0.49) with 200-meter hot zones. In this sense, a relatively
short segmentation length (reference point interval) might be preferable for
the identification of hazardous road locations with concentrated crash
patterns.
3.4 Data Analysis with Empirical Data
While the simulations on hypothetical road network in the previous
section are valuable in providing basic understanding of the characteristics of
the two approaches in a network-constrained environment, it is important to
explore the potential applicability of the hot-zone methodology in empirical
applications, which may eventually be adopted by road safety administrations.
The two approaches in analyzing empirical road crash patterns within a
complicated road network are introduced in this section. The empirical data
and analytical tools are presented briefly in Subsection 3.4.1 and 3.4.2
respectively. Details of the implementation and results of the analysis can be
found in the following four chapters.
3.4.1 Data Collection and Editing
3.4.1.1 Road network database
For the network-constrained analysis, the road network database is of
prime importance. This study will use the link-node system for the centerline
of the road network in Hong Kong maintained and digitized by the Lands
Department. The total length of the road network in the database is about
4,249 km with 32,039 links and 26,692 nodes. The database contains
information on road names and road types. According to the Lands
Department (2004), roads of Hong Kong can be categorized into eight classes,
including: 1) expressway; 2) main road; 3) secondary road; 4) elevated road,
flyover and road bridge; 5) tunnel; 6) non-motorable track; 7) closed road; and
8) restricted access.
3.4.1.2 Traffic Road Accident Database (TRADS)
Another principal database for this research describes the police crash
investigation data known as the Traffic Road Accident Database (TRADS). The
TRADS system consists of three datasets, namely crash, casualty and vehicle
datasets, which provide a wide range of information on each crash. Generally,
this system records the location of crashes, road user type and the injury
severity classification of the casualties. Such information enables us to locate
casualties, find out crashes involving pedestrians and identify fatal and
seriously injured victims. As road crashes are rare events and randomness in
the number of traffic crashes happening at a certain location is typical, it is of
great importance that the study period can ensure representative crash samples
(Moons, Brijs & Wets, 2009a). Hence, following Mueller, Rivara and Bergman
(1988), this study pooled the datasets from 2002 to 2004 and 2005 to 2007 into
two three-year crash databases for analysis.
The crash database stores five-figure grid references which can be
transformed into projection coordinates. Each crash can be plotted onto a
digital map in a GIS environment based on the x and y coordinates. Table 3.5
shows the percentage of road crashes on centerlines in the raw crash database
from 2002 to 2007 in Hong Kong. Almost all crashes (no less than 99%) did not
intersect with the road links in the period from 2002 to 2004. In 2004, the
research on geo-validation (Loo, 2006) was shared with the Transport
Department, which improved the Traffic Information System (TIS)
accordingly. Hence, since 2005, the precision has been greatly improved.
Nonetheless, the spatial accuracy was still not high enough to locate all the
crashes at the right place. In the period from 2005 to 2007,
more than 50% of road crashes were still off the centerlines. Hence, these crashes
need to be snapped to the appropriate junctions or the centerlines of the road
network. By following the geo-validation procedure proposed by Loo (2006),
all the locations of road crashes were geo-validated before analysis.
Table 3.5 Shares of road crashes on road centerlines from 2002 to 2007
                          2002      2003      2004      2005      2006      2007
Number of road crashes    15,576    14,436    15,026    15,062    14,849    15,315
On road centerline (%)    0.1       0.1       0.0       44.5      43.5      42.5
3.4.1.3 Annual traffic census
Traffic volume is very important in this study as it is closely related to
the occurrence of road crashes. The annual average daily traffic (AADT) is
derived from the Hong Kong Annual Traffic Census provided by the Traffic
and Transport Survey Division of the Transport Department (2002-2007). The
data are generated by a range of counting stations across the Hong Kong
territory, where inductive loops and pneumatic tubes are installed on a
carriageway and connected to roadside automatic counters (Lee, 1989).
The counting stations (including ) are plotted onto a street map. Since not all
of the streets have stations installed, only those with counting stations are
included in the identification of road crashes. In recent years, Hong Kong has
installed more and more stations. In 2011, the census covered 1,813
km of roads. However, since the study period of the analysis is from 2002 to 2007,
this research only includes stations that existed in all six years.
Accordingly, the length of streets with AADT information for all six years (Figure
3.14), named the ATC road network hereafter, is about 1,060 km, accounting for
24.9% of the total length of the road network in the whole territory of Hong
Kong (4,249 km). The average AADT from 2002 to 2004 and from 2005 to 2007 is
used for the 2002-2004 period and the 2005-2007 period respectively.
Figure 3.14 ATC Network
3.4.1.4 Land utilization map
Due to data availability, land utilization data in 2004 and in 2006 are used
to characterize the land utilization during the period from 2002 to 2004 and
the period from 2005 to 2007 respectively.
The land use data of 2004 were derived from the digital topographic map
(B5000) produced by the "Map Publications Centre, Hong Kong" of Survey and
Mapping Office (2005). Coordinates of the map are in Hong Kong 1980 Grid.
The land use data of 2006 were extracted from a paper map on land
utilization of Hong Kong, which was compiled from the 2006 land use data of
the Planning Department and other relevant information including data
derived from SPOT Satellite images (2007). The transformation from a paper
map to a digitized vector map was performed by using ENVI, a
software package for processing and analyzing geospatial imagery. The key
steps involve:
(1) Scan the paper map to a digital map;
(2) Georeference the digital map with Hong Kong 1980 Grid coordinate
system;
(3) Select samples for each type of land use;
(4) Perform Supervised Classification;
(5) Vectorize the raster data to ArcGIS (a GIS software) shape files; and
(6) Manually check and correct the classification errors.
Land can be categorized according to different land use classification standards.
For instance, in the paper map on land utilization of Hong Kong in 2006, the
land use was classified into 19 categories. Based upon the literature on the
relationship between land use types and road crashes, this research broadly
divides the land use into five categories, namely residential, commercial,
industrial, institutional and other types for both 2004 and 2006 land use maps
(see Figure 3.15 as an example). These data will be used in casualty-based
analysis.
Figure 3.15 Land Use in 2006
3.4.1.5 Population Census
In Hong Kong, the Census and Statistics Department (2002, 2007)
conducts a population census every ten years and a by-census in the middle of
the intercensal period by tertiary planning unit (TPU). The TPU system is
devised by the Hong Kong Planning Department (2001, 2006) for town
planning and population census. The whole territory of Hong Kong (1,104
square kilometers) was divided into 276 TPUs in 2001 (Planning Department,
2001) and 282 in 2006 (Planning Department, 2006). The census report
presented the detailed characteristics of the population in each TPU, such as
age and sex of population, household size and composition, tenure of
accommodation, monthly domestic household income, highest education level
attended, economic activity status and occupation. In this study, the 2001
census (Census and Statistics Department, 2002) and 2006 by-census (Census
and Statistics Department, 2007) reports are collected to extract a series of
indicators describing the socio-economic conditions of each TPU for the
2002-2004 and 2005-2007 periods respectively.
The socio-economic characteristics of each TPU are summarized by a
socio-economic deprivation index (SDI), which is derived from a set of social
and economic variables. In this study, the variables are chosen based upon the
previous literature on area deprivation, including income, owner-occupancy,
education, occupation and unemployment which are defined as follows:
1) Income: Monthly household income <6000 HKD (%);
2) Owner-occupancy: Not owner-occupied households (%);
3) Education: Low upper-secondary education attainment (%);
4) Occupation: Occupation with no or low qualification (%); and
5) Unemployment: Unemployment (%).
The variables are similar to those chosen by other social deprivation
indices such as the Index of Local Conditions widely used in other countries
(Payne, Payne & Hyde, 1996). In this study, “monthly household income
<6000 HKD (%)” is a proxy for “income support recipients”, “not
owner-occupied households (%)” is the same as “not owner occupancy rate”,
“low upper-secondary education attainment (%)” is broadly similar to
“secondary school absence rate”, “occupation with no or low qualification (%)”
is a dimension similar to “Low social class”, and “unemployment (%)” is similar
to “total unemployment rate” (Payne, Payne & Hyde, 1996). Based on
these five variables, the socio-economic deprivation index, SDI, is calculated
from Z-scores as follows:
\[ SDI_i = \sum_{j=1}^{m} W_j Z_{ij} \tag{3.12} \]

\[ Z_{ij} = \frac{V_{ij} - \mu_j}{\sigma_j} \tag{3.13} \]

where $SDI_i$ is the socio-economic deprivation score for the ith TPU,
i = 1, 2, 3, …, n, n is the count of TPUs, m is the number of variables, $V_{ij}$ is the
actual value of the jth variable for TPU i, $\mu_j$ is the mean value of variable j, $\sigma_j$ is
the standard deviation of the jth indicator, and $W_j$ is the weight attached to the
Z-scores. While some previous research used equal weights, this study gives
different weightings to each indicator variable. There are strong arguments in
favor of using differentiated weightings (Abdalla et al., 1997). For instance, it is
not appropriate to equate the impact of unemployment as a deprivation
predictor with a low upper-secondary education attainment rate. Compared
with unemployment rate, the education level is not such a direct measure and
its existence alone does not mean that the area is materially deprived. Hence, the
weights used in this research were derived from the first principal component
of a principal component analysis, in a similar way to Abdalla et al. (1997).
Details of the calculation of the area deprivation level can also be found in
Loo and Yao (2010).
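The calculation in Equations 3.12 and 3.13 can be sketched as follows. This is a minimal illustration only: the indicator values and weights are hypothetical, the function names are invented for the example, and the use of the population standard deviation is an assumption, since the text does not state which form is used.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one indicator across all TPUs (Equation 3.13)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]


def sdi(indicators, weights):
    """Weighted sum of Z-scores for each TPU (Equation 3.12).

    indicators: dict mapping indicator name -> list of values, one per TPU
    weights:    dict mapping indicator name -> weight W_j
    """
    z = {name: z_scores(vals) for name, vals in indicators.items()}
    n_tpus = len(next(iter(indicators.values())))
    return [
        sum(weights[name] * z[name][i] for name in indicators)
        for i in range(n_tpus)
    ]


# Hypothetical percentages for three TPUs and hypothetical weights.
indicators = {
    "income": [10.0, 20.0, 30.0],
    "unemployment": [4.0, 6.0, 8.0],
}
weights = {"income": 0.6, "unemployment": 0.4}
scores = sdi(indicators, weights)
```

A higher score marks a more deprived TPU; the TPU whose indicators all sit at the mean receives a score of zero.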
It should be pointed out that, to ensure the confidentiality of data
relating to individual persons, households or quarters, each TPU with fewer than
1,000 persons is merged with adjacent TPU(s) in the census reports (Census and
Statistics Department, 2002, 2007). Nevertheless, the reports also provide the
population of each merged TPU, which allows information for each of the
merged TPUs to be derived by a population-weighted approach. Figure 3.16
shows the socio-economic condition of each TPU in 2006. The deprivation
level varied from 8.450 (most deprived) to -5.878 (least deprived). These data
will be used in the casualty-based analysis as an indicator describing the
socio-economic environment.
Figure 3.16 Socio-economic deprivation index by TPU in 2006
3.4.2 Data Analysis
3.4.2.1 Link-attribute analysis for road crashes
Road crashes on the ATC road network are analyzed at the
territory level for the periods 2002-2004 and 2005-2007. The sensitivity
analysis is conducted on the segmentation method (topological structure of
road network and segmentation length) and the definition of threshold values
(numerical and statistical definitions). In particular, the hot zones are analyzed
with threshold values defined by incorporating road environmental variables
such as AADT, road type, segment length and number of road junctions.
3.4.2.2 Link-attribute analysis for road casualties
Casualty-weighted analysis (unweighted and cost-weighted) on
pedestrian casualties in a selected district is performed at the district level. A
simple ranking method is used to determine the threshold value. In addition,
surrounding environmental variables affecting the occurrence of pedestrian
casualties are investigated. For each BSU, a range of TPU-based or link-based
variables are extracted, such as land use, road density, junction density,
population density and socio-deprivation level of the neighborhood. Hot zones
for pedestrians are detected by incorporating surrounding environmental
factors. Taking into consideration the regression-to-the-mean problem, the EB
method is employed to define the crash intensity. Sensitivity analysis is then
performed on the definition of crash intensity.
3.4.2.3 Event-based analysis for road crashes
Road crashes on the ATC road network are analyzed at the
territory level. Numerical and Monte Carlo methods are used to define the
threshold values. The influences of the selection of reference points and road
environmental factors on hot zones are examined by conducting a series of
sensitivity analyses.
3.4.2.4 Event-based analysis for road casualties
Casualty-weighted (unweighted and cost-weighted) analysis on
pedestrian casualties in a selected district will be conducted. The threshold
value is defined by a simple ranking method.
3.4.2.5 Comparison of two approaches
The hot zones identified by the two approaches are compared. Descriptive
statistics on hot zones of the entire region such as total number and length of
hot zones or total number of hot BSUs (or RPs) involved are calculated for
each map. By computing the mean and variation of these indicators, sensitivity
can be measured. In addition, hot zone maps of the two approaches will be
overlaid in order to derive hot zones identified by only one approach and those
detected by both methods. With these analyses, the similarities and differences
of the two methods in identifying crash hot zones are identified.
3.5 Summary
This chapter introduces the methodology for the identification of crash
hot zones by using link-attribute and event-based approaches. The two
approaches are first applied to the analysis of simulated road crashes on
hypothetical road networks. The results suggest that the link-attribute
approach may detect fewer false positive hot zones, especially with a grid road
structure. On an organic road network, the performances of the two
approaches are similar. No matter which type of road structure is used,
decreasing segmentation length (reference point interval) may reduce the
variability. In addition, this chapter describes the way in which the empirical
data are collected and edited. Analytical tools are also introduced briefly. The
following chapters will elaborate more on the implementation of the two
approaches for empirical data analysis.
CHAPTER 4
LINK-ATTRIBUTE ANALYSIS FOR ROAD
CRASHES
This chapter performs link-attribute analysis for road crashes in the whole
territory of Hong Kong. The data and analytical tools are presented in detail
in Section 4.1. The analysis focuses on the segmentation method and the
definition of threshold values. The results are discussed in Section 4.2.
4.1 Territory-Wide Identification of Hazardous Road
Locations
4.1.1 Data Description
The ATC road networks (1,060 km) of 2002-2004 and 2005-2007 are used
for territory-wide analysis in this chapter. There were 31,324 and 30,511 road
crashes on ATC roads during 2002-2004 and 2005-2007 respectively.
Following previous research, the
segmentation interval is first defined as 100 meters in this chapter. After
segmentation, 14,245 and 11,398 BSUs were obtained for the raw-link and
dissolved ATC road networks respectively.
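The effect of the road system on the number of BSUs can be illustrated with a small sketch. It assumes that each link is cut into consecutive pieces of the chosen interval and that any remainder becomes a short BSU at the end of the link; the actual segmentation procedure is described in Chapter 3 and may differ in detail. The function name is hypothetical.

```python
def segment_link(length_m, interval_m=100.0):
    """Split one road link into consecutive BSU lengths (in meters)."""
    if length_m <= 0:
        return []
    full, remainder = divmod(length_m, interval_m)
    pieces = [interval_m] * int(full)
    if remainder > 0:
        pieces.append(remainder)  # short BSU left over at the end of the link
    return pieces


# A 250 m link yields two full BSUs and one short 50 m BSU.
bsus = segment_link(250.0)
```

Under this assumption every link can leave one short remainder, so merging links before segmentation (the dissolved network) produces fewer short BSUs, which is in line with the drop from 14,245 to 11,398 BSUs reported above.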
4.1.2 Data Analysis
4.1.2.1 Numerical definition
There has been no official numerical definition for the threshold value for
hot zone identification. Loo (2009) defined the threshold values as 3, 4 and 5
road crashes in one year based upon the official definition of hot spots in Hong
Kong. As this study pools three-year data together, the threshold values will be
defined as 9, 12 and 15 road crashes in three years accordingly. Following
Equations 3.1 and 3.2, the hot zones are identified with the three threshold values.
In particular, a sensitivity analysis of hot zones to the road system (raw link-node vs.
dissolved) will be conducted.
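The numerical definition can be sketched on a single chain of BSUs. Equations 3.1 and 3.2 are not reproduced in this section, so the sketch assumes that a BSU is hot when its crash count is at least the threshold and that a hot zone is a run of two or more contiguous hot BSUs (which is consistent with the label HZ18+ for a threshold of 9). Real network contiguity is a graph relation; the list order here stands in for it. All counts are hypothetical.

```python
def hot_zones(counts, threshold):
    """Return (start, end) index pairs of runs of >= 2 contiguous hot BSUs."""
    zones, start = [], None
    for i, c in enumerate(counts):
        if c >= threshold:          # assumed form of Equation 3.1
            if start is None:
                start = i
        else:
            if start is not None and i - start >= 2:
                zones.append((start, i - 1))
            start = None
    if start is not None and len(counts) - start >= 2:
        zones.append((start, len(counts) - 1))
    return zones


# Annual thresholds from Loo (2009), pooled over three years.
annual_thresholds = [3, 4, 5]
pooled = [3 * t for t in annual_thresholds]

# Hypothetical three-year crash counts on nine consecutive BSUs.
counts = [2, 9, 11, 0, 15, 3, 12, 13, 16]
zones_by_threshold = {t: hot_zones(counts, t) for t in pooled}
```

Raising the threshold can only remove or shorten hot zones, which is why the counts in the results fall from HZ18+ to HZ30+.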
4.1.2.2 Monte Carlo method
Suppose that there are m road crashes in one period. The general steps of
the Monte Carlo method are:
(1) For the observed set of road crashes, calculate the number of crashes
on each BSU as the actual crash intensity;
(2) Randomly allocate m simulated crashes to BSUs and obtain the
number of simulated crashes on each BSU, which is denoted as z(sim);
(3) Repeat Step 2 1,000 times;
(4) For each BSU, use the 10th largest z(sim) (99% significance level) as
the threshold value; and
(5) Identify hot zones by following Equations 3.1 and 3.2.
To investigate the sensitivity of hot zones to the threshold value, the
significance level in Step 4 is also defined as 95% and 99.9%.
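The steps above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: crashes are allocated to BSUs with equal probability unless weights (for example BSU lengths) are supplied, a BSU is treated as hot when its observed count is at least its threshold, and the observed counts are hypothetical.

```python
import random


def monte_carlo_thresholds(n_bsus, m_crashes, n_sims=1000, rank=10,
                           weights=None, seed=0):
    """Per-BSU threshold: the rank-th largest simulated count over n_sims runs.

    weights=None allocates each crash to a BSU with equal probability
    (an assumption); pass BSU lengths to allocate in proportion to length.
    """
    rng = random.Random(seed)
    simulated = [[] for _ in range(n_bsus)]
    for _ in range(n_sims):
        z_sim = [0] * n_bsus
        # Step 2: randomly allocate m crashes to BSUs.
        for b in rng.choices(range(n_bsus), weights=weights, k=m_crashes):
            z_sim[b] += 1
        for b, z in enumerate(z_sim):
            simulated[b].append(z)
    # Step 4: take the rank-th largest simulated count for each BSU.
    return [sorted(s, reverse=True)[rank - 1] for s in simulated]


# Step 1: hypothetical observed crash counts on eight BSUs (m = 20).
observed = [0, 7, 9, 1, 0, 2, 1, 0]
thresholds = monte_carlo_thresholds(len(observed), sum(observed))
hot = [1 if o >= t else 0 for o, t in zip(observed, thresholds)]
```

With 1,000 simulations, `rank=50` corresponds to the 95% level and `rank=1` to the 99.9% level used in the sensitivity analysis.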
4.1.2.3 Incorporation of road environmental variables
The analysis aims to identify road hazards that do not only reflect the risk
factors such as traffic volume, road type and road junctions. It defines a BSU as
hot when the recorded number of road crashes exceeds the normal level of
safety. In this research, the “normal value” which is denoted as the threshold
value “t”, is estimated by a statistical model with a set of environmental
variables. The idea is borrowed from McGuigan (1981) who named the
difference between observed road crash counts and expected number of road
crashes as “potential crash reduction”. It should be pointed out that, as this
method aims to identify hazardous road locations with potential for safety
improvement, the difference between the crash intensity (observed crash counts)
and the threshold value of a BSU should be larger than zero if the BSU is
identified as being hazardous. Hence, $z_i$ in Equation 3.1 will be modified as:
\[ z_i = \begin{cases} 1 & \text{if } LCI_i > t_i \\ 0 & \text{otherwise} \end{cases} \tag{4.1} \]
where $LCI_i$ is the crash intensity at BSU i and $t_i$ is the threshold value of
BSU i, which is estimated by statistical models. A number of statistical models
have been used to estimate the crash frequency at a specific location over a
period of time. Classical models include Poisson and negative binomial
regression models.
Poisson regression, which builds on the Poisson distribution (the “law of rare
events”), is used to model count data and contingency tables (Cameron & Trivedi, 1998). It was
employed in early attempts to investigate the relationship between road
crashes and risk factors (Blower, Campbell & Green, 1993; Joshua & Garber,
1990; Saccomanno & Buyco, 1988). However, the distribution of road crashes
in the real world often violates the assumption of the Poisson model that the
mean equals the variance (equidispersion). Taking
crash intensity of 2002-2004 as an example (see the histogram in Figure
4.1), the sample mean (0.89) was much smaller than the sample variance (22.8).
To deal with the problem, the negative binomial model has been widely adopted
by researchers (Abdel-Aty & Radwan, 2000; Miaou, 1994; Tunaru, 1999) since
it is quite useful for discrete data over an unbounded positive range whose
sample variance exceeds the sample mean (overdispersion) (Cameron &
Trivedi, 1998; Hilbe, 2011). Hence, the following analysis will use negative
binomial regression models to calculate the mean number of road crashes as
the threshold value on each BSU, which is denoted by:
\[ \ln(\mu_i) = \beta' x_i + \varepsilon \tag{4.2} \]

where $\ln(\mu_i)$ is the natural log of the expected crash count on BSU i, $x_i$ is a
vector of predictors, $\beta'$ is the vector of estimated coefficients and
$\varepsilon$ is the error term. $\mu_i$ is taken as the threshold
value denoted as $t_i$ in Equation 4.1.
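The equidispersion check that motivates the negative binomial model can be sketched as a simple variance-to-mean ratio. The crash counts below are hypothetical and only mimic a distribution with many zero-crash BSUs and a few very high counts; the function name is invented for the example.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    return variance(counts) / mean(counts)


# Hypothetical counts: mostly zeros with a small number of high-crash BSUs.
counts = [0] * 80 + [1] * 10 + [5] * 6 + [30] * 4
ratio = dispersion_ratio(counts)
overdispersed = ratio > 1
```

For the 2002-2004 figures quoted above (mean 0.89, variance 22.8) the same ratio is about 25.6, far above the value of 1 expected under a Poisson model.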
Figure 4.1 Histogram for road crashes in 2002-2004
In order to examine the sensitivity of hot zones to the segmentation
length, the dissolved road network is also segmented with 150 m and 200 m
intervals. The threshold values are statistically defined by four models:
Model L, which includes the length of BSU as the only independent variable;
Model LA, which introduces the length of BSU together with AADT (annual
average daily traffic from the traffic census, as described in Section 3.4.1.3);
Model LAT, which incorporates the length of BSU, AADT and road type as
explanatory variables; and Model LATJ, which includes the length of BSU,
AADT, road type and the number of road junctions on the BSU. Note that
although using a dissolved road system can significantly reduce the number of
short BSUs, some BSUs still do not have the standardized length. For instance,
when the segmentation length is defined as 100 meters, the short BSUs account
for around 23%. Hence, the length of BSU is still regarded as an independent
variable in this research. The variable “type of road” records the road type of
the BSU. As mentioned earlier, this research is based on the road centerline
system devised by the Lands Department, which categorizes the roads of Hong
Kong into eight classes. This study focuses on three major types of roads,
namely expressways (EX), main roads (MA) and secondary roads (SE), and treats
the other five types as one category, named “other roads” hereafter. With
“other roads” as the base case for the “road type” variable, EX, MA and SE are
three dummy variables for
the road type predictor.
The full log likelihood and Akaike's Information Criterion (AIC) for each
model are presented in Table 4.1. The two statistics are commonly used to compare
negative binomial models. Better fitted models have a larger full log likelihood
value and a smaller AIC value. For both periods, the models are better fitted
with increasing segmentation length. Taking Model L as an example, as the
segmentation length rises from 100 m to 200 m, the full log likelihood value
increases from -23,728 to -15,914 in 2002-2004 and from -23,923 to -15,941 in
2005-2007, and the AIC value decreases from 47,462 to 31,835 in 2002-2004 and
from 47,852 to 31,889 in 2005-2007. This is probably because the
share of BSUs with zero road crashes is significantly reduced when the
segmentation length is raised. The goodness of fit is also improved when more
environmental factors are introduced into the models, indicating that all of
these variables can partly explain the spatial distribution of road crashes. In
particular, Model LATJ fits much better, implying that junctions may
have profound impacts on the spatial distribution of road crashes. The
“dispersion parameter (alpha)” and “the likelihood ratio test of alpha=0” are
also presented in Table 4.1. A Poisson model is one in which the alpha value is
constrained to zero. The likelihood ratio test of alpha equal to zero therefore
compares the negative binomial model with a Poisson model. The alpha value is
greater than zero and the p-value for the likelihood ratio test is less than
0.001 in all of the
models. This strongly suggests that alpha is non-zero and the negative
binomial model is more appropriate than the Poisson model. The coefficients
of the predictors presented in Table 4.2 show that all the
independent variables are significantly and positively associated with the
number of road crashes in both periods. More specifically, a higher AADT
is associated with more road crashes; compared with
expressways and other roads, crashes are more likely to happen on main and
secondary roads; and the presence of road junctions is associated with more
road crashes.
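The relation between the two fit statistics can be sketched with the standard definition AIC = 2k - 2 ln L, where k is the number of estimated parameters. The parameter count used below (intercept, length coefficient and alpha, so k = 3 for Model L) is an inference from the reported figures rather than something stated in the text.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


# Model 100L, log likelihoods as reported in Table 4.1; k = 3 is assumed.
aic_100l_0204 = aic(-23728, 3)
aic_100l_0507 = aic(-23923, 3)
```

Both values match the AIC entries reported for Model 100L. Small differences for other models can arise because the log likelihoods are reported as rounded integers.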
Table 4.1 Comparison of negative binomial models by predictor and segmentation length

Predictor: length of BSU
                                  Model 100L            Model 150L            Model 200L
                                  02-04      05-07      02-04      05-07      02-04      05-07
Log likelihood                    -23728     -23923     -18848     -18932     -15914     -15941
AIC                               47462      47852      37703      37870      31835      31889
alpha                             2.18       1.82       1.851      1.589      1.650      1.458
Likelihood ratio test of alpha=0  3.4e+04**  2.7e+04**  3.1e+04**  2.6e+04**  2.9e+04**  2.5e+04**

Predictor: length of BSU, AADT
                                  Model 100LA           Model 150LA           Model 200LA
                                  02-04      05-07      02-04      05-07      02-04      05-07
Log likelihood                    -23659     -23845     -18791     -18869     -15869     -15892
AIC                               47327      47699      37560      37747      31747      31792
alpha                             2.14       1.78       1.813      1.553      1.618      1.427
Likelihood ratio test of alpha=0  3.3e+04**  2.6e+04**  3.0e+04**  2.5e+04**  2.8e+04**  2.4e+04**

Predictor: length of BSU, AADT, road type
                                  Model 100LAT          Model 150LAT          Model 200LAT
                                  02-04      05-07      02-04      05-07      02-04      05-07
Log likelihood                    -23348     -23460     -18535     -18557     -15649     -15627
AIC                               46710      46935      37084      37128      31311      31269
alpha                             1.96       1.60       1.654      1.386      1.471      1.270
Likelihood ratio test of alpha=0  3.0e+04**  2.4e+04**  2.7e+04**  2.2e+04**  2.5e+04**  2.1e+04**

Predictor: length of BSU, AADT, road type, number of road junctions
                                  Model 100LATJ         Model 150LATJ         Model 200LATJ
                                  02-04      05-07      02-04      05-07      02-04      05-07
Log likelihood                    -22579     -22771     -18022     -18063     -15243     -15234
AIC                               45175      45558      36060      36142      30503      30484
alpha                             1.56       1.30       1.36       1.14       1.22       1.05
Likelihood ratio test of alpha=0  2.4e+04**  1.9e+04**  2.2e+04**  1.7e+04**  2.0e+04**  1.7e+04**

Notes:
AIC: Akaike's Information Criterion; alpha: dispersion parameter; *: p<0.05; **: p<0.001
Table 4.2 Coefficients of predictors in negative binomial models

Predictor: length of BSU
              Model 100L            Model 150L            Model 200L
              02-04      05-07      02-04      05-07      02-04      05-07
(Intercept)   -.023      .030       .075       .178*      .238*      .347**
Length        .011**     .010**     .009**     .008**     .008**     .007**

Predictor: length of BSU, AADT
              Model 100LA           Model 150LA           Model 200LA
              02-04      05-07      02-04      05-07      02-04      05-07
(Intercept)   -.155      -.094      -.069      .050       .108       .226*
Length        .010**     .009**     .009**     .008**     .007**     .007**
AADT          .006**     .006**     .006**     .006**     .006**     .006**

Predictor: length of BSU, AADT, road type
              Model 100LAT          Model 150LAT          Model 200LAT
              02-04      05-07      02-04      05-07      02-04      05-07
(Intercept)   -1.843**   -1.861**   -1.703**   -1.652**   -1.550**   -1.531**
Length        .010**     .009**     .009**     .008**     .007**     .007**
AADT          .015**     .014**     .014**     .014**     .014**     .014**
EX (a)        .438**     .547**     .408**     .486**     .389**     .506**
MA (b)        1.599**    1.696**    1.564**    1.646**    1.553**    1.653**
SE (c)        1.594**    1.686**    1.549**    1.618**    1.557**    1.648**

Predictor: length of BSU, AADT, road type, number of road junctions
              Model 100LATJ         Model 150LATJ         Model 200LATJ
              02-04      05-07      02-04      05-07      02-04      05-07
(Intercept)   -1.943**   -1.943**   -1.672**   -1.610**   -1.472**   -1.450**
Length        .009**     .008**     .007**     .006**     .006**     .005**
AADT          .015**     .014**     .014**     .014**     .014**     .014**
EX (a)        .470**     .603**     .485**     .562**     .480**     .595**
MA (b)        1.236**    1.388**    1.213**    1.327**    1.212**    1.338**
SE (c)        1.156**    1.312**    1.117**    1.235**    1.128**    1.264**
Junction      .801**     .686**     .579**     .517**     .459**     .419**

Notes:
(a): 1 if expressway, 0 otherwise; (b): 1 if main road, 0 otherwise; (c): 1 if secondary road, 0 otherwise.
*: p<0.05; **: p<0.001
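To show how the coefficients in Table 4.2 translate into a threshold value through Equations 4.1 and 4.2, the sketch below evaluates Model 100 LATJ for 2002-2004. The units are assumptions made for illustration (BSU length in meters and AADT in thousands of vehicles), since the table does not state them, and the example BSU is hypothetical.

```python
import math

# Coefficients of Model 100 LATJ, 2002-2004, as reported in Table 4.2.
COEF_100LATJ_0204 = {
    "intercept": -1.943,
    "length": 0.009,
    "aadt": 0.015,
    "EX": 0.470,
    "MA": 1.236,
    "SE": 1.156,
    "junction": 0.801,
}


def expected_crashes(length, aadt, road_type, n_junctions,
                     coef=COEF_100LATJ_0204):
    """mu_i = exp(beta' x_i), the threshold t_i of Equation 4.1."""
    eta = (coef["intercept"]
           + coef["length"] * length
           + coef["aadt"] * aadt
           + coef["junction"] * n_junctions)
    # "other roads" is the base case, so it adds nothing to the predictor.
    eta += coef.get(road_type, 0.0)
    return math.exp(eta)


def is_hot(observed, threshold):
    """Equation 4.1: hot only if the observed count exceeds the threshold."""
    return 1 if observed > threshold else 0


# Hypothetical BSU: 100 m of main road, AADT of 30, one junction.
mu = expected_crashes(100.0, 30.0, "MA", 1)
```

Under these assumed units the expected count is about 4.2 crashes in three years, so a BSU of this kind with 9 recorded crashes would be flagged as hot while one with 3 would not.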
Negative binomial regression is also appropriate for rate data, where the
rate is a count of events occurring to a particular unit of observation, divided
by some measure of that unit's exposure (Cameron & Trivedi, 1998). This is
handled as an offset, where the exposure variable enters on the right-hand side
of the equation, but with a parameter estimate (for log(exposure)) constrained
to 1 (Cameron & Trivedi, 1998). Treating the traffic volume as the exposure,
this study also applied negative binomial regression to the modeling of the
crash rate by introducing log(length × AADT) as the offset variable. The
results shown in Table 4.3 suggest that the road crashes are better modeled by
negative binomial regression models than by Poisson models (see alpha and the
likelihood ratio test of alpha=0). Although Model LA’, as suggested by the full
log likelihood and AIC values, does not fit as well as Model LA in Table 4.1,
the models are also used for computing the threshold values in order to
investigate the sensitivity of hot zones.
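With log(length × AADT) as the offset and only an intercept estimated, the expected count reduces to a constant rate multiplied by the exposure. The sketch below uses the intercept reported in Table 4.3 for Model 100LA’ in 2002-2004; the units of length and AADT are assumptions (meters and thousands of vehicles), and the example BSU is hypothetical.

```python
import math

INTERCEPT_100LA_RATE_0204 = -6.23  # Table 4.3, Model 100LA', 2002-2004


def expected_crashes_rate_model(length, aadt,
                                intercept=INTERCEPT_100LA_RATE_0204):
    """ln(mu) = intercept + ln(length * AADT); the offset coefficient is 1."""
    return math.exp(intercept + math.log(length * aadt))


# Hypothetical BSU: length 100, AADT 30 (assumed units).
mu = expected_crashes_rate_model(100.0, 30.0)
rate = mu / (100.0 * 30.0)  # equals exp(intercept) for every BSU
```

Because the rate is the same for every BSU, the threshold under Model LA’ is simply proportional to the exposure, which is one reason it can select different hot zones from Model LA even though both use the same two variables.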
Table 4.3 Negative binomial models for crash rate

                                  Model 100LA’          Model 150LA’          Model 200LA’
                                  02-04      05-07      02-04      05-07      02-04      05-07
(Intercept)                       -6.23**    -6.26**    -6.22**    -6.24**    -6.20**    -6.22**
Log likelihood                    -24443     -24800     -19390     -19611     -16387     -16504
AIC                               48890      47953      38784      39227      32778      33013
alpha                             2.62       2.28       2.25       1.98       1.99       1.82
Likelihood ratio test of alpha=0  4.8e+04**  4.1e+04**  4.5e+04**  4.0e+04**  4.3e+04**  3.9e+04**

Notes:
AIC: Akaike's Information Criterion; alpha: dispersion parameter; *: p<0.05; **: p<0.001
The estimated numbers of road crashes, which are used as threshold values
for hot zone identification, are then calculated by the established negative
binomial regression models, including Models L, LA, LAT, LATJ and LA’.
4.2 Results
4.2.1 Numerical Definition
The results of the sensitivity analysis of link-attribute hot zones to the
type of road system in 2002-2004 and 2005-2007 are
summarized in Table 4.4. The table also lists the computing time to identify
hot zones on one personal computer with an Intel® Core™ 2 Duo CPU and 2 GB
RAM, including the computing time for geo-validation of road crashes (three
years), segmentation of the road network, calculation of crash intensity and
modeling of the spatial pattern. The computing time for the dissolved road
system also includes the time spent performing the dissolving algorithm. It is
observed that if the threshold
values of 9 crashes (HZ18+), 12 crashes (HZ24+) and 15 crashes (HZ30+) in three
years are used, the average numbers of raw-link-node-based hot zones
identified in the two periods are approximately 100, 36 and 16, with 19.5 km,
6.7 km and 2.9 km in total length. If the dissolved-based hot zones are
examined, far more hot zones are detected than with
the raw-link-node road system in both periods. For instance, the number of
dissolved-based HZ18+ was 143 in 2002-2004 with 6,482 road
crashes happening on them, which was 34 hot zones and 2,572 road crashes more
than the raw-link-node hot zones. The total length was about double that based on
the raw-link-node system. The gap became much wider when the threshold value
was increased. Similar observations can be made for 2005-2007,
indicating that using the dissolved road system can identify more hazardous road
locations, due largely to the reduction of short BSUs that are not long enough
to allow the identification of crash clusters.
Table 4.4 Hot zones by type of road system (2002-2004 and 2005-2007)

Raw-link-node road system
                       2002-2004                    2005-2007
HZ type                HZ18+    HZ24+    HZ30+      HZ18+    HZ24+    HZ30+
Number of HZs          109      37       16         91       35       16
Number of BSUs         259      86       34         228      84       38
Length (km)            20.72    6.72     2.7        18.19    6.77     3.09
No. of road crashes    3,910    1,638    774        3,408    1,542    813
Computing time (h)     40       40       40         40       40       40

Dissolved road system
                       2002-2004                    2005-2007
HZ type                HZ18+    HZ24+    HZ30+      HZ18+    HZ24+    HZ30+
Number of HZs          143      73       36         144      75       38
Number of BSUs         423      191      92         417      177      88
Length (km)            41.98    19.03    9.2        41.14    17.61    8.8
No. of road crashes    6,482    3,597    2,084      6,265    3,359    1,980
Computing time (h)     45       45       45         45       45       45
To explore whether the hazardous road locations identified with the
raw-link-node road system are also detected by the dissolved road network,
the two types of hot zones are overlaid with different threshold values and
study periods. The hot zones identified by both road systems are labeled “HZR
(raw-link-node-based hot zones)-cum-HZD (dissolved-based hot zones)”. Table
4.5 shows the number of HZR-cum-HZDs as well as the percentages of HZRs
and HZDs in the HZR-cum-HZD category. The shares of HZR18+-cum-HZD18+ in
HZR18+ were 70.6% in 2002-2004 and 81.3% in 2005-2007. In other words, about
71% in 2002-2004 and 81% in 2005-2007
of HZR18+ were also identified as hazardous road locations when the dissolved
road system was used. However, as the threshold value increases, the percentages
of HZR in the HZR-cum-HZD category decrease. In
2005-2007, the share declined from 81.3% for HZ18+ to 62.5% for HZ30+.
Nevertheless, the
observations still suggest that a large part (more than 50%) of the hot zones
identified with the raw-link-node road system can also be detected by the
dissolved road system. Figure 4.2 shows the spatial distribution of HZ18+
identified only by the raw-link-node system (Figure 4.2a) and only by the dissolved
road system (Figure 4.2b). It can be observed that the dissolved road system
can identify more hot zones in urban areas (Kowloon Peninsula and Hong Kong
Island). Looking closely into these hot zones (see Figure 4.3), one can find that
most of the dissolved-only hot zones were located on dense roads, around road
junctions.
Table 4.5 Hot zones identified by both raw-link-node and dissolved road systems

Period       Category              Number    % of HZR    % of HZD
2002-2004    HZR18+-cum-HZD18+     70        70.6%       39.4%
             HZR24+-cum-HZD24+     24        66.7%       32.9%
             HZR30+-cum-HZD30+     10        62.5%       27.8%
2005-2007    HZR18+-cum-HZD18+     72        81.3%       48.2%
             HZR24+-cum-HZD24+     26        85.7%       40%
             HZR30+-cum-HZD30+     10        62.5%       26.3%

Notes:
1. % of HZR: percentages of HZR in the HZR-cum-HZD category
2. % of HZD: percentages of HZD in the HZR-cum-HZD category
Figure 4.2 Hot zones (HZ18+) identified with (a) raw-link-node road system only and (b) dissolved
road system only in the period from 2002 to 2004
Figure 4.3 Part of hot zones identified with dissolved road system only
If road sections are identified as road hazards in 2002-2004, they should
also be likely to be detected as hazardous road locations in the
following three years if the identification method is robust. In order to
examine the stability, the hot zones of the two periods are first compared in
terms of the total number and length of the hot zones, the number of BSUs and
the overall crash counts. The results summarized in Table 4.6 show that the
dissolved road system is generally more stable than the raw link-node road
system (see CV, the coefficient of variation, in the table), especially when
the threshold value is relatively small. To further examine the robustness of
the two methods, the hot zones of the two periods are overlaid and the results
are shown in Table 4.7.
It is observed from the hot zones identified in 2002-2004 (HZ24)
and in 2005-2007 (HZ57) that the performance of the
dissolved-road-based approach is more stable than that of the raw-link-node
method regardless of the threshold values. Using HZ2418+-cum-HZ5718+ as an
example, less than 40% of HZ2418+ and HZ5718+ were compatible when the
raw-link-node road system was used. When the dissolved system was
employed, the percentages of HZ2418+ and HZ5718+ in the HZ24-cum-HZ57
category reached about 60%. As using the dissolved road system may detect
fewer false negatives and false positives, the link-attribute analysis is better
conducted with a dissolved road system. Hence, the following analysis will be
performed based on the dissolved road network.
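The stability measure can be reproduced from the hot zone counts of the two periods. The sketch below takes the number of HZ18+ from Table 4.4 and computes the mean, the sample standard deviation and the coefficient of variation (CV); using the sample (n - 1) form of the standard deviation is an inference, chosen because it reproduces the reported value of 12.73.

```python
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    return stdev(values) / mean(values)


# Number of HZ18+ in 2002-2004 and 2005-2007 (Table 4.4).
raw_hz18 = [109, 91]
dissolved_hz18 = [143, 144]

raw_cv = cv(raw_hz18)              # about 0.13
dissolved_cv = cv(dissolved_hz18)  # about 0.005
```

A smaller CV means that the number of hot zones changes less between the two periods, which is the sense in which the dissolved road system is described as more stable.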
Table 4.6 Variation of hot zones between two periods

Raw-link-node road system
                  HZ18+                        HZ24+                        HZ30+
                  Mean      SD       CV        Mean      SD       CV        Mean      SD       CV
No. of HZs        100.00    12.73    0.13      36.00     1.41     0.04      16.00     0.00     0.00
No. of BSUs       243.50    21.92    0.09      85.00     1.41     0.02      36.00     2.83     0.08
Length of HZs     19.46     1.79     0.09      6.75      0.04     0.01      2.90      0.28     0.10
No. of crashes    3659.00   354.97   0.10      1590.00   67.88    0.04      793.50    27.58    0.03

Dissolved road system
                  HZ18+                        HZ24+                        HZ30+
                  Mean      SD       CV        Mean      SD       CV        Mean      SD       CV
No. of HZs        143.50    0.71     0.00      74.00     1.41     0.02      37.00     1.41     0.04
No. of BSUs       420.00    4.24     0.01      184.00    9.90     0.05      90.00     2.83     0.03
Length of HZs     41.56     0.59     0.01      18.32     1.00     0.05      9.00      0.28     0.03
No. of crashes    6373.50   153.44   0.02      3478.00   163.29   0.04      2032.00   70.54    0.03

Notes:
SD: standard deviation; CV: coefficient of variation
Table 4.7 Hot zones identified in both periods

Road system      Category                Number    % of HZ24    % of HZ57
Raw-link-node    HZ2418+-cum-HZ5718+     43        39.4%        30.1%
                 HZ2424+-cum-HZ5724+     10        27.2%        28.6%
                 HZ2430+-cum-HZ5730+     4         25.5%        25.5%
Dissolved        HZ2418+-cum-HZ5718+     86        60.1%        59.7%
                 HZ2424+-cum-HZ5724+     36        49.3%        48.2%
                 HZ2430+-cum-HZ5730+     19        52.8%        50%

Notes:
1. % of HZ24: percentages of HZ24 in the HZ24-cum-HZ57 category
2. % of HZ57: percentages of HZ57 in the HZ24-cum-HZ57 category
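The shares in the overlay tables can be read as the number of hot zones found in both sets divided by the number in each set. The sketch below applies this reading to the dissolved-system HZ18+ and HZ30+ figures, taking the totals from Table 4.4; it is an interpretation that reproduces those two rows, and the overlay procedure itself may count overlapping zones differently in other cases.

```python
def overlap_shares(n_both, n_first, n_second):
    """Percentage of each set of hot zones that also appears in the other."""
    return (100.0 * n_both / n_first, 100.0 * n_both / n_second)


# Dissolved road system: 86 HZ18+ common to both periods, out of 143
# (2002-2004) and 144 (2005-2007); 19 HZ30+ out of 36 and 38.
share_18 = overlap_shares(86, 143, 144)
share_30 = overlap_shares(19, 36, 38)
```

Rounded to one decimal place these give 60.1% and 59.7% for HZ18+, and 52.8% and 50.0% for HZ30+.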
4.2.2 Monte Carlo Simulation
Table 4.8 summarizes the results of the Monte Carlo simulation. The table
also lists the computing time to identify hot zones on one personal computer
with an Intel® Core™ 2 Duo CPU and 2 GB RAM, including the computing time
for the geo-validation process, generating BSUs, calculating crash intensity,
performing 1,000 simulations and modeling of the spatial pattern. Compared
with the arbitrary-number definition, the Monte Carlo simulation took much
more time to complete the whole procedure. The average computing time of
the arbitrary-number method was roughly 45 hours (see Table 4.4), whereas the
Monte Carlo method took around 90 hours. When the 99.9% significance level
was used, the numbers of hot zones (HZ99.9%) were 125 in 2002-2004 and 123 in
2005-2007, which were similar to those of HZ18+ (143 in 2002-2004 and 144 in
2005-2007). To further compare the two approaches, the two types of hot
zones are overlaid and the hot zones identified by both methods are
summarized in Table 4.9. The results suggest that the two types of hot zones
were compatible in the sense that the shares of both HZ18+ and HZ99.9% hot
zones in the HZ18+-cum-HZ99.9% category were quite large.
Table 4.8 Statistics on hot zones based on Monte Carlo Simulations

                       2002-2004                      2005-2007
                       HZ95%     HZ99%     HZ99.9%    HZ95%     HZ99%     HZ99.9%
No. of HZs             244       185       125        247       184       123
No. of BSUs            802       565       355        819       563       363
Length of HZs          79.04     55.93     35.24      80.25     55.30     35.85
No. of Crashes         10,160    8,000     5,702      10,080    7,811     5,686
Computing time (h)     90        90        90         90        90        90
Table 4.9 Hot zones belonging to both HZ18+ and HZ99.9%

Period       Number    % of HZ18+    % of HZ99.9%
2002-2004    123       86.39%        98.4%
2005-2007    122       86.49%        99.19%

Notes:
1. % of HZ18+: percentages of HZ18+ in the HZ18+-cum-HZ99.9% category
2. % of HZ99.9%: percentages of HZ99.9% in the HZ18+-cum-HZ99.9% category
To test the stability of the performances of the two approaches, the
variation of the two types of hot zones between 2002-2004
and 2005-2007 is compared. As shown in Table 4.10, the
CVs of HZ18+ are smaller than those of HZ99.9% in terms of the numbers of hot
zones and BSUs, but greater in terms of the total length and the number of
road crashes happening on hot zones. If the hot zones of the two periods are
overlaid (Table 4.11), the shares
of HZ18+ in the HZ24-cum-HZ57 category are slightly greater than those of
HZ99.9%.
Table 4.10 Variation of hot zones between two periods

                  HZ18+                         HZ99.9%
                  Mean      SD        CV        Mean      SD        CV
No. of HZs        143.50    0.71      0.005     124.00    1.41      0.011
No. of BSUs       420.00    4.24      0.010     359.00    5.66      0.016
Length of HZs     41.56     0.59      0.014     35.55     0.43      0.012
No. of Crashes    6373.50   153.44    0.024     5694.00   11.31     0.002

Notes:
SD: standard deviation; CV: coefficient of variation
Table 4.11 Hot zones identified in both periods

HZ type     Number    % of HZ24    % of HZ57
HZ18+       86        60.1%        59.7%
HZ99.9%     72        57.6%        58.5%

Notes:
1. % of HZ24: percentages of HZ24 in the HZ24-cum-HZ57 category
2. % of HZ57: percentages of HZ57 in the HZ24-cum-HZ57 category
In summary, the hot zones identified by the Monte Carlo simulation
method are compatible with those detected by the arbitrary-number
definition. One may choose the Monte Carlo simulation method to identify hot
zones so as to avoid the use of an arbitrary number in defining threshold
values, but the arbitrary-number approach requires much less computing time
than the Monte Carlo method.
4.2.3 Incorporation of Road Environmental Variables
Table 4.12 summarizes the hot zones by segmentation length and
predictors. In general, the hazardous road locations accounted for around 20%
of ATC road networks, but more than 50% of road crashes happened on them in
both periods regardless of the type of model and segmentation length,
indicating the strong clustering tendency of road crashes in the whole
territory. A close comparison of the hot zones among different models shows
that the hot zones are sensitive to environmental predictors. For instance,
in the period from 2005 to 2007, the number of hot zones was 462 with 2,045
BSUs and 17,594 road crashes happening on them when Model L was used to
determine the threshold value and the BSU length was defined as 100 meters,
whereas the number of hot zones reached 546 with 2,090 BSUs and 16,487 road
crashes when Model LATJ was used to calculate the threshold value. Even with
the same variables, the hot zones identified with threshold values determined
by Model LA and Model LA’ were different. Model LA detected longer hot zones
with more road crashes than Model LA’ did regardless of the period and
segmentation length. Moreover, the segmentation length also had a great
influence on the results. For example, in the period from 2002 to 2004, the
length of 100-meter hot zones identified with Model L was 193.95 km with
17,913 road crashes happening on them, whereas the 150-meter hot zones
identified with the same model measured 248.28 km with 19,726 road crashes.
In order to choose an appropriate segmentation length for the identification
of hazardous road locations in Hong Kong, the variability of hot zones is
compared among different segmentation lengths. The length whereby the hot
zone results are more stable will be selected to identify crash hot zones.
Table 4.12 Hot zones by predictor and segmentation length in 2002-2004 and 2005-2007
                       2002-2004                    2005-2007
                       HZ100    HZ150    HZ200      HZ100    HZ150    HZ200
Predictors: BSU length (Model L)
No. of HZs             508      343      245        462      347      239
Length (km) of HZs     193.95   248.28   244.12     196.74   252.51   250.99
No. of BSUs            2,010    1,739    1,306      2,045    1,771    1,314
No. of Road Crashes    17,913   19,726   19,091     17,594   19,424   18,681
Predictors: BSU length, AADT (Model LA)
No. of HZs             483      323      252        478      334      245
Length (km) of HZs     179.05   220.30   242.76     196.43   233.34   259.30
No. of BSUs            1,859    1,552    1,300      2,032    1,641    1,384
No. of Road Crashes    17,027   18,229   18,690     17,411   18,370   18,790
Predictors: BSU length, AADT, Road type (Model LAT)
No. of HZs             515      336      268        494      344      249
Length (km) of HZs     182.99   210.78   236.52     193.58   219.55   241.21
No. of BSUs            1,900    1,485    1,260      2,007    1,554    1,291
No. of Road Crashes    16,582   17,153   17,987     16,643   17,134   17,528
Predictors: BSU length, AADT, Road type, Number of road junctions (Model LATJ)
No. of HZs             548      379      286        546      386      290
Length (km) of HZs     186.53   213.52   230.15     201.23   230.46   248.81
No. of BSUs            1,924    1,505    1,234      2,090    1,624    1,334
No. of Road Crashes    15,922   16,627   16,743     16,487   16,885   16,870
Offset variable: Log (Length * AADT) (Model LA’)
No. of HZs             467      330      267        527      371      273
Length (km) of HZs     159.78   182.49   205.94     183.72   214.12   226.42
No. of BSUs            1,686    1,309    1,123      1,938    1,532    1,239
No. of Road Crashes    13,379   13,533   14,269     14,165   14,521   14,445
Table 4.13 summarizes the variation of hot zones with threshold values
defined by Model L, Model LA, Model LAT, Model LATJ and Model LA’ by
segmentation length. The table shows the mean, standard deviation and CV
values across the five types of hot zones. The indicators include the number
of hot zones, the number of BSUs, the length of hot zones and the total crash
count. Most CV values with 100-meter hot zones were equal to or less than
those with 150-meter and 200-meter hot zones, except for the number of hot
zones in the period of 2002-2004. The statistics suggest that the 100-meter
segmentation interval was more stable than the 150-meter or 200-meter
interval. Hence, the BSU length is better defined as 100 meters. The
following analysis will be based upon the 100-meter hot zones only.
Table 4.13 Variation of hot zones by segmentation length
2002-2004
                      100 m                          150 m                          200 m
                      Mean      Std. dev.  CV        Mean      Std. dev.  CV        Mean      Std. dev.  CV
No. of HZs            504.20    31.16      0.06      342.20    21.86      0.06      263.60    15.92      0.06
Length (km) of HZs    180.46    12.80      0.07      215.07    23.52      0.11      231.90    15.54      0.07
No. of BSUs           1875.80   119.61     0.06      1518.00   154.06     0.10      1244.60   74.12      0.06
No. of Crashes        16164.60  1717.07    0.11      17053.60  2297.27    0.13      17356.00  1942.50    0.11
2005-2007
                      100 m                          150 m                          200 m
                      Mean      Std. dev.  CV        Mean      Std. dev.  CV        Mean      Std. dev.  CV
No. of HZs            501.40    34.64      0.07      356.40    21.41      0.06      259.20    21.52      0.08
Length (km) of HZs    194.34    6.54       0.03      230.00    14.83      0.06      245.35    12.39      0.05
No. of BSUs           2022.40   55.98      0.03      1624.40   93.89      0.06      1312.40   53.48      0.04
No. of Crashes        16460.00  1368.54    0.08      17266.80  1842.10    0.11      17262.80  1768.35    0.10
In order to examine the stability of hot zones with different models, the
mean, standard deviation, and CV of hot zones of the two periods are
calculated for each type of model, and the results are summarized in Table
4.14. Focusing on traffic exposure, which was introduced into Model LA and
Model LA’, the average crash intensity of a Model-LA hot zone was 35.8 (17,219
divided by 480.5), whereas the mean crash intensity of Model-LA’ hot zones was
only 27.7 (13,772 divided by 497). More crucially, there was greater
variability between the two periods for Model-LA’ hot zones than for Model-LA
hot zones (see the CVs in Table 4.14). In this light, Model LA is preferable
if one wants to take the traffic volume into consideration. Hence, further
comparison will be made among Model L, Model LA, Model LAT and Model LATJ
only.
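The stability statistics can be checked by hand. The sketch below assumes that the standard deviation is the sample standard deviation (divisor n - 1) taken over the two period values, which reproduces the Model L figures for the number of 100-meter hot zones (508 in 2002-2004 and 462 in 2005-2007, from Table 4.12). The function name `period_summary` is illustrative only and does not come from the thesis.

```python
from statistics import mean, stdev


def period_summary(first_period, second_period):
    """Return (mean, sample standard deviation, CV) of two period values."""
    values = [float(first_period), float(second_period)]
    m = mean(values)
    sd = stdev(values)  # sample standard deviation, divisor n - 1 (assumed)
    return m, sd, sd / m


# Number of 100-meter hot zones identified with Model L (Table 4.12).
m, sd, cv = period_summary(508, 462)
print(round(m, 1), round(sd, 2), round(cv, 2))
```

A smaller CV means that the indicator changed less between 2002-2004 and 2005-2007, which is the sense in which a model is called more stable here.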
Table 4.14 Summary of 100-meter hot zones by model
                      Model L                       Model LA                      Model LAT
                      Mean      Std. dev.  CV       Mean      Std. dev.  CV       Mean      Std. dev.  CV
No. of HZs            485.0     32.53      0.07     480.5     3.54       0.01     504.5     14.85      0.03
Length (km) of HZs    195.35    1.97       0.01     187.74    12.29      0.07     188.29    7.49       0.04
No. of BSUs           2027.5    24.75      0.01     1945.5    122.33     0.06     1953.5    75.66      0.04
No. of Crashes        17753.5   225.57     0.01     17219     271.53     0.02     16612.5   43.13      0.00

                      Model LATJ                    Model LA’
                      Mean      Std. dev.  CV       Mean      Std. dev.  CV
No. of HZs            547.0     1.41       0.00     497.0     42.43      0.09
Length (km) of HZs    193.88    10.39      0.05     171.75    16.93      0.10
No. of BSUs           2007      117.38     0.06     1812      178.19     0.10
No. of Crashes        16204.5   399.52     0.02     13772     555.79     0.04
Comparatively speaking, Model L is found to be the most stable in terms of
the number of BSUs and the length of hot zones, whereas Model LATJ is more
robust if one focuses on the number of hot zones (see the CVs in Table 4.14).
When the hot zones of the two periods are overlaid, it is observed from Table
4.15 that 69.7% of Model-L hot zones in 2002-2004 were compatible with 68.8%
of 2005-2007 hot zones, whereas the shares were a little smaller with
Model-LATJ hot zones. Hence, considering the stability, one may choose Model
L, which does not incorporate any environmental variables, to define the
threshold values. However, engineers may not be interested in hot zones that
only reflect the variation of AADT, type of roads and the nearby road
junctions, but may be more concerned with hot zones resulting from other
factors. Table 4.16 summarizes the comparison of Model-L and Model-LATJ hot
zones. The hot zones in the L-cum-LATJ category were significantly dangerous
as they were identified by both models. The number of Model-L-only hot zones
was 79, whereas the number of Model-LATJ-only hot zones was 146. Model L
identified more hot zones in Kowloon City and Kwun Tong, and Yuen Long became
one of the most dangerous road locations when Model LATJ was used. Which
district(s) should be treated with a higher priority? For the allocation of
medical services, Kwun Tong may receive more attention, as hospital
authorities may be more concerned with the absolute number of road crashes.
But policy-makers who work on safety improvement programs and aim to find
places where road safety is most likely to be improved may be more interested
in Yuen Long, since there might be more local environmental factors that
cause road crashes in addition to the general factors that have been
controlled for in the models. Hence, for the identification of hazardous road
locations, both types of hot zones are important in addressing road safety
problems.
Table 4.15 Hot zones identified in both periods
HZ type    No. of HZs in both periods   Share of 2002-2004 HZs   Share of 2005-2007 HZs
HZL        330                          69.7%                    68.8%
HZLATJ     325                          66.2%                    66.3%
Table 4.16 Comparison of Model-L and Model-LATJ hot zones by district
District                 Region                 L-cum-LATJ   L-only       LATJ-only
Total                                           349          79 (17.1%)   146 (26.7%)
Urban Core
Central & Western (CW)   Hong Kong Island       22 (6.3%)    7 (8.9%)     5 (3.4%)
Wan Chai (WCH)           Hong Kong Island       7 (2.0%)     5 (6.3%)     2 (1.4%)
Eastern (E)              Hong Kong Island       21 (6.0%)    6 (7.6%)     11 (7.5%)
Yau Tsim Mong (YTM)      Kowloon Peninsula      18 (5.2%)    4 (5.1%)     2 (1.4%)
Sham Shui Po (SSPO)      Kowloon Peninsula      22 (6.3%)    4 (5.1%)     2 (1.4%)
Kowloon City (KC)        Kowloon Peninsula      27 (7.7%)    8 (10.1%)    3 (2.1%)
Wong Tai Sin (WTS)       Kowloon Peninsula      30 (8.6%)    10 (12.7%)   11 (7.5%)
Kwun Tong (KT)           Kowloon Peninsula      16 (4.6%)    1 (1.3%)     7 (4.8%)
Kwai Tsing (KTS)         Kowloon Peninsula      25 (7.2%)    4 (5.1%)     11 (7.5%)
Southern (S)             Hong Kong Island       11 (3.2%)    4 (5.1%)     13 (8.9%)
Sha Tin (ST)             The New Territories    34 (9.7%)    3 (3.8%)     14 (9.6%)
Tai Po (TP)              The New Territories    28 (8.0%)    5 (6.3%)     14 (9.6%)
Tsuen Wan (TW)           The New Territories    23 (6.6%)    6 (7.6%)     10 (6.8%)
Tuen Mun (TM)            The New Territories    29 (8.3%)    6 (7.6%)     12 (8.2%)
Yuen Long (YL)           The New Territories    24 (6.9%)    5 (6.3%)     15 (10.3%)
Sai Kung (SK)            The New Territories    19 (5.4%)    3 (3.8%)     9 (6.2%)
Northern (N)             The New Territories    15 (4.3%)    2 (2.5%)     9 (6.2%)
Islands (I)              The New Territories    2 (0.6%)     0 (0.0%)     2 (1.4%)
Suburb
4.3 Summary
This chapter introduces the key steps of the link-attribute hot zone
identification method. Focusing on road segmentation and threshold value
definition, a series of sensitivity analyses are performed. The results
indicate that the dissolved road system can detect more hazardous road
locations and is more stable than the raw link-node road network. While using
an arbitrary number to define the threshold value is a simple and
effort-saving method for the identification of hot zones, employing a
statistical method such as Monte Carlo simulation can avoid selection bias in
choosing an appropriate number as the threshold value, because the Monte
Carlo method employs significance levels such as “95%” and “99%” to define
the threshold values regardless of the total number of road crashes that have
happened. The hot zones are sensitive to segmentation length and predictors.
As the variability is smaller among 100-meter hot zones, the segmentation
length is better defined as 100 meters. The hazardous road locations
identified by different “crash-potential” models may differ significantly,
but all of them provide important information for improving road safety.
CHAPTER 5
LINK-ATTRIBUTE ANALYSIS FOR ROAD
CASUALTIES
This chapter performs link-attribute analysis for pedestrian casualties. The
importance of casualty-based analysis and the reason why pedestrian
casualties are targeted are briefly discussed in Section 5.1. The ways in
which the weight for each type of injury is determined are introduced in
Section 5.2. The data and the analytical tools for district-wide
identification of hazardous road locations for pedestrians are then presented
in detail in Section 5.3. In particular, the pedestrian casualties are
analyzed by incorporating the surrounding environment of road crashes with a
model-based approach.
5.1 Importance of Analysis based on Casualties
This section discusses the importance of casualty-weighted analysis and
the reason for choosing pedestrian casualties as study population.
5.1.1 Importance of Casualty-Weighted Analysis
Medical, social and other costs of traffic crashes are more closely related
to the number of casualties than to the number of crashes. For instance,
fatal and severely injured victims require the timely dispatch of ambulances
and medical treatment by trauma teams in hospitals. Weighting each road crash
by the number of casualties involved would help hospitals to understand the
spatial distribution of casualties and thus appropriately allocate emergency
services. Figure 5.1 shows the numbers of road traffic crashes and casualties
from 2001 to 2010 in Hong Kong. There were about 15,000 road crashes in each
year, whereas the number of traffic casualties reached approximately 20,000,
about one third more than the crash count. It is hence worth examining
casualty-based hot zones in Hong Kong.
[Figure: yearly counts from 2001 to 2010; series: Road Crashes and Road Casualties]
Figure 5.1 Numbers of road crashes and casualties during 2001 to 2010
5.1.2 Targeted Casualties: Pedestrians
Pedestrians are regarded as one of the most vulnerable road user types. In
many low-income and middle-income countries, the share of pedestrian
fatalities in total road deaths is notoriously high, due largely to the
increase in motorization and poor infrastructure which lacks separation of
pedestrians from other road users. Even in high-income, highly motorized
countries where people may drive more than walk, the percentage of pedestrian
fatalities is still relatively high. According to a statistical report by the
European Commission (2010), 7,638 pedestrians died in road traffic crashes in
23 European countries in 2008, accounting for over 20% of road traffic
fatalities in these countries.
In Hong Kong, pedestrians are at high crash risk. Table 5.1 shows Hong
Kong’s road traffic casualties during the period from 2001 to 2010. The number
of casualties fluctuated between 18,000 and 21,000. Although the share of
pedestrians in total casualties decreased slightly from about 25% to 20% in
recent years, pedestrians were still the most vulnerable road user group in
terms of fatal and serious injuries. For instance, there were 117 fatalities
and 2,160 serious injuries in 2010, of which 59.0% and 34.4% respectively
were pedestrians. Figure 5.2 and Figure 5.3 show the shares of fatalities and
serious injuries by road user type. The rate of fatal and serious injuries in
pedestrian casualties was significantly higher than those in driver and
passenger casualties. Such an alarming situation merits academic and public
concern for the safety of pedestrians. Hence, the casualty-based analysis in
this thesis will focus on pedestrian victims. As people are more likely to be
aware of pedestrian hot zones in their vicinity, hot zones for pedestrians
would be highly relevant to road users at the local level. In this light, the
spatial distribution of pedestrian casualties will be analyzed at district
level.
Table 5.1 Road Traffic Casualty Statistics in Hong Kong by Road User Type, 2001-2010
Casualties
Year         Pedestrian        Driver            Passenger         Total
2001         4978 (24.5%)      8095 (39.8%)      7244 (35.7%)      20317
2002         4805 (23.3%)      8315 (40.4%)      7473 (36.3%)      20593
2003         4517 (24.7%)      7689 (42.0%)      6104 (33.3%)      18310
2004         4577 (23.6%)      8102 (41.8%)      6723 (34.7%)      19402
2005         4401 (22.9%)      8251 (43.0%)      6554 (34.1%)      19206
2006         4230 (22.4%)      8327 (44.1%)      6311 (33.4%)      18868
2007         4077 (20.8%)      8879 (45.3%)      6662 (34.0%)      19618
2008         3823 (20.5%)      8382 (44.9%)      6479 (34.7%)      18684
2009         3583 (19.8%)      8384 (46.2%)      6171 (34.0%)      18138
2010         3898 (20.4%)      8730 (45.6%)      6496 (34.0%)      19124
2001-2010    4288.9 (22.3%)    8315.4 (43.3%)    6621.7 (34.4%)    19226
Fatalities
Year         Pedestrian        Driver            Passenger         Total
2001         97 (56.1%)        51 (29.5%)        25 (14.5%)        173
2002         86 (50.3%)        57 (33.3%)        28 (16.4%)        171
2003         99 (49.0%)        66 (32.7%)        37 (18.3%)        202
2004         96 (57.8%)        51 (30.7%)        19 (11.4%)        166
2005         78 (51.7%)        48 (31.8%)        25 (16.6%)        151
2006         78 (54.2%)        48 (33.3%)        18 (12.5%)        144
2007         91 (57.2%)        51 (32.1%)        17 (10.7%)        159
2008         88 (54.3%)        46 (28.4%)        28 (17.3%)        162
2009         71 (51.1%)        41 (29.5%)        27 (19.4%)        139
2010         69 (59.0%)        37 (31.6%)        11 (9.4%)         117
2001-2010    85.3 (53.9%)      49.6 (31.3%)      23.5 (14.8%)      158.4
Serious Injuries
Year         Pedestrian        Driver            Passenger         Total
2001         1332 (37.9%)      1382 (39.3%)      803 (22.8%)       3517
2002         1232 (36.0%)      1419 (41.4%)      774 (22.6%)       3425
2003         1069 (36.2%)      1203 (40.8%)      679 (23.0%)       2951
2004         981 (35.0%)       1210 (43.2%)      612 (21.8%)       2803
2005         967 (36.0%)       1178 (43.8%)      543 (20.2%)       2688
2006         908 (36.2%)       1056 (42.1%)      542 (21.6%)       2506
2007         882 (34.9%)       1120 (44.3%)      528 (20.9%)       2530
2008         782 (34.3%)       956 (41.9%)       543 (23.8%)       2281
2009         723 (34.5%)       922 (44.0%)       451 (21.5%)       2096
2010         744 (34.4%)       985 (45.6%)       431 (20.0%)       2160
2001-2010    962 (35.7%)       1143.1 (42.4%)    590.6 (21.9%)     2695.7
[Figure: yearly percentages for 2001 to 2010 and for 2001-2010 overall; series: Pedestrian, Driver, Passenger]
Figure 5.2 Percentages of fatalities by road user type, 2001-2010
[Figure: yearly percentages for 2001 to 2010 and for 2001-2010 overall; series: Pedestrian, Driver, Passenger]
Figure 5.3 Percentages of fatal and seriously injured casualties by road user type, 2001-2010
5.2 Casualty-Weighted Analysis
Road traffic casualties can be classified by systems such as the Abbreviated
Injury Scale or the Injury Severity Score (ISS). Tsui et al. (2009) have used
ISS to describe the severity of injuries in Hong Kong. However, this thesis
categorizes the casualties based on police reports, as the ISS system has
been employed only in some pilot studies. Hence, the casualties are
categorized into three types by severity of injury, namely fatalities,
serious injuries and slight injuries. While the unweighted method treats each
casualty equally regardless of its injury type, the cost-weighted method
assigns different weights to different severity levels. This section presents
the way in which crash intensity is measured by these two methods.
5.2.1 Unweighted
With an unweighted approach, the link-attribute crash intensity LCIuw(i)
in BSU i is defined as:
$$\mathrm{LCI}_{uw}(i) = Fa(i) + Se(i) + Sl(i) \tag{5.1}$$
where i=1,2,3…n; n is the number of BSUs; Fa is the number of fatalities; Se is
the number of serious injuries; and Sl is the number of slight injuries.
5.2.2 Cost-Weighted
Valuation of the prevention of a traffic casualty is important to ranking
the economic viability of transport schemes and the allocation of scarce
resources (iRAP, 2007). Research into the benefit of preventing a road crash
fatality, which is measured by the Value of a Statistical Life (VSL), has a
long history (Ashenfelter, 2006; de Blaeij et al., 2003; Elvik, 1997; Miller,
2000). Several techniques have been employed to estimate VSL, including
Human Capital, Willingness-to-pay (WTP), Stated Preference and Revealed
Preference approaches. As WTP is generally acknowledged to be the most
valid methodology (iRAP, 2007), a growing number of countries have
preferred WTP in recent years. The approach estimates the value of
preventing a fatality by estimating the amount of money that individuals
would be prepared to pay to reduce the risk of loss of life (iRAP, 2007). While
some studies have been conducted to establish WTP values for traffic
fatalities, those that have values for serious or slight injuries are
limited. This is mainly because of problems in designing questionnaires to
elicit reliable WTP estimates (iRAP, 2007). Nevertheless, some developed
countries have attempted to employ the WTP method to estimate the costs of
non-fatal casualties. Table 5.2 lists casualty costs in some developed
countries using the WTP approach. On average, the costs of a serious injury
and a slight injury are approximately 16% and 1% of the value of a fatality
respectively. In the absence of casualty cost data in Hong Kong, this
research is based on these average ratios and defines the cost-weighted
link-attribute crash intensity as:
$$\mathrm{LCI}_{cw}(i) = 100\,Fa(i) + 16\,Se(i) + Sl(i) \tag{5.2}$$
where i=1,2,3…n; n is the number of BSUs; Fa is the number of fatalities; Se is
the number of serious injuries; and Sl is the number of slight injuries.
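The two intensity measures in Equations 5.1 and 5.2 can be sketched in a few lines. The class and function names below (`BsuCasualties`, `lci_unweighted`, `lci_cost_weighted`) are illustrative assumptions and not part of the thesis; only the weights 100, 16 and 1 come from the text.

```python
from dataclasses import dataclass


@dataclass
class BsuCasualties:
    """Casualty counts aggregated on one basic spatial unit (BSU)."""
    fatal: int
    serious: int
    slight: int


def lci_unweighted(c: BsuCasualties) -> int:
    """Equation 5.1: every casualty counts once."""
    return c.fatal + c.serious + c.slight


def lci_cost_weighted(c: BsuCasualties,
                      w_fatal: int = 100,
                      w_serious: int = 16,
                      w_slight: int = 1) -> int:
    """Equation 5.2: casualties weighted by relative cost."""
    return w_fatal * c.fatal + w_serious * c.serious + w_slight * c.slight


# A hypothetical BSU with 1 fatality, 2 serious and 3 slight injuries.
bsu = BsuCasualties(fatal=1, serious=2, slight=3)
print(lci_unweighted(bsu), lci_cost_weighted(bsu))  # 6 135
```

Under the cost-weighted measure a single fatality outweighs six serious injuries, so the ranking of BSUs can differ markedly from the unweighted ranking.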
Table 5.2 Casualty cost in developed countries using willingness-to-pay approach
                                     Cost                                  Ratio
Country          Year   Currency     Fatal        Serious     Slight      Serious/Fatal   Slight/Fatal
Austria          2006   €            2,676,374    316,772     22,722      11.9%           0.8%
New Zealand      2006   NZ$          3,065,000    535,000     60,000      17.5%           2.0%
Singapore        2008   S$           1,874,000    243,600     18,740      13.0%           1%
Sweden           2005   SK           18,383,000   3,280,000   N/A         17.8%           N/A
United Kingdom   2006   £            1,489,163    167,332     12,898      11.2%           0.9%
United States    2007   $            5,800,000    1,322,300   30,920      22.7%           0.5%
Average                                                                   15.6%           1.0%
Sources: iRAP (2007); Le et al.(2011); Ministry of Transport, New Zealand (2010);
Department for Transport, UK(2007); Szabat and Knapp (2009);GmbH (2008)
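As a rough check on the weights used in Equation 5.2, the ratios can be recomputed from the costs listed in Table 5.2. The sketch below assumes a simple unweighted mean over the countries with available data (Sweden has no slight-injury cost). Recomputing from the costs gives a serious-to-fatal mean slightly above the 15.6% printed in the table, presumably because of rounding, but both round to the weights 16 and 1.

```python
from statistics import mean

# (fatal, serious, slight) costs in local currency, taken from Table 5.2.
# None marks a value that the table reports as N/A.
costs = {
    "Austria": (2_676_374, 316_772, 22_722),
    "New Zealand": (3_065_000, 535_000, 60_000),
    "Singapore": (1_874_000, 243_600, 18_740),
    "Sweden": (18_383_000, 3_280_000, None),
    "United Kingdom": (1_489_163, 167_332, 12_898),
    "United States": (5_800_000, 1_322_300, 30_920),
}

serious_ratios = [serious / fatal for fatal, serious, _ in costs.values()]
slight_ratios = [slight / fatal
                 for fatal, _, slight in costs.values()
                 if slight is not None]

# Express the mean ratios per 100 units of fatality cost.
serious_weight = round(100 * mean(serious_ratios))
slight_weight = round(100 * mean(slight_ratios))
print(serious_weight, slight_weight)  # 16 1
```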
5.3 District-Wide Identification of Hazardous Road
Locations for Pedestrians
The pedestrian casualties in a selected district are analyzed by two
methods in defining threshold values. One is the simple ranking method
which only focuses on the observations without considering any crash
predictors. The other is a model-based approach which defines threshold
values by incorporating the surrounding environment of crashes.
5.3.1 Data Description
5.3.1.1 Study area
In order to focus on a manageable but reasonably complex setting, Kwun
Tong District of Hong Kong is selected as the study area. It is situated in
the eastern part of the Kowloon Peninsula. The district is chosen for the
investigation of pedestrian casualties because, based on the 2001 and 2006
Census data (Census and Statistics Department 2002, 2007), it was among the
three most populous districts (562,427 in 2001 and 587,423 in 2006) and
ranked first in terms of population density (4.9 persons per 100 sq. m. in
2001 and 5.2 persons per 100 sq. m. in 2006) among the 18 districts of the
city (see the population density map of 2006 in Figure 5.4 as an example).
Figure 5.4 Population Density in 2006
5.3.1.2 Road network
The entire road network of the district is used for casualty-based analysis.
In Kwun Tong District, the road network (see Figure 5.5) is approximately 157
km in length, with 1,303 links and 1,025 nodes. The raw link-node system is
first dissolved with the dissolving algorithm. After segmenting the dissolved
road system (487 links) at a 100-meter interval, 1,852 BSUs are obtained.
Table 5.3 shows descriptive statistics on the length of BSUs based on the raw
and dissolved road systems. The variation of BSU length is considerably
decreased when the dissolved road system is used. The following analysis will
use BSUs derived from the dissolved road system.
Figure 5.5 Road Network in Kwun Tong District
Table 5.3 Statistics on length of BSU before and after dissolving
                                                            Quartiles (m)
                    No.     Mean (m)   Std. Deviation (m)   25      50      75
Before dissolving   2,382   68.9       33.8                 38.0    81.2    100
After dissolving    1,852   84.8       23.5                 80.2    100     100
5.3.1.3 Pedestrian casualties
There were altogether 944 and 940 pedestrian casualties in Kwun Tong
District during the periods of 2002-2004 and 2005-2007 respectively. The
numbers and percentages of fatal, seriously and slightly injured victims are
shown in Table 5.4. During these two periods, the fatal and seriously injured
casualties accounted for about 25% of the total injuries. Hence, it is
worthwhile to investigate pedestrian casualties by severity type in Kwun Tong
District.
Table 5.4 Numbers and percentages of fatalities, serious and slight pedestrian injuries
Period       Fatality     Serious injury   Slight injury   Total
2002-2004    22 (2.3%)    213 (22.6%)      709 (75.1%)     944
2005-2007    21 (2.2%)    242 (25.7%)      677 (72.0%)     940
The three types of pedestrian injuries are aggregated by BSU. Table 5.5
presents descriptive statistics on pedestrian casualties by injury type. The
observations suggest that the spatial distributions of pedestrian casualties in
2002-2004 and 2005-2007 could be similar. In both periods, more than 50% of
BSUs had no crashes happening on them. The variation in number of fatalities
is relatively small with maximum number of 2 and standard deviation equal to
about 0.11 in both periods, whereas the slight injury count varied greatly with
a range from 0 to 11 (standard deviation equal to 1.05) in the period of
2002-2004 and from 0 to 8 (standard deviation equal to 0.92) in the period of
2005-2007.
Table 5.5 Descriptive statistics on pedestrian casualties
                              Mean   Median   Std. Deviation   Minimum   Maximum
2002-2004
Number of casualties          0.51   0.00     1.33             0         14
Number of fatalities          0.01   0.00     0.11             0         2
Number of serious injuries    0.12   0.00     0.45             0         7
Number of slight injuries     0.38   0.00     1.05             0         11
2005-2007
Number of casualties          0.51   0.00     1.22             0         12
Number of fatalities          0.01   0.00     0.12             0         2
Number of serious injuries    0.13   0.00     0.45             0         6
Number of slight injuries     0.37   0.00     0.92             0         8
5.3.1.4 Surrounding environment
Surrounding environmental variables include road environmental variables
such as road junctions, land use variables such as land mix index,
socio-economic environmental factors such as social and economic deprivation
level, and demographic predictors like population density. According to the
spatial scales at which attributes are aggregated, these variables can also be
categorized into BSU-level and TPU-level types.
BSU-level environmental variables
As it is generally accepted that most of road crashes involving pedestrians
happen around road junctions, the number of junctions on each BSU might
explain the variation of occurrence of traffic crashes. To measure this variable,
the road junctions are aggregated by BSU.
TPU-level environmental variables
The characteristics of the neighborhoods may also have great impacts on
the incidence of traffic crashes. The Tertiary planning units (TPUs) are used in
this study to capture the land use, demographic and socio-economic conditions
of the community. For each TPU-based variable, all the BSUs within a same
TPU were assigned the same value.
As described in Section 3.4.1.4, land use data of 2004 and 2006 will be
used for measuring land use mix in the two periods. Land use mix is calculated
based on the Simpson Diversity Index, a biological diversity measurement
which is used here to evaluate the mix of land use categories within a
neighborhood. Following the formal expression of the index, the land use
diversity, denoted as LUD, can be calculated by:
$$\mathrm{LUD} = \frac{\sum_i n_i (n_i - 1)}{N (N - 1)} \tag{5.3}$$
where n_i is the number of patches of land in the ith category and N is the
total number of patches of land in all categories. As described in the second
chapter, this research broadly divided the land use into five categories,
namely residential, commercial, industrial, institutional and others.
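Equation 5.3 can be sketched as follows, assuming the standard form of the Simpson index in which N is the total number of patches over all categories. The function name `simpson_index` and the patch counts are hypothetical and serve only to illustrate the arithmetic.

```python
def simpson_index(patch_counts):
    """Simpson index: sum of n_i * (n_i - 1) divided by N * (N - 1).

    patch_counts holds the number of land patches in each land use
    category; N is taken as the total number of patches (an assumption
    based on the standard definition of the index).
    """
    total = sum(patch_counts)
    if total < 2:
        raise ValueError("at least two patches are needed")
    numerator = sum(n * (n - 1) for n in patch_counts)
    return numerator / (total * (total - 1))


# Hypothetical TPU with patches in the five categories: residential,
# commercial, industrial, institutional and others.
lud = simpson_index([10, 5, 3, 1, 1])
print(round(lud, 3))  # 0.305
```

In this form the value approaches 1 when a single category dominates and falls as the patches are spread more evenly across categories. Whether the thesis rescales the index before modelling cannot be told from this section.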
The socio-economic deprivation index (SDI) describes the deprivation
level of neighborhoods. As reviewed in the literature, an increasing
deprivation level may increase the chance of being a victim in road crashes.
Hence, SDI is used as one dimension of “prior information”. The way in which
SDI is calculated by TPU can be found in Section 3.4.1.5.
Greater population density may increase the exposure to collisions. Hence,
the population density is also introduced as an explanatory variable.
The complexity of the road structure in the vicinity might affect the
occurrence of road crashes. This study treats the road junction density as
another independent variable.
5.3.2 Data Analysis
5.3.2.1 Simple ranking
As no numerical definitions have been used for casualty-based spatial
analysis of road crashes in Hong Kong, this research will employ a simple
ranking method which regards a certain percentile (90%, 95% or 99%) of the
observations as the threshold value. The advantage of this method is that one
can roughly control the number of hot BSUs.
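The simple ranking method can be illustrated with a short sketch. The thesis does not state which percentile convention was used or whether a BSU must equal or exceed the threshold to be hot, so the nearest-rank percentile and the "greater than or equal to" rule below are assumptions, as are the function names and the sample intensities.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile using the nearest-rank convention."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def hot_bsus(intensity_by_bsu, pct):
    """Return the threshold and the BSUs whose intensity reaches it."""
    threshold = nearest_rank_percentile(intensity_by_bsu.values(), pct)
    hot = [bsu for bsu, value in intensity_by_bsu.items()
           if value >= threshold]
    return threshold, hot


# Hypothetical crash intensities for ten BSUs, keyed by BSU id.
intensities = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 2, 8: 3, 9: 5, 10: 7}
threshold, hot = hot_bsus(intensities, 90)
print(threshold, hot)  # 5 [9, 10]
```

Because many BSUs share the same low intensity, the share of BSUs at or above the threshold is only roughly equal to the complement of the chosen percentile.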
Unweighted
Crash intensities for the unweighted analysis, LCIUW, are calculated by
following Equation 5.1. The threshold values are defined as 90%, 95% and 99%
percentiles of LCIUW.
Table 5.6 Percentiles of crash intensity for the unweighted analysis
             2002-2004            2005-2007
Percentile   90%    95%    99%    90%    95%    99%
Value        2      3      7      2      3      5
Cost-weighted
Crash intensities for the cost-weighted analysis, LCICW, are calculated by
following Equation 5.2. The threshold values are defined as 90%, 95% and 99%
percentiles of LCICW.
Table 5.7 Percentiles of crash intensity for the cost-weighted analysis
             2002-2004            2005-2007
Percentile   90%    95%    99%    90%    95%    99%
Value        4      17     100    16     18     100
5.3.2.2 Incorporation of the surrounding environment of crashes
As mentioned in Chapter Two, the concentration of road crashes is not
driven by a single force like traffic exposure, but by a synthesis of various
determinants such as characteristics of road structure and land use type.
Treating road crashes in relation to their surrounding environmental
context, including street and land use characteristics as well as the
demographic and socio-economic environment, helps to better understand the
spatial distribution of road crashes and to predict their occurrence. The
expected crash intensity can be estimated by a statistical model with these
environmental factors. Using the expected intensity as the threshold value,
crash hot zones will be identified in a “potential crash reduction” manner.
Unweighted
In unweighted analysis, the crash intensity LCIUW is determined as the
number of traffic casualties, which can be calculated with Equation 5.1.
Taking into account environmental variables, the threshold value, tuw, is
defined as the expected number of casualties estimated by a regression model.
The descriptive statistics on pedestrian casualties in Table 5.5 demonstrate
that the traffic casualties (LCIUW) were over-dispersed, with a mean much
lower than the variance (for example, a mean of 0.51 against a variance of
about 1.77 in 2002-2004). Hence, negative binomial models are used to
estimate the expected number of pedestrian casualties (tuw). It should be
pointed out that the length of BSUs still varies even though the dissolving
algorithm is performed on the raw road link system. As shown in Table 5.3,
the mean BSU length is 84.8 m with a standard deviation of 23.5 m after
segmentation with a dissolved road network. The length of BSU is hence
introduced into the models as an explanatory variable. Two kinds of negative
binomial models are used for investigating the sensitivity of crash hot zones
to the selected surrounding environmental variables. One is treated as a base
model, which includes the length of BSU as its only explanatory variable.
The other, regarded as a full model, introduces all of the surrounding
environmental variables, including length of BSU, the number of junctions on
the BSU, land mix, SDI, population density and junction density. For both
types of models, the dependent variable is the crash intensity LCIUW.
Table 5.8 shows the coefficient and p-value of length as well as the
over-dispersion test in the base models, which indicates that the length of
BSU was positively and significantly associated with the number of pedestrian
casualties on BSUs and that the distribution of road crashes is better
modeled by a negative binomial model. The tuw is then estimated by the base
model.
Table 5.8 Link-attribute results on base negative binomial models for pedestrian casualties
with length as independent variable
                                          2002-2004            2005-2007
Parameter                                 B         Sig.       B         Sig.
(Intercept)                               -3.561    0.000      -3.425    0.000
Length                                    0.031     0.000      0.030     0.000
Log likelihood                            -1583                -1621
Akaike's Information Criterion            3172                 3248
alpha                                     3.9                  3.2
Likelihood ratio test of alpha=0 (Sig.)   957.2 (0.000)        739.9 (0.000)
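To show how a fitted base model turns into a threshold, the sketch below evaluates the expected number of casualties for one BSU from the 2002-2004 coefficients in Table 5.8. It assumes the usual log link of a negative binomial regression and a BSU length measured in meters; neither is stated explicitly in this section. The function name `expected_casualties` is illustrative.

```python
import math


def expected_casualties(intercept, length_coef, bsu_length_m):
    """Expected count under a log link: exp(b0 + b1 * length)."""
    return math.exp(intercept + length_coef * bsu_length_m)


# 2002-2004 base model (Table 5.8), evaluated for a 100-meter BSU.
t_uw = expected_casualties(-3.561, 0.031, 100.0)
print(round(t_uw, 2))  # 0.63

# Assumed decision rule: a BSU is hot when its observed intensity
# exceeds the expected value for a BSU of its length.
observed = 2
is_hot = observed > t_uw
```

Because the length coefficient is positive, shorter BSUs receive a lower threshold than full 100-meter BSUs.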
The results of full models with six independent variables are shown in
Table 5.9. In both periods, increasing length of BSU, number of road
junctions, land mix and SDI significantly increased the chance of being a
pedestrian victim. As the p-values of population density and junction density
were large in both periods, the two variables are excluded from the final
full models. The coefficient and p-value of each variable in the final full
models are shown in Table 5.10. The tuw is estimated by the final full model
with four independent variables.
Table 5.9 Link-attribute results on full negative binomial models for pedestrian casualties with
six independent variables
                                          2002-2004            2005-2007
Parameter                                 B          Sig.      B           Sig.
(Intercept)                               -5.470     0.000     -5.016      0.000
Length                                    0.038      0.000     0.035       0.000
Number of junctions                       0.356      0.000     0.278       0.000
Land Mix                                  0.018      0.000     0.022       0.000
SDI                                       0.141      0.000     0.299       0.000
Population density                        3.049E-6   0.336     -1.503E-6   0.603
Junction density                          -0.016     0.844     -0.033      0.655
Log likelihood                            -1528                -1560
Akaike's Information Criterion            3067                 3136
alpha                                     2.93                 2.35
Likelihood ratio test of alpha=0 (Sig.)   719.0 (0.000)        571.4 (0.000)
Table 5.10 Link-attribute results on full negative binomial models for pedestrian casualties
with four independent variables

| Parameter | 2002-2004 B | 2002-2004 Sig. | 2005-2007 B | 2005-2007 Sig. |
|---|---|---|---|---|
| (Intercept) | -5.523 | 0.000 | -5.203 | 0.000 |
| Length | 0.037 | 0.000 | 0.035 | 0.000 |
| Number of junctions | 0.353 | 0.000 | 0.279 | 0.000 |
| Land Mix | 0.021 | 0.000 | 0.019 | 0.000 |
| SDI | 0.158 | 0.000 | 0.267 | 0.000 |

| Model statistic | 2002-2004 | 2005-2007 |
|---|---|---|
| Log likelihood | -1540 | -1580 |
| Akaike's Information Criterion | 3070 | 3141 |
| alpha | 2.92 | 2.41 |
| Likelihood ratio test of alpha=0 (Sig.) | 759.1 (0.000) | 597.4 (0.000) |
Cost-weighted
In cost-weighted analysis, the crash intensity LCIcw is defined as a
cost-weighted score which is calculated by Equation 5.2. The threshold value
tcw is determined as:
$$t_{cw}(i) = 100\,E_{Fa}(i) + 16\,E_{Se}(i) + E_{Sl}(i) \tag{5.4}$$
where EFa(i), ESe(i) and ESl(i) are the expected numbers of fatalities, serious injuries and slight
injuries, which are calculated by negative binomial models.
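The weighting in Equation 5.4 can be illustrated with a short sketch. The function name is hypothetical, and the example assumes that the cost-weighted score of Equation 5.2 uses the same 100/16/1 weights, which is consistent with the cost totals reported in Table 5.14.

```python
def cost_weighted_value(fatal: float, serious: float, slight: float) -> float:
    """Apply the 100 / 16 / 1 weights of Equation 5.4 to three casualty counts.

    The inputs can be expected counts (giving the threshold tcw) or,
    under the assumption stated above, observed counts (giving a cost score).
    """
    return 100 * fatal + 16 * serious + slight


# Observed counts for the 2005-2007 cost-weighted hot zones at the 99% percentile
# (Table 5.14): 30 casualties, of which 5 fatal and 5 serious, hence 20 slight.
cost_99 = cost_weighted_value(5, 5, 30 - 5 - 5)
```

With these weights, a single fatality contributes as much as 100 slight injuries, which is why the cost-weighted approach favors locations with severe outcomes.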
The results of the base models are shown in Table 5.11. The alpha and the
likelihood ratio test demonstrate that the negative binomial regression model
is better than the Poisson model. The length of BSU had a significant relationship
with the numbers of seriously injured and slightly injured pedestrian casualties
(p-values less than 0.001). The p-values of BSU length in the fatal pedestrian
casualty models were more than 0.05 (0.072 in the 2002-2004 model and 0.078
in the 2005-2007 model), but they can still be regarded as relatively small.
Hence, the variable was also included in the model in order to be consistent
with the seriously injured and slightly injured pedestrian casualty models. The EFa,
ESe and ESl are then estimated by the three base models respectively. The
base-model tcw is then calculated by following Equation 5.4.
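Table 5.11 Link-attribute results on base negative binomial models for pedestrian casualties by
injury type

Dependent variable: Number of fatal pedestrian casualties

| Parameter | 2002-2004 B | 2002-2004 Sig. | 2005-2007 B | 2005-2007 Sig. |
|---|---|---|---|---|
| (Intercept) | -8.326 | 0.000 | -7.839 | 0.000 |
| Length | 0.042 | 0.072 | 0.036 | 0.078 |

| Model statistic | 2002-2004 | 2005-2007 |
|---|---|---|
| Log likelihood | -115 | -111 |
| Akaike's Information Criterion | 237 | 229 |
| alpha | 14.9 | 5.6 |
| Likelihood ratio test of alpha=0 (Sig.) | 7.1 (0.004) | 4.2 (0.017) |

Dependent variable: Number of seriously injured pedestrian casualties

| Parameter | 2002-2004 B | 2002-2004 Sig. | 2005-2007 B | 2005-2007 Sig. |
|---|---|---|---|---|
| (Intercept) | -4.986 | 0.000 | -4.858 | 0.000 |
| Length | 0.031 | 0.000 | 0.031 | 0.000 |

| Model statistic | 2002-2004 | 2005-2007 |
|---|---|---|
| Log likelihood | -608 | -708 |
| Akaike's Information Criterion | 1323 | 1422 |
| alpha | 4.8 | 3.3 |
| Likelihood ratio test of alpha=0 (Sig.) | 120.1 (0.000) | 87.4 (0.000) |

Dependent variable: Number of slightly injured pedestrian casualties

| Parameter | 2002-2004 B | 2002-2004 Sig. | 2005-2007 B | 2005-2007 Sig. |
|---|---|---|---|---|
| (Intercept) | -3.831 | 0 | -3.692 | 0 |
| Length | 0.031 | 0 | 0.029 | 0 |

| Model statistic | 2002-2004 | 2005-2007 |
|---|---|---|
| Log likelihood | -1354 | -1353 |
| Akaike's Information Criterion | 2715 | 2712 |
| alpha | 3.9 | 3.2 |
| Likelihood ratio test of alpha=0 (Sig.) | 630.3 (0.000) | 435.0 (0.000) |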
As only the length of BSU, the number of junctions on the BSU, land mix and SDI are
found to be significantly associated with the incidence of pedestrian casualties
(see Table 5.9 and Table 5.10), the full negative binomial models
introduce only these four environmental factors as independent variables. The
results of the full models for fatal, seriously injured and slightly injured pedestrian
casualties are shown in Table 5.12. The over-dispersion tests indicate that
negative binomial models are better than Poisson models regardless of the
dependent variable. For both periods, the four variables were positively and
significantly related to the numbers of seriously and slightly injured
pedestrian casualties. However, the relationship between the predictors and the
occurrence of fatal pedestrian casualties was not as close as that with serious
and slight injuries (see p-values in Table 5.12). In particular, the p-values of
BSU length and SDI in the 2002-2004 fatality model were more than 0.05.
Nonetheless, given that the p-values were less than 0.05 in the 2005-2007
model and smaller than 0.1 in the 2002-2004 model, the two variables were
still chosen as independent variables in computing the expected number of
fatal pedestrian casualties so as to be consistent with the serious and slight injury
models. The EFa, ESe and ESl are then estimated by the three full models
respectively. The full-model tcw is then calculated by following Equation 5.4.
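Table 5.12 Link-attribute results on full negative binomial models for pedestrian casualties by
injury type

Dependent variable: Number of fatal pedestrian casualties

| Parameter | 2002-2004 B | 2002-2004 Sig. | 2005-2007 B | 2005-2007 Sig. |
|---|---|---|---|---|
| (Intercept) | -9.566 | 0.000 | -11.045 | 0.000 |
| Length | 0.045 | 0.060 | 0.047 | 0.034 |
| Number of junctions | 0.536 | 0.029 | 0.512 | 0.021 |
| Land Mix | 0.024 | 0.049 | 0.03 | 0.007 |
| SDI | 0.039 | 0.089 | 0.379 | 0.010 |

| Model statistic | 2002-2004 | 2005-2007 |
|---|---|---|
| Log likelihood | -113 | -102 |
| Akaike's Information Criterion | 238 | 219 |
| alpha | 5.6 | 7.2 |
| Likelihood ratio test of alpha=0 (Sig.) | 2.6 (0.047) | 4.4 (0.018) |

Dependent variable: Number of seriously injured pedestrian casualties

| Parameter | 2002-2004 B | 2002-2004 Sig. | 2005-2007 B | 2005-2007 Sig. |
|---|---|---|---|---|
| (Intercept) | -7.182 | 0.000 | -7.01 | 0.000 |
| Length | 0.038 | 0.000 | 0.038 | 0.000 |
| Number of junctions | 0.381 | 0.000 | 0.359 | 0.000 |
| Land Mix | 0.018 | 0.000 | 0.024 | 0.000 |
| SDI | 0.212 | 0.000 | 0.257 | 0 |

| Model statistic | 2002-2004 | 2005-2007 |
|---|---|---|
| Log likelihood | -608 | -672 |
| Akaike's Information Criterion | 1229 | 1357 |
| alpha | 3.5 | 2.1 |
| Likelihood ratio test of alpha=0 (Sig.) | 92.6 (0.000) | 56.1 (0.000) |

Dependent variable: Number of slightly injured pedestrian casualties

| Parameter | 2002-2004 B | 2002-2004 Sig. | 2005-2007 B | 2005-2007 Sig. |
|---|---|---|---|---|
| (Intercept) | -5.772 | 0 | -5.33 | 0 |
| Length | 0.037 | 0 | 0.034 | 0 |
| Number of junctions | 0.347 | 0 | 0.238 | 0 |
| Land Mix | 0.023 | 0 | 0.017 | 0 |
| SDI | 0.143 | 0 | 0.264 | 0 |

| Model statistic | 2002-2004 | 2005-2007 |
|---|---|---|
| Log likelihood | -1306 | -1309 |
| Akaike's Information Criterion | 2625 | 2631 |
| alpha | 2.9 | 2.4 |
| Likelihood ratio test of alpha=0 (Sig.) | 451.6 (0.000) | 325.1 (0.000) |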
5.3.3 Empirical Bayes
The EB approach is regarded as a promising technique and has been applied
in many studies on the identification of crash hot spots. It has great
advantages over conventional methods in dealing with the regression-to-the-mean
problem, in which locations with a randomly high number of road crashes are
falsely identified as hazardous locations and vice versa (Persaud, Lyon &
Nguyen, 1999). The essence of the EB approach is to smooth out the random
fluctuation in crash records by specifying the safety of a location with an
estimate of the long-term mean (m) instead of its observed short-term crash count
(Persaud, Lyon & Nguyen, 1999). The m is estimated by combining the
observed crash counts and an estimate of the expected number of road crashes
as:
$$m = \left(\frac{1}{1+\lambda/k}\right)\lambda + \left(1-\frac{1}{1+\lambda/k}\right)x \tag{5.5}$$
where λ is the expected number of road crashes estimated by a statistical
model; k is the inverse value of the over-dispersion parameter, which can be
calibrated by the crash prediction model; and x is the observed crash count
(Elvik, 2007). In hot spot identification, the value of m is used in two ways
(Elvik, 2007). One is to rank locations directly by their value of
m. The other is to identify hot spots in terms of potential crash reduction,
which is defined as the difference between m and λ. Sources of variation in
road crashes can then be classified into three types (Elvik, 2007), which are:
(1) General factors, included in the crash prediction model;
(2) Random variation, the excess of observed crash counts to the EB
estimate; and
(3) Local factors (and unknown or unmeasured general factors), also
named dispersion factors, the difference between the EB estimate and the
model estimate.
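The EB estimate and the three sources of variation listed above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the inputs are assumed to be a model-based expected count, an observed count and a positive k.

```python
def eb_estimate(expected: float, observed: float, k: float) -> float:
    """EB estimate m: a weighted average of the model estimate and the observed count."""
    weight = 1.0 / (1.0 + expected / k)  # weight given to the model estimate
    return weight * expected + (1.0 - weight) * observed


def variation_components(expected: float, observed: float, k: float) -> dict:
    """Split an observed count into general, random and local components."""
    m = eb_estimate(expected, observed, k)
    return {
        "general": expected,     # explained by the crash prediction model
        "random": observed - m,  # excess of the observed count over the EB estimate
        "local": m - expected,   # EB estimate minus the model estimate
    }


parts = variation_components(expected=2.0, observed=6.0, k=2.0)
```

In this example the weight is 0.5, so the EB estimate lies halfway between the model estimate and the observed count, and the three components sum back to the observed count.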
To investigate the use of the EB technique in the identification of crash hot
zones, the EB approach will be applied to the identification of hazardous road
locations for pedestrians. Using the unweighted casualty-based approach, the
EB-based crash intensity, LCIEB, can be calculated by:
$$LCI_{EB} = \left(\frac{1}{1+p/k}\right)p + \left(1-\frac{1}{1+p/k}\right)C \tag{5.6}$$
where p is the expected number of pedestrian casualties estimated by a
statistical model; k is the inverse value of the over-dispersion parameter, which
can be calibrated by the casualty prediction model; and C is the observed number of
pedestrian casualties. The negative binomial full model will be used to estimate
the expected number of pedestrian casualties (p). It introduces the length and
road junction count of a BSU, and the Land Mix index and SDI index of a TPU, as
the independent variables.
The EB estimate will also be used in two ways, namely the simple ranking and
potential for casualty reduction methods.
5.3.3.1 Simple ranking
In simple-ranking identification, the crash intensity is measured by the
EB-estimate and the threshold value is determined by the percentiles of the EB
estimate. Table 5.13 shows the 90%, 95% and 99% percentiles of EB estimates
of 2002-2004 and 2005-2007 periods.
Table 5.13 Percentiles of EB estimates for unweighted analysis

| Period | 90% | 95% | 99% |
|---|---|---|---|
| 2002-2004 | 1.4 | 2.4 | 5.4 |
| 2005-2007 | 1.5 | 2.2 | 4.6 |
5.3.3.2 Safety potential
In safety potential identification, the crash intensity is measured by the
EB-estimate and the threshold value is determined by the pedestrian-casualty
prediction model. The value of the threshold is equal to p in Equation 5.6.
5.4 Results
5.4.1 Simple Ranking
Table 5.14 shows the results on pedestrian hot zones with threshold
values defined by the simple ranking method. If the two casualty-based methods
are compared, the statistics suggest that the unweighted method could identify
longer hot zones with a greater number of casualties, whereas the cost-weighted
method identified hazardous road locations with a high concentration of fatally
and seriously injured victims. If the two types of hot zones are overlaid by GIS,
the hot zones can be classified into three types, namely hot zones identified
by both the unweighted and cost-weighted approaches, hot zones identified only
by the unweighted method and hot zones identified only by the cost-weighted
approach. Table 5.15 shows the statistics on hot zones identified by only one
approach. The results suggest that more than 50% of the hot zones identified by
the unweighted and cost-weighted approaches were compatible when the 90%
and 95% percentiles were used, but the compatibility was sharply reduced
when the threshold value was increased to the 99% percentile. A close
examination of the one-approach-only hot zones shows that the hot zones
identified only by the unweighted approach were characterized by a high
concentration of slight injuries with no or very few fatalities
and serious injuries, whereas the hot zones identified only by the
cost-weighted approach had a greater number of fatal and seriously injured
pedestrian casualties. Figure 5.6 delineates the locations of hot zones identified
by only one approach with threshold values defined by the 95% percentile of
crash intensity. The bar charts describe the numbers of fatalities, serious
injuries, slight injuries and total casualties in each hot zone. There were four
hot zones identified only by the unweighted method. The number of
pedestrian casualties on these hot zones ranged from 6 to 18.
Among these victims, none were fatally injured, and two hot zones each had
one seriously injured victim. Five hot zones were detected only by
the cost-weighted approach. The total number of pedestrian casualties in each
cost-weighted-only hot zone was less than 10, but fatalities and serious
injuries accounted for a larger share of pedestrian casualties, with greater
numbers than those in unweighted-only hot zones.
Table 5.14 Statistics on link-attribute pedestrian hot zones with threshold value defined by the
simple ranking method

Unweighted

| HZ type | 2002-2004 90% | 2002-2004 95% | 2002-2004 99% | 2005-2007 90% | 2005-2007 95% | 2005-2007 99% |
|---|---|---|---|---|---|---|
| Number of HZs | 34 | 18 | 2 | 33 | 19 | 7 |
| Length of HZs (km) | 14.15 | 6.64 | 0.40 | 13.87 | 6.63 | 1.97 |
| Number of BSUs | 145 | 68 | 4 | 142 | 68 | 20 |
| Number of pedestrian casualties | 564 | 367 | 30 | 505 | 327 | 146 |
| Number of fatalities | 9 | 8 | 1 | 13 | 10 | 8 |
| Number of serious injuries | 125 | 71 | 10 | 133 | 87 | 39 |

Cost-weighted

| HZ type | 2002-2004 90% | 2002-2004 95% | 2002-2004 99% | 2005-2007 90% | 2005-2007 95% | 2005-2007 99% |
|---|---|---|---|---|---|---|
| Number of HZs | 28 | 19 | 0 | 37 | 17 | 2 |
| Length of HZs (km) | 10.47 | 5.77 | 0 | 10.64 | 5.08 | 0.37 |
| Number of BSUs | 109 | 59 | 0 | 110 | 52 | 4 |
| Number of pedestrian casualties | 434 | 263 | 0 | 370 | 265 | 30 |
| Number of fatalities | 13 | 5 | 0 | 16 | 13 | 5 |
| Number of serious injuries | 123 | 89 | 0 | 147 | 87 | 5 |
| Cost | 3566 | 2093 | 0 | 4159 | 2857 | 600 |
Table 5.15 Link-attribute hot zones identified only by the unweighted or cost-weighted
method

| Period | Percentile | UW-only | CW-only |
|---|---|---|---|
| 2002-2004 | 90% | 13 (38.24%) | 5 (17.86%) |
| 2002-2004 | 95% | 4 (22.22%) | 5 (26.32%) |
| 2002-2004 | 99% | 2 (100%) | 0 (-) |
| 2005-2007 | 90% | 13 (39.39%) | 13 (35.14%) |
| 2005-2007 | 95% | 6 (31.58%) | 3 (17.65%) |
| 2005-2007 | 99% | 5 (71.43%) | 0 (0%) |
Figure 5.6 Hot zones identified only by (a) the unweighted and (b) the cost-weighted method
To investigate the stability of the performance of the two approaches, the
hot zones of the two periods are compared in terms of the number and length of
hot zones, the number of BSUs, and crash intensity, represented by the number of
pedestrian casualties for the unweighted approach and the total costs for the
cost-weighted approach (Table 5.16). It is observed that the variability of both the
unweighted and the cost-weighted approaches increased markedly when the
threshold value was raised from the 90% percentile to the 99% percentile of the
observations. Taking the unweighted hot zones as an example, the mean and
the standard deviation of the number of 90% hot zones were 33.5 and 0.71, with a
CV value equal to 0.02, whereas the average number of 99% hot zones was 4.5,
with standard deviation and CV equal to 3.54 and 0.79. As demonstrated by
the CV values, there was great variability among the 99% hot zones, which
increases the chance of false positive problems. Moreover, the CV values of the
cost-weighted hot zones were generally equal to or greater than those of the unweighted hot
zones, indicating that the performance of the unweighted approach is more
stable than the cost-weighted approach.
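The mean, standard deviation and CV quoted above can be reproduced from the hot zone counts of the two periods. The sketch below assumes the sample standard deviation (n - 1 denominator), which matches the reported figures; the function name is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return (mean, sample standard deviation, CV) for a list of values."""
    m = mean(values)
    s = stdev(values)
    cv = s / m if m else float("nan")
    return m, s, cv


# Numbers of unweighted hot zones in 2002-2004 and 2005-2007 (Table 5.14).
m90, s90, cv90 = coefficient_of_variation([34, 33])  # 90% percentile threshold
m99, s99, cv99 = coefficient_of_variation([2, 7])    # 99% percentile threshold
```

Rounded to two decimals, these calls give 33.5, 0.71 and 0.02 for the 90% hot zones and 4.5, 3.54 and 0.79 for the 99% hot zones.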
Table 5.16 Variation of hot zones between two periods

| Threshold | Indicator | UW Mean | UW Standard deviation | UW CV | CW Mean | CW Standard deviation | CW CV |
|---|---|---|---|---|---|---|---|
| 90% | No. of Hot Zones | 33.50 | 0.71 | 0.02 | 32.50 | 6.36 | 0.20 |
| 90% | No. of BSUs | 143.50 | 2.12 | 0.01 | 109.50 | 0.71 | 0.01 |
| 90% | Crash Intensity | 534.50 | 41.72 | 0.08 | 3862.50 | 419.31 | 0.11 |
| 90% | Length of Hot Zones | 14.01 | 0.20 | 0.01 | 10.56 | 0.12 | 0.01 |
| 95% | No. of Hot Zones | 18.50 | 0.71 | 0.04 | 18.00 | 1.41 | 0.08 |
| 95% | No. of BSUs | 68.00 | 0.00 | 0.00 | 55.50 | 4.95 | 0.09 |
| 95% | Crash Intensity | 347.00 | 28.28 | 0.08 | 2475.00 | 540.23 | 0.22 |
| 95% | Length of Hot Zones | 6.64 | 0.01 | 0.00 | 5.43 | 0.49 | 0.09 |
| 99% | No. of Hot Zones | 4.50 | 3.54 | 0.79 | 1.00 | 1.41 | 1.41 |
| 99% | No. of BSUs | 12.00 | 11.31 | 0.94 | 2.00 | 2.83 | 1.41 |
| 99% | Crash Intensity | 88.00 | 82.02 | 0.93 | 300.00 | 424.26 | 1.41 |
| 99% | Length of Hot Zones | 1.19 | 1.11 | 0.94 | 0.19 | 0.26 | 1.41 |
5.4.2 Incorporation of the Surrounding Environment of
Crashes Involving Pedestrians
Table 5.17 summarizes the hot zones identified with the threshold value
determined by incorporating the surrounding environment of traffic crashes
involving pedestrians. It is observed that the cost-weighted method identified
more hot zones than the unweighted method, with a larger number of casualties and
longer length. Taking the Base-Model (BM) as an example, the cost-weighted
approach identified 48 hot zones with a total length of 19.19 km in the period from 2002 to
2004, but the unweighted approach detected only 39 hot zones with a total
length of 15.56 km. Comparing the Base-Model with the Full-Model
(FM) approaches, one may find that the Full-Model approach could detect
more hot zones than the Base-Model approach, especially with cost-weighted
hot zones. For instance, the numbers of Base-Model hot zones were 48 (19.19
km) in 2002-2004 and 50 (20.79 km) in 2005-2007, whereas the numbers of
Full-Model hot zones were 66 (23.84 km) and 60 (23.52 km) respectively when
the cost-weighted approach was employed.
Table 5.17 Characteristics of hot zones identified by incorporating surrounding environment

| Period | Hot Zone Type | Unweighted Base Model | Unweighted Full Model | Cost-weighted Base Model | Cost-weighted Full Model |
|---|---|---|---|---|---|
| 2002-2004 | No. of Hot Zones | 39 | 44 | 48 | 66 |
| 2002-2004 | Length (km) | 15.56 | 14.18 | 19.19 | 23.84 |
| 2002-2004 | No. of BSUs | 167 | 150 | 206 | 250 |
| 2002-2004 | No. of Pedestrian Casualties | 595 | 535 | 649 | 677 |
| 2002-2004 | Cost | - | - | 4390 | 4829 |
| 2005-2007 | No. of Hot Zones | 35 | 45 | 50 | 60 |
| 2005-2007 | Length (km) | 15.25 | 15.07 | 20.79 | 23.52 |
| 2005-2007 | No. of BSUs | 164 | 159 | 221 | 246 |
| 2005-2007 | No. of Pedestrian Casualties | 537 | 503 | 624 | 624 |
| 2005-2007 | Cost | - | - | 5067 | 4878 |
To further analyze the spatial distribution of hot zones, the unweighted
and cost-weighted hot zones are first overlaid. Table 5.18 summarizes the hot
zones identified only by the unweighted (UW-only) or the cost-weighted
approach (CW-only). In both periods, there were no UW-only hot zones but
some CW-only hot zones no matter whether the Base Model or the Full Model
was employed to determine the threshold value. Taking the Base-Model hot
zones in the period from 2002 to 2004 as an example, none of the UW hot zones
belonged to the UW-only type, but 18.75% of the CW hot zones were not
compatible with UW hot zones. This shows that the hot zones identified by the
unweighted method could also be identified by the cost-weighted approach,
but not all cost-weighted hot zones could be identified by the unweighted
approach. Hence, an analyst who is interested not only in
the concentration of pedestrian casualties of all injury types but also in the clusters
of pedestrian fatalities and serious injuries may first use the cost-weighted approach
alone to identify hazardous road locations for pedestrians and then investigate the detailed
information on each type of pedestrian injury. The following analysis will be
based on the cost-weighted hot zones only.
Table 5.18 Link-attribute hot zones identified only by the unweighted or the cost-weighted
method

| Period | Base Model UW-only | Base Model CW-only | Full Model UW-only | Full Model CW-only |
|---|---|---|---|---|
| 2002-2004 | 0 (0%) | 9 (18.75%) | 0 (0%) | 21 (31.82%) |
| 2005-2007 | 0 (0%) | 15 (30.00%) | 0 (0%) | 18 (30%) |
To examine the difference between the Base-Model and the Full-Model
hot zones, the two types of hot zones are overlaid by GIS. Table 5.19
summarizes the hot zones identified only by the Base-Model (BM-only) or the
Full-Model approach (FM-only). The shares of the BM-only and FM-only hot
zones demonstrate that most of BM and FM hot zones were compatible.
Nevertheless, there were still 3 BM-only and 13 FM-only hot zones in the
period from 2002 to 2004 and 4 BM-only and 12 FM-only hot zones in the
period of 2005-2007. Which locations (see hot zones in Figure 5.7 ) were more
dangerous? While those who were more concerned with locations with higher
concentration of pedestrian casualties would pay greater attention to the
BM-only hot zones (colored in purple), engineers who would like to detect
pedestrian hot zones which were not identified due only to the nearby road
junctions might dedicate more efforts to FM-only hot zones (colored in pink).
Table 5.19 Link-attribute hot zones identified only by the base-model or the full-model
approach

| Period | BM-only | FM-only |
|---|---|---|
| 2002-2004 | 3 (6.25%) | 13 (19.70%) |
| 2005-2007 | 4 (8%) | 12 (20%) |
Figure 5.7 FM-only and BM-only cost-weighted hot zones in (a) 2002-2004 and (b) 2005-2007
Finally, the stability of the performance of the Base-Model and
Full-Model approaches is examined by comparing the hot zones of the two periods.
As shown in Table 5.20, most CVs of the Full-Model hot zones were smaller
than those of the Base-Model hot zones, except for the number of hot
zones. To further investigate the performance, the hot zones of the two periods are
overlaid by GIS and the results are shown in Table 5.21. It is found that 68.75%
of the 2002-2004 and 66% of the 2005-2007 hot zones were compatible with the
Base-Model approach, and the shares reached 69.70% and 76.67% with the
Full-Model approach. Considering the CVs and the shares of HZ24-cum-HZ57,
the performance of the Full-Model approach can be regarded as more stable than
that of the Base-Model approach.
Table 5.20 Variation of BM and FM hot zones between two periods

| Indicator | Base Model Mean | Base Model Standard deviation | Base Model CV | Full Model Mean | Full Model Standard deviation | Full Model CV |
|---|---|---|---|---|---|---|
| No. of Hot Zones | 49.00 | 1.41 | 0.03 | 63.00 | 4.24 | 0.07 |
| No. of BSUs | 213.50 | 10.61 | 0.05 | 248.00 | 2.83 | 0.01 |
| Costs | 4728.50 | 478.71 | 0.10 | 4853.50 | 34.65 | 0.01 |
| Length of Hot Zones | 19.99 | 1.13 | 0.06 | 23.68 | 0.23 | 0.01 |
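Table 5.21 Numbers and percentages of BM and FM hot zones identified in both periods

| HZ24-cum-HZ57 category | Base Model | Full Model |
|---|---|---|
| Number of hot zones | 33 | 46 |
| Percentage of HZ24 | 68.75% | 69.70% |
| Percentage of HZ57 | 66% | 76.67% |

Notes:
1. Percentages of HZ24 in the HZ24-cum-HZ57 category were typed in italics in the original table
2. Percentages of HZ57 in the HZ24-cum-HZ57 category were underlined in the original table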
5.4.3 Empirical Bayes
The results of EB-based identification of pedestrian hazardous road
locations are shown and discussed in this subsection. In particular, the results
are compared with those with crash intensity defined by the observed counts
(OC).
5.4.3.1 Simple ranking
Table 5.22 summarizes the results on pedestrian hot zones with crash
intensity defined by the EB estimate and the observed count of pedestrian
casualties when the simple ranking method is used to define the threshold
value. It can be observed that in both periods, the EB-based approach
identified the same or a slightly smaller number and a shorter length of hot zones, with fewer
BSUs and pedestrian casualties, than the OC-based approach did when the
threshold value was determined by the 90% and 95% percentiles of crash intensity.
From Table 5.23, which shows the number and percentage of hot zones
identified by one approach only, one may observe that the number of EB-only
pedestrian hot zones was zero when the 90% and 95% percentiles were used to
define the threshold values, suggesting that all the pedestrian hot zones
identified by the EB-based approach could also be identified by the OC-based
approach. However, when the 99% percentile, in other words a very high
value, was assigned as the threshold value, the situation became more complicated.
As shown in Table 5.22, the EB-based approach detected longer pedestrian
hazardous road locations with more BSUs and pedestrian casualties in the
period of 2002-2004, but in the period from 2005 to 2007, the OC-based
approach detected more. In addition, not all of the hot zones were compatible in
2002-2004. There was one hot zone detected by the EB-based approach only
(see Table 5.23).
Table 5.22 Statistics on link-attribute pedestrian hot zones with crash intensity defined by the
EB estimate and observed counts (simple ranking)

EB estimate (EB)

| HZ type | 2002-2004 90% | 2002-2004 95% | 2002-2004 99% | 2005-2007 90% | 2005-2007 95% | 2005-2007 99% |
|---|---|---|---|---|---|---|
| Number of HZs | 31 | 14 | 2 | 33 | 14 | 5 |
| Length of HZs (km) | 12.47 | 5.19 | 0.60 | 12.55 | 4.78 | 1.28 |
| Number of BSUs | 126 | 53 | 6 | 127 | 49 | 13 |
| Number of pedestrian casualties | 505 | 305 | 48 | 466 | 274 | 108 |

Observed count (OC)

| HZ type | 2002-2004 90% | 2002-2004 95% | 2002-2004 99% | 2005-2007 90% | 2005-2007 95% | 2005-2007 99% |
|---|---|---|---|---|---|---|
| Number of HZs | 34 | 18 | 2 | 33 | 19 | 7 |
| Length of HZs (km) | 14.15 | 6.64 | 0.40 | 13.87 | 6.63 | 1.97 |
| Number of BSUs | 145 | 68 | 4 | 142 | 68 | 20 |
| Number of pedestrian casualties | 564 | 367 | 30 | 505 | 327 | 146 |
Table 5.23 Link-attribute hot zones identified only by the EB or the OC approach (simple
ranking)

| Period | Percentile | EB-only | OC-only |
|---|---|---|---|
| 2002-2004 | 90% | 0 (0%) | 4 (11.76%) |
| 2002-2004 | 95% | 0 (0%) | 5 (27.78%) |
| 2002-2004 | 99% | 1 (50%) | 1 (50%) |
| 2005-2007 | 90% | 0 (0%) | 1 (3.03%) |
| 2005-2007 | 95% | 0 (0%) | 5 (26.32%) |
| 2005-2007 | 99% | 0 (0%) | 2 (28.57%) |
In order to investigate the stability of the performance of the two
approaches, the hot zones of the two periods are compared. Table 5.24 shows the
variation of hot zones in terms of the number and length of hot zones, the
number of BSUs and pedestrian casualty counts. When the 90% and 95%
percentiles were used to define the threshold values, it is difficult to tell which
approach is more stable. Using 90% as an example, there was greater
variability between EB-based hazardous road locations in terms of the number
of hot zones, as demonstrated by the CV values (0.04 for EB-based hot zones
and 0.02 for OC-based hot zones). However, if the length of hot zones is
considered, the EB-based approach was found to be more stable, with a CV value equal
to 0.00. If one targets hot zones with the threshold value defined by the 99%
percentile, the performance of the EB-based approach was considerably more stable
than that of the OC-based approach. In this sense, the EB-based approach is
desirable if one is interested in hot zones with extremely “hazardous” road
locations.
Table 5.24 Variation of EB and OC hot zones between two periods (simple ranking)

| Threshold | Indicator | EB Mean | EB Standard deviation | EB CV | OC Mean | OC Standard deviation | OC CV |
|---|---|---|---|---|---|---|---|
| 90% | No. of Hot Zones | 32.00 | 1.41 | 0.04 | 33.50 | 0.71 | 0.02 |
| 90% | No. of BSUs | 126.50 | 0.71 | 0.01 | 143.50 | 2.12 | 0.01 |
| 90% | Crash Intensity | 485.50 | 27.58 | 0.06 | 534.50 | 41.72 | 0.08 |
| 90% | Length of Hot Zones | 12.51 | 0.06 | 0.00 | 14.01 | 0.20 | 0.01 |
| 95% | No. of Hot Zones | 14 | 0 | 0.00 | 18.50 | 0.71 | 0.04 |
| 95% | No. of BSUs | 51.00 | 2.83 | 0.06 | 68.00 | 0.00 | 0.00 |
| 95% | Crash Intensity | 289.50 | 21.92 | 0.08 | 347.00 | 28.28 | 0.08 |
| 95% | Length of Hot Zones | 4.99 | 0.29 | 0.06 | 6.64 | 0.01 | 0.00 |
| 99% | No. of Hot Zones | 3.50 | 0.71 | 0.20 | 4.50 | 3.54 | 0.79 |
| 99% | No. of BSUs | 9.50 | 2.83 | 0.30 | 12.00 | 11.31 | 0.94 |
| 99% | Crash Intensity | 78.00 | 28.99 | 0.37 | 88.00 | 82.02 | 0.93 |
| 99% | Length of Hot Zones | 0.94 | 0.29 | 0.31 | 1.19 | 1.11 | 0.94 |
5.4.3.2 Safety potential
Table 5.25 summarizes the results on pedestrian hot zones with crash
intensity defined by the EB estimate and the observed count of pedestrian
casualties when the “potential crash reduction” method is used to define the
threshold value. It can be observed that the EB-based and OC-based
approaches could identify the same number and length of hot zones, the same
number of BSUs and the same number of pedestrian casualties as well.
Moreover, they were 100% compatible, as demonstrated in Table 5.26.
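This agreement is what the form of Equation 5.6 would lead one to expect. As a short worked derivation, write w = 1/(1 + p/k) and assume p > 0 and k > 0, so that 0 < w < 1. Subtracting the threshold p from the EB-based intensity gives:

$$LCI_{EB} - p = w\,p + (1 - w)\,C - p = (1 - w)\,(C - p)$$

Since 1 - w is positive, the EB estimate exceeds the model-based threshold p exactly when the observed count C does. Under these assumptions, the same BSUs pass the threshold whichever of the two intensity measures is used.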
Table 5.25 Statistics on link-attribute pedestrian hot zones with threshold value defined by EB
estimate and observed pedestrian casualty count (safety potential)

| HZ type | 2002-2004 EB | 2002-2004 OC | 2005-2007 EB | 2005-2007 OC |
|---|---|---|---|---|
| Number of HZs | 44 | 44 | 45 | 45 |
| Length of HZs (km) | 14.18 | 14.18 | 15.07 | 15.07 |
| Number of BSUs | 150 | 150 | 159 | 159 |
| Number of pedestrian casualties | 535 | 535 | 503 | 503 |
Table 5.26 Link-attribute hot zones identified only by the EB or the OC approach (safety
potential)

| Period | EB-only | OC-only |
|---|---|---|
| 2002-2004 | 0 (0%) | 0 (0%) |
| 2005-2007 | 0 (0%) | 0 (0%) |
5.5 Summary
This chapter employed casualty-weighted methods, including unweighted
and cost-weighted approaches to identify hot zones in Kwun Tong District.
In particular, the threshold values were defined by the expected crash
intensity which was calculated by negative binomial regression models.
When the simple ranking method was used to determine the threshold
value, the performance of the unweighted approach was more stable than
that of the cost-weighted approach. Nevertheless, the choice of the approach
still depends on the targeted injury type of pedestrian casualties.
When the threshold value was determined by incorporating surrounding
environmental variables of pedestrian casualties, the hot zones detected by
the unweighted approach could also be identified by the cost-weighted
approach. The performance of the Full-Model approach is more stable.
Nonetheless, the Base-Model hot zones are also important in addressing
pedestrian safety issues such as the allocation of medical resources.
In general, the hot zones identified by the EB-based approach and the
OC-based approach are similar, but the EB-based approach is found superior
to the OC-based approach in the stability of the performance when the
threshold value is defined with a relatively high value.
CHAPTER 6
EVENT-BASED ANALYSIS FOR ROAD
CRASHES
This chapter performs event-based analysis for road crashes. The hot
zones are identified with threshold value determined by numerical definition
and a Monte Carlo method. In particular, the spatial pattern of road crashes is
also analyzed by taking into consideration the impacts of traffic volume on the
spatial distribution of road traffic crashes. The data sources and analytical
procedures are presented in detail in Sections 6.1.1 and 6.1.2 respectively.
After discussing the event-based results in Section 6.1.3, a comparison with
link-attribute analysis is made in Section 6.2.
6.1 Territory-Wide Identification of Hazardous Road
Locations
6.1.1 Data Description
The ATC road networks (1,060 km) of 2002-2004 and 2005-2007 are used
for territory-wide analysis of road crashes in this chapter. There were 31,324
and 30,511 road crashes on ATC roads during the period from 2002
to 2004 and the period from 2005 to 2007 respectively. The interval of the
reference points is defined as 100 meters. With the dissolved ATC road
network, 11,628 reference points were obtained.
6.1.2 Data Analysis
Numerical definition and Monte Carlo method will be used in the
identification of hazardous road locations of the whole territory. In particular,
spatial patterns of road crashes are also analyzed by taking into consideration
the traffic volume.
6.1.2.1 Numerical definition
Firstly, arbitrary numbers are used in this chapter to analyze spatial
distribution of road crashes. Consistent with the link-attribute analysis, the
threshold values are assigned as 9, 12 and 15 traffic crashes in a three-year
period. The cut-off values for a hot zone are then expected as 18, 24 and 30
road crashes in three years.
6.1.2.2 Monte Carlo simulation
Crash frequency
The Monte Carlo simulation method is used to determine the threshold values
for crash frequency. Assuming there are m (m=31,324 in 2002-2004 and
m=30,511 in 2005-2007) road crashes during a time interval t (t=3 years), the
general steps for the identification of hot zones include:
(1) For the crash pattern, calculate ECI (Equations 3.7 and 3.8) at each RP
as the actual crash intensity;
(2) Randomly select m out of 11,628 reference points as one simulated
road crash pattern and calculate the crash intensity, denoted as ECI(sim), at each
RP according to Equations 3.7 and 3.8;
(3) Repeat Step 2 1,000 times;
(4) For each reference point, the 50th largest ECI(sim) (95% significance level)
is used as the threshold value;
(5) Identify hot zones by following Equations 3.9-3.11.
To investigate the sensitivity of hot zones to the threshold value, the
significance level in Step 4 is also defined as 99% and 99.9%.
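Steps 2 to 4 above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure's actual implementation: the crash intensity is replaced by a plain crash count per reference point as a stand-in for the ECI of Equations 3.7 and 3.8, reference points are drawn with replacement (assumed because m exceeds the number of reference points), and all function names are hypothetical.

```python
import random
from collections import Counter


def simulate_counts(n_points, n_crashes, rng):
    """One simulated pattern: crash count per reference point (stand-in for ECI)."""
    hits = Counter(rng.choices(range(n_points), k=n_crashes))
    return [hits.get(i, 0) for i in range(n_points)]


def monte_carlo_thresholds(n_points, n_crashes, n_sims=1000, level=0.95, seed=0):
    """Per-point threshold: the r-th largest simulated value, r = n_sims * (1 - level)."""
    rng = random.Random(seed)
    runs = [simulate_counts(n_points, n_crashes, rng) for _ in range(n_sims)]
    rank = max(1, round(n_sims * (1.0 - level)))  # 50 when n_sims=1000, level=0.95
    thresholds = []
    for i in range(n_points):
        ranked = sorted((run[i] for run in runs), reverse=True)
        thresholds.append(ranked[rank - 1])
    return thresholds


# Small illustrative run; the study itself uses 11,628 points and 1,000 simulations.
t95 = monte_carlo_thresholds(n_points=20, n_crashes=100, n_sims=200, level=0.95, seed=1)
```

Raising `level` lowers the rank that is read off, so the threshold at each reference point can only stay the same or increase. For the full data set, an array library would likely be preferable to nested lists.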
Crash risk: incorporation of traffic volume
Taking into consideration traffic volume (AADT), the threshold values
can be defined by modifying the Monte Carlo simulation. The probability of
allocating a road crash to an RP is determined by the AADT. Assuming there are
m road crashes during a period of time, the general steps include:
(1) For the crash pattern, calculate ECI (Equations 3.7 and 3.8) at each RP
as the actual crash intensity;
(2) Collect traffic exposure (AADT) at the location of each RP;
(3) With the RPs, simulate a pattern of m road crashes. The probability of
selecting an RP as a road crash is determined by the AADT at that RP;
(4) Calculate the crash intensity ECI(sim) at each RP according to Equations
3.7 and 3.8;
(5) Repeat Steps 3 and 4 1,000 times;
(6) For each reference point, the 50th largest ECI(sim) (95% significance level)
is used as the threshold value;
(7) Identify hot zones by following Equations 3.9-3.11.
To investigate the sensitivity of hot zones to the threshold value, the
significance level in Step 6 is also defined as 99% and 99.9%.
Assuming that there are n reference points, in order to implement Step 2,
the reference points are first labeled as 1, 2, 3, …, n. Every reference point has
AADT information, as shown in the example of Table 6.1. Then, a variable
“Interval” is created based upon the AADT information in an “accumulation”
manner (see Table 6.2). The lower bound (“Lower”) and the upper bound
(“Upper”) of the interval for reference point i can be calculated as:
$$Lower_i = \begin{cases} 1 & \text{if } i = 1 \\ 1 + \sum_{j=1}^{i-1} AADT_j & \text{if } i > 1 \end{cases} \tag{6.1}$$
$$Upper_i = \sum_{j=1}^{i} AADT_j \tag{6.2}$$
where i, j = 1, 2, 3, …, n and AADTj is the AADT at the location of reference point
j.
Table 6.1 Illustration of reference points and AADT information

| Reference Point No. | AADT |
|---|---|
| 1 | 800 |
| 2 | 1230 |
| 3 | 12567 |
| 4 | 3245 |
| 5 | 6558 |
| … | … |
| n-1 | 12432 |
| n | 790 |
Table 6.2 Illustration of “Interval” variable
| Reference Point No. | AADT | Interval [Lower, Upper] |
|---|---|---|
| 1 | 800 | [1, 800] |
| 2 | 1230 | [801, 2030] |
| 3 | 12567 | [2031, 14597] |
| 4 | 3245 | [14598, 17842] |
| 5 | 6558 | [17843, 24400] |
| … | … | … |
| n-1 | 12432 | [234532, 246963] |
| n | 790 | [246964, 247753] |
To assign a road crash to a reference point, a random value is generated
between 1 and the upper bound of the interval of reference point n (247,753 in
Table 6.2). The reference point whose interval contains the random number is
selected as the location of the simulated road crash. For instance, if the
random value generated by the computer is 8,000, a road crash will be created
at the location of Reference Point 3. In this way, a simulated road crash is
more likely to be allocated to a reference point with greater AADT.
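The interval construction and lookup can be illustrated with a short Python
sketch. The function names are hypothetical and the code is only one possible
way of implementing Equations 6.1 and 6.2; it uses the first four AADT values
of Table 6.1.

```python
from bisect import bisect_left
from itertools import accumulate


def build_intervals(aadt):
    """Return the [Lower, Upper] interval of every RP (Equations 6.1 and 6.2)."""
    uppers = list(accumulate(aadt))              # Upper_i = AADT_1 + ... + AADT_i
    lowers = [1] + [u + 1 for u in uppers[:-1]]  # Lower_i = 1 + Upper_(i-1)
    return list(zip(lowers, uppers))


def reference_point_for(value, intervals):
    """Return the 1-based number of the RP whose interval contains `value`."""
    uppers = [upper for _, upper in intervals]
    if not 1 <= value <= uppers[-1]:
        raise ValueError("value lies outside every interval")
    # First interval whose upper bound is not smaller than the value.
    return bisect_left(uppers, value) + 1


if __name__ == "__main__":
    intervals = build_intervals([800, 1230, 12567, 3245])
    print(intervals[:3])                         # [(1, 800), (801, 2030), (2031, 14597)]
    print(reference_point_for(8000, intervals))  # 3
```

In a simulation run, the value passed to `reference_point_for` would be drawn
with something like `random.randint(1, intervals[-1][1])`, once per simulated
road crash.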
6.1.3 Results
6.1.3.1 Numerical definition
Table 6.3 shows the statistics on event-based hot zones with threshold
values determined by an arbitrary number. When the threshold values were
defined as 9, 12 and 15 road crashes in three years, the numbers of hot zones
identified by the event-based approach were 324, 243 and 174 in the period of
2002-2004, and 292, 198 and 138 in the period of 2005-2007, respectively.
If individual hot zones are examined, the length of hot zones varied greatly,
especially when the threshold value was small. For instance, the minimum and
maximum lengths of HZ18+ hot zones were 0.12 km and 6.77 km, with a CV of
1.39, in 2002-2004. If one investigates the road crashes happening on each hot
zone, one may detect some hot zones with fewer road crashes than the defined
critical values. As shown in Table 6.3, the minimum numbers of road crashes
occurring on HZ18+, HZ24+ and HZ30+ hot zones were 9, 12 and 15, only half of
the critical values. As one of the important characteristics of the
arbitrary-number definition is that it enables users to control the critical
value of a hot zone easily, hot zones with crash intensity less than the
critical value are undesirable. Table 6.4 summarizes the characteristics of
these hot zones. The “undesirable” hot zones accounted for a large share, and
the percentages rose markedly with increasing threshold values. Nearly 30% of
HZ18+ hot zones had fewer than 18 road crashes in both the 2002-2004 and the
2005-2007 periods, and 57.47% and 47.83% of HZ30+ hot zones had fewer than 30
road crashes in the periods of 2002-2004 and 2005-2007 respectively. These hot
zones were short, with a mean length of no more than 200 meters and a maximum
length of only 370 meters. Looking into the locations of these hot zones on
the map, one may observe that these tiny hot zones were all located around
road junctions, as illustrated in Figure 6.1, which delineates the locations
of undesirable HZ18+ hot zones in part of Hong Kong in the period from 2002 to
2004. Since the arbitrary-number approach identifies a large number of
undesirable hot zones, it may not be appropriate to apply this method to the
identification of hazardous road locations.
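Table 6.3 Event-based hot zones with threshold values defined by an arbitrary number
| | 2002-2004 HZ18+ | 2002-2004 HZ24+ | 2002-2004 HZ30+ | 2005-2007 HZ18+ | 2005-2007 HZ24+ | 2005-2007 HZ30+ |
|---|---|---|---|---|---|---|
| Number of HZs | 324 | 243 | 174 | 292 | 198 | 138 |
| Number of BSUs | 1136 | 723 | 473 | 1060 | 643 | 418 |
| Length (km): Total | 107.33 | 69.37 | 45.74 | 100.68 | 61.94 | 40.12 |
| Length (km): Minimum | 0.12 | 0.12 | 0.12 | 0.11 | 0.13 | 0.13 |
| Length (km): Maximum | 6.77 | 1.82 | 1.71 | 5.20 | 3.99 | 1.88 |
| Length (km): Mean | 0.33 | 0.29 | 0.26 | 0.34 | 0.31 | 0.29 |
| Length (km): Standard deviation | 0.46 | 0.22 | 0.20 | 0.46 | 0.35 | 0.24 |
| Length (km): CV | 1.39 | 0.76 | 0.77 | 1.35 | 1.13 | 0.83 |
| Road Crashes: Total | 11845 | 8680 | 6452 | 11131 | 7983 | 5842 |
| Road Crashes: Minimum | 9 | 12 | 15 | 9 | 12 | 15 |
| Road Crashes: Maximum | 999 | 327 | 309 | 661 | 505 | 321 |
| Road Crashes: Mean | 36.56 | 35.72 | 37.08 | 38.12 | 40.32 | 42.33 |
| Road Crashes: Standard deviation | 62.63 | 35.59 | 35.83 | 60.20 | 50.46 | 40.70 |
| Road Crashes: CV | 1.71 | 1.00 | 0.97 | 1.58 | 1.25 | 0.96 |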
Table 6.4 Statistics on event-based hot zones with crash intensity less than the critical value
| | 2002-2004 HZ18+ | 2002-2004 HZ24+ | 2002-2004 HZ30+ | 2005-2007 HZ18+ | 2005-2007 HZ24+ | 2005-2007 HZ30+ |
|---|---|---|---|---|---|---|
| Number of HZs | 94 (29.01%) | 95 (39.09%) | 100 (57.47%) | 86 (29.45%) | 79 (39.89%) | 66 (47.83%) |
| Number of BSUs | 192 | 200 | 209 | 180 | 164 | 137 |
| Length: Total | 17.53 | 18.41 | 19.25 | 16.77 | 15.36 | 12.89 |
| Length: Minimum | 0.14 | 0.13 | 0.12 | 0.11 | 0.14 | 0.14 |
| Length: Maximum | 0.28 | 0.37 | 0.37 | 0.37 | 0.34 | 0.33 |
| Length: Mean | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 | 0.20 |
| Length: Standard deviation | 0.02 | 0.03 | 0.03 | 0.04 | 0.04 | 0.03 |
| Length: CV | 0.11 | 0.16 | 0.16 | 0.21 | 0.21 | 0.15 |
Figure 6.1 Illustration of locations of undesirable hot zones
6.1.3.2 Monte Carlo simulation
Using the 95%, 99% and 99.9% percentiles of the simulated patterns of road
crashes as threshold values, the hot zones are identified and the results are
summarized in Table 6.5. The numbers of hot zones with crash intensity
determined by crash frequency (CF) differ markedly from those determined by
crash risk (CR). Given the same significance level, the crash-risk approach
detects more hazardous road locations. Using the 99% significance level as an
example, there were 296 and 261 CF-based hot zones in the periods of
2002-2004 and 2005-2007, whereas the numbers of CR-based hot zones
reached 423 and 397 respectively. Focusing on the characteristics of
individual hot zones, one may observe that there was much variation among
CR-based hot zones in terms of the number of road crashes and the length of
hot zones. When the 95% significance level was used to define the threshold
value, the number of road crashes in a CR-based hot zone ranged from 1 to
1,270 with a CV of 2.49 in the period of 2002-2004, and the minimum and
maximum lengths of a hot zone were 0.12 km and 10.51 km respectively. In
addition, it should be noticed that some CR-based hot zones had very few road
crashes. The minimum number of road crashes in a hot zone could be 1 (95%
CR-based hot zones in 2002-2004) or 2 (99.9% CR-based hot zones in 2002-2004
and 95% CR-based hot zones in 2005-2007). Looking into these locations, it is
found that the traffic volume was very small, so that even a slight change in
traffic volume can cause great variation in crash risk. These locations are
more likely to be affected by the regression-to-the-mean problem. Therefore,
one should be very careful in treating these locations as road hazards.
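Table 6.5 Statistics on hot zones based on statistical definition

2002-2004
| Hot Zone Type | 95% CF | 95% CR | 99% CF | 99% CR | 99.9% CF | 99.9% CR |
|---|---|---|---|---|---|---|
| Number of Hot Zones | 363 | 492 | 296 | 423 | 253 | 367 |
| Number of Hot RPs | 1381 | 2045 | 998 | 1632 | 774 | 1327 |
| Road Crashes on Hot Zones: Total | 13,967 | 16,024 | 11,411 | 13,887 | 9,554 | 12,338 |
| Road Crashes on Hot Zones: Minimum | 7 | 1 | 9 | 4 | 9 | 2 |
| Road Crashes on Hot Zones: Maximum | 1166 | 1270 | 1015 | 1,120 | 955 | 593 |
| Road Crashes on Hot Zones: Mean | 38.40 | 32.56 | 38.55 | 32.82 | 37.76 | 33.60 |
| Road Crashes on Hot Zones: Standard deviation | 70.91 | 81.14 | 63.66 | 75.06 | 62.53 | 48.92 |
| Road Crashes on Hot Zones: CV | 1.85 | 2.49 | 1.65 | 2.29 | 1.66 | 1.46 |
| Length (km): Total | 131.78 | 190.50 | 96.51 | 153.13 | 75.21 | 125.27 |
| Length (km): Minimum | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 |
| Length (km): Maximum | 8.40 | 10.51 | 6.64 | 8.62 | 6.08 | 4.16 |
| Length (km): Mean | 0.36 | 0.39 | 0.33 | 0.36 | 0.30 | 0.34 |
| Length (km): Standard deviation | 0.54 | 0.80 | 0.43 | 0.69 | 0.40 | 0.42 |
| Length (km): CV | 1.50 | 2.05 | 1.30 | 1.92 | 1.33 | 1.24 |

2005-2007
| Hot Zone Type | 95% CF | 95% CR | 99% CF | 99% CR | 99.9% CF | 99.9% CR |
|---|---|---|---|---|---|---|
| Number of Hot Zones | 324 | 487 | 261 | 397 | 207 | 321 |
| Number of Hot RPs | 1270 | 2110 | 915 | 1586 | 689 | 1234 |
| Road Crashes on Hot Zones: Total | 12,987 | 15,645 | 10,516 | 13,230 | 8,786 | 11,370 |
| Road Crashes on Hot Zones: Minimum | 7 | 2 | 8 | 3 | 9 | 4 |
| Road Crashes on Hot Zones: Maximum | 917 | 1091 | 893 | 803 | 641 | 775 |
| Road Crashes on Hot Zones: Mean | 40.08 | 32.12 | 40.29 | 33.32 | 42.44 | 35.42 |
| Road Crashes on Hot Zones: Standard deviation | 72.32 | 78.19 | 73.54 | 64.82 | 61.87 | 63.01 |
| Road Crashes on Hot Zones: CV | 1.80 | 2.43 | 1.83 | 1.95 | 1.46 | 1.78 |
| Length (km): Total | 121.70 | 196.68 | 88.51 | 149.07 | 67.35 | 116.79 |
| Length (km): Minimum | 0.10 | 0.10 | 0.12 | 0.10 | 0.13 | 0.10 |
| Length (km): Maximum | 6.78 | 11.82 | 6.33 | 6.41 | 4.45 | 6.07 |
| Length (km): Mean | 0.38 | 0.40 | 0.34 | 0.37 | 0.33 | 0.36 |
| Length (km): Standard deviation | 0.57 | 0.78 | 0.55 | 0.57 | 0.43 | 0.51 |
| Length (km): CV | 1.50 | 1.95 | 1.62 | 1.54 | 1.30 | 1.42 |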
Figures 6.2 and 6.3 delineate the spatial distribution of CF-based and
CR-based hot zones using the 99% percentile as the threshold value by time
period. In both the 2002-2004 and 2005-2007 periods, most hot zones were
located on Hong Kong Island and the Kowloon Peninsula regardless of whether
the CF or CR approach was used. However, there was some difference between
CF-based and CR-based hot zones. A typical example is the Islands District
(I), a remote and sparsely populated area located in the southwest of Hong
Kong. No hot zones were identified in the district in either period when crash
frequency was used as the measurement of crash intensity. However, 3 hot zones
were detected in each of the two periods when traffic volume was taken into
consideration. To further investigate the similarity and difference, the two
types of hot zones are overlaid, and the hot zones identified by both
approaches and by one approach only are summarized in Table 6.6 by district.
The table records the total number and shares of the three types of hot zones.
Summing up the numbers of CF-cum-CR, CF-only and CR-only hot zones,
there were altogether 451 and 426 hot zones in the whole territory in
2002-2004 and 2005-2007 respectively. More than half of the hot zones were
located in the urban core area, of which 55.02% and 57.33% were identified by
both CF-based and CR-based methods in the periods of 2002-2004 and 2005-2007
respectively. The shares of hot zones identified only by the CF-based approach
were small. Less than 10% of hot zones belonged to the CF-only type in the
urban core area in both periods. The figures were even smaller for hot zones
in the suburb, where only 4.19% and 7.46% of hot zones were identified by the
CF-based approach only. In other words, most of the hot zones identified by
the CF-based approach could also be identified by the CR-based approach,
especially for hot zones located in the suburb. As running Monte Carlo
simulations requires much computing time, one may employ only the CR-based
approach to identify hot zones even if one would also like to identify hot
zones with a large number of road crashes. If individual districts are
investigated, it is found that the hot zones identified by the CF-based
approach and those identified by the CR-based approach largely coincided in
the district of Yau Tsim Mong (YTM) in both periods. CF-cum-CR hot zones
accounted for 86.5% and 83.9% of total hot zones there. Hence, one may choose
either the CF-based or the CR-based approach to identify hazardous road
locations in YTM. In addition, five out of eighteen districts (Tuen Mun, Yuen
Long, Sai Kung, Northern and Islands) had no hot zones identified only by the
CF-based approach. For these districts, one may identify hot zones only with
the CR-based approach and then investigate the crash frequency of the hot
zones if one is interested in both crash frequency and crash risk.
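The overlay described above can be approximated without GIS software when
each hot zone is stored as a set of reference point identifiers. The sketch
below is only an illustration: the function name is hypothetical, and the
overlap rule (a zone counts as shared when it has at least one RP in common
with any zone of the other approach) is an assumption rather than the
procedure used to compile Table 6.6.

```python
def classify_overlay(zones_a, zones_b):
    """Split hot zones into 'both', 'a_only' and 'b_only' groups.

    Each hot zone is a set of RP identifiers. A zone of approach A is placed
    in 'both' when it shares at least one RP with any zone of approach B
    (assumed overlap rule).
    """
    units_a = set().union(*zones_a)
    units_b = set().union(*zones_b)
    result = {"both": [], "a_only": [], "b_only": []}
    for zone in zones_a:
        result["both" if zone & units_b else "a_only"].append(zone)
    for zone in zones_b:
        # Zones of B that overlap A are already represented in 'both'.
        if not zone & units_a:
            result["b_only"].append(zone)
    return result


if __name__ == "__main__":
    cf_zones = [{1, 2}, {10, 11}]        # made-up CF-based hot zones
    cr_zones = [{2, 3}, {20, 21, 22}]    # made-up CR-based hot zones
    groups = classify_overlay(cf_zones, cr_zones)
    print({name: len(zones) for name, zones in groups.items()})
```

With CF-based zones as `zones_a` and CR-based zones as `zones_b`, the three
groups correspond to the CF-cum-CR, CF-only and CR-only types.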
Figure 6.2 Hot zones identified by 99% significance level for (a) crash frequency and (b) crash
risk in the period from 2002 to 2004
Figure 6.3 Hot zones identified by 99% significance level for (a) crash frequency and (b) crash
risk in the period from 2005 to 2007
Table 6.6 Hot zones by district and type based on statistical definition
| District | Region | 2002-2004 CF-cum-CR (%) | 2002-2004 CF_only (%) | 2002-2004 CR_only (%) | 2002-2004 Total (No.) | 2005-2007 CF-cum-CR (%) | 2005-2007 CF_only (%) | 2005-2007 CR_only (%) | 2005-2007 Total (No.) |
|---|---|---|---|---|---|---|---|---|---|
| Urban Core | | 55.02 | 9.64 | 35.34 | 249 | 57.33 | 9.05 | 33.62 | 232 |
| Central & Western (CW) | Hong Kong Island | 45.45 | 9.09 | 45.45 | 22 | 54.55 | 4.55 | 40.91 | 22 |
| Wan Chai (WCH) | Hong Kong Island | 35.29 | 17.65 | 47.06 | 17 | 50.00 | 9.09 | 40.91 | 22 |
| Eastern (E) | Hong Kong Island | 35.48 | 19.35 | 45.16 | 31 | 61.54 | 11.54 | 26.92 | 26 |
| Yau Tsim Mong (YTM) | Kowloon Peninsula | 86.49 | 2.70 | 10.81 | 37 | 83.87 | 6.45 | 9.68 | 31 |
| Sham Shui Po (SSPO) | Kowloon Peninsula | 65.22 | 4.35 | 30.43 | 23 | 80.00 | 4.00 | 16.00 | 25 |
| Kowloon City (KC) | Kowloon Peninsula | 73.81 | 7.14 | 19.05 | 42 | 70.45 | 11.36 | 18.18 | 44 |
| Wong Tai Sin (WTS) | Kowloon Peninsula | 48.28 | 6.90 | 44.83 | 29 | 35.71 | 14.29 | 50.00 | 28 |
| Kwun Tong (KT) | Kowloon Peninsula | 60.00 | 11.43 | 28.57 | 35 | 37.50 | 15.63 | 46.88 | 32 |
| Kwai Tsing (KTS) | Kowloon Peninsula | 37.50 | 9.38 | 53.13 | 32 | 36.36 | 9.09 | 54.55 | 22 |
| Suburb | | 49.30 | 4.19 | 46.51 | 215 | 34.83 | 7.46 | 57.71 | 201 |
| Southern (S) | Hong Kong Island | 12.50 | 8.33 | 79.17 | 24 | 15.79 | 5.26 | 78.95 | 19 |
| Sha Tin (ST) | The New Territories | 63.89 | 2.78 | 33.33 | 36 | 60.61 | 12.12 | 27.27 | 33 |
| Tai Po (TP) | The New Territories | 68.18 | 22.73 | 9.09 | 22 | 32.26 | 22.58 | 45.16 | 31 |
| Tsuen Wan (TW) | The New Territories | 32.00 | 8.00 | 60.00 | 25 | 18.18 | 13.64 | 68.18 | 22 |
| Tuen Mun (TM) | The New Territories | 50.00 | 0.00 | 50.00 | 34 | 28.57 | 0.00 | 71.43 | 28 |
| Yuen Long (YL) | The New Territories | 51.61 | 0.00 | 48.39 | 31 | 33.33 | 0.00 | 66.67 | 27 |
| Sai Kung (SK) | The New Territories | 52.38 | 0.00 | 47.62 | 21 | 45.00 | 0.00 | 55.00 | 20 |
| Northern (N) | The New Territories | 61.90 | 0.00 | 38.10 | 21 | 40.00 | 0.00 | 60.00 | 20 |
| Islands (I) | The New Territories | 0.00 | 0.00 | 100.00 | 3 | 0.00 | 0.00 | 100.00 | 3 |
| Total | | | | | 451 | | | | 426 |
To test the stability of the two approaches, hot zones detected in the
period of 2002-2004 are compared with hot zones identified in the period of
2005-2007 in terms of the number of hot zones, the number of RPs, the number
of road crashes and the total length of hot zones. The results in Table 6.7
show that the variation between the 2002-2004 and 2005-2007 hot zones
generally increased with increasing significance levels regardless of whether
the CF or CR approach was used, indicating that the performance of the two
approaches became less stable with increasing threshold values. If the two
approaches are compared, the CVs among CR-based hot zones were no larger than
those among CF-based hot zones, suggesting that the CR-based approach is more
stable than the CF-based approach.
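As a worked illustration of the stability measure, the mean, standard
deviation and CV across the two periods can be computed as below. The
function name is hypothetical; the sample standard deviation (n - 1 in the
denominator) is used because it reproduces the values reported in Table 6.7
from the hot zone counts in Table 6.5.

```python
from statistics import mean, stdev


def period_variation(values):
    """Return (mean, sample standard deviation, CV) of one indicator across periods."""
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return avg, sd, sd / avg


if __name__ == "__main__":
    # Number of 95% hot zones in 2002-2004 and 2005-2007 (Table 6.5).
    for label, counts in (("CF", [363, 324]), ("CR", [492, 487])):
        avg, sd, cv = period_variation(counts)
        print(f"{label}: mean={avg:.2f} sd={sd:.2f} cv={cv:.2f}")
    # CF: mean=343.50 sd=27.58 cv=0.08
    # CR: mean=489.50 sd=3.54 cv=0.01
```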
Table 6.7 Variation of hot zones (CF and CR) between two periods
| Significance level | Indicator | CF Mean | CF Standard deviation | CF CV | CR Mean | CR Standard deviation | CR CV |
|---|---|---|---|---|---|---|---|
| 95% | No. of Hot Zones | 343.50 | 27.58 | 0.08 | 489.50 | 3.54 | 0.01 |
| 95% | No. of RPs | 1325.50 | 78.49 | 0.06 | 2077.50 | 45.96 | 0.02 |
| 95% | No. of Crashes | 13477.00 | 692.96 | 0.05 | 15834.50 | 267.99 | 0.02 |
| 95% | Length of Hot Zones | 126.74 | 7.13 | 0.06 | 193.59 | 4.37 | 0.02 |
| 99% | No. of Hot Zones | 278.50 | 24.75 | 0.09 | 410.00 | 18.38 | 0.04 |
| 99% | No. of RPs | 956.50 | 58.69 | 0.06 | 1609.00 | 32.53 | 0.02 |
| 99% | No. of Crashes | 10963.50 | 632.86 | 0.06 | 13558.50 | 464.57 | 0.03 |
| 99% | Length of Hot Zones | 92.51 | 5.66 | 0.06 | 151.10 | 2.87 | 0.02 |
| 99.9% | No. of Hot Zones | 230.00 | 32.53 | 0.14 | 344.00 | 32.53 | 0.09 |
| 99.9% | No. of RPs | 731.50 | 60.10 | 0.08 | 1280.50 | 65.76 | 0.05 |
| 99.9% | No. of Crashes | 9170.00 | 543.06 | 0.06 | 11854.00 | 684.48 | 0.06 |
| 99.9% | Length of Hot Zones | 71.28 | 5.56 | 0.08 | 121.03 | 6.00 | 0.05 |
6.2 Comparison with Link-attribute Approach
The empirical link-attribute and event-based hot zones for road crashes
are compared in this section by the way in which the threshold values are
defined, including the arbitrary-number, Monte-Carlo-simulation and
incorporation-of-road-environment definitions.
6.2.1 Numerical Definition: An Arbitrary Number
As discussed in the previous section, the event-based approach detected a
large number of undesirable hot zones with crash intensity (the number of
road crashes) less than the critical value. In order to identify more
characteristics of the two arbitrary-number approaches, the following analysis
excludes those undesirable hot zones and only compares hot zones with
crash intensity equal to or greater than the critical value. Table 6.8
summarizes the characteristics of the link-attribute and event-based
arbitrary-number hot zones in the period from 2002 to 2004. Even though the
undesirable hot zones were excluded, the number of hot zones identified by the
event-based approach was still much higher than that identified by the
link-attribute approach. Taking HZ18+ as an example, the link-attribute
approach identified 147 hot zones and the event-based approach detected 83
more hot zones. The total length of event-based hot zones was more than double
that of link-attribute hot zones. Focusing on the computing time, one may
observe that the event-based approach was time-consuming. It took the
event-based approach 55 hours to identify the crash hot zones, which was ten
hours more than the link-attribute approach.
Table 6.8 Characteristics of link-attribute and event-based arbitrary-number hot zones
(2002-2004)
| Hot Zone Type | HZ18+ L | HZ18+ E | HZ24+ L | HZ24+ E | HZ30+ L | HZ30+ E |
|---|---|---|---|---|---|---|
| Number of Hot Zones | 147 | 230 | 73 | 148 | 36 | 74 |
| Number of BSUs/RPs | 423 | 944 | 191 | 523 | 92 | 264 |
| Crash | 6,482 | 10,600 | 3,579 | 7,044 | 2,084 | 4,299 |
| Length (km) | 41.98 | 89.80 | 19.02 | 50.96 | 9.15 | 26.49 |
| Computing time (h) | 45 | 55 | 45 | 55 | 45 | 55 |
When the link-attribute and event-based hot zones are overlaid, the hot
zones can be classified into hot zones identified by both approaches (L-cum-E),
hot zones identified by the link-attribute approach only (L-only) and hot
zones detected by the event-based approach only (E-only). Table 6.9 summarizes
the comparison results on HZ18+ hot zones in the period from 2002 to 2004. The
numbers of L-only and E-only hot zones were 16 and 113 respectively. If one
looks closely into the hot zones on the map, one may observe that most of the
E-only hot zones were located around road junctions (see Figure 6.4 as an
example). In addition, the crash intensity per hot zone was similar for the
two approaches, but the crash intensity per km was greater for L-only hot
zones. The minimum length of L-only hot zones was 0.19 km but it was only
0.12 km for E-only hot zones, indicating that the event-based approach may
identify more short hot zones with smaller crash intensity per km.
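Table 6.9 Comparison of link-attribute and event-based hot zones (HZ18+)
| Hot Zone Type | L-cum-E | L_only | E_only |
|---|---|---|---|
| Number of Hot Zones | 103 | 16 | 113 |
| Crash: Total | 7899 | 457 | 3203 |
| Crash: Minimum | 19 | 18 | 18 |
| Crash: Maximum | 1219 | 54 | 70 |
| Crash: Mean | 76.68 | 28.56 | 28.35 |
| Crash: Standard deviation | 125.07 | 9.27 | 10.90 |
| Crash: CV | 1.63 | 0.32 | 0.38 |
| Length (km): Total | 55.37 | 3.37 | 29.07 |
| Length (km): Minimum | 0.14 | 0.19 | 0.12 |
| Length (km): Maximum | 6.77 | 0.30 | 0.94 |
| Length (km): Mean | 0.54 | 0.21 | 0.26 |
| Length (km): Standard deviation | 0.76 | 0.03 | 0.12 |
| Length (km): CV | 1.41 | 0.14 | 0.46 |
| Number of road crashes per km | 142.7 | 135.6 | 110.2 |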
Figure 6.4 E-only hot zones
Table 6.10 shows the statistics of link-attribute and event-based hot zones
for both periods. The performance of the link-attribute method was more stable
when the threshold value was equal to 9 or 12 road crashes in three years, but
less stable when the threshold value was defined as 15 road crashes,
indicating that the relative performance of the two approaches depends on the
threshold value.
Table 6.10 Statistics of link-attribute and event-based hot zones (HZ18+) for two periods
| Hot Zone Type | Indicator | L Mean | L Standard deviation | L CV | E Mean | E Standard deviation | E CV |
|---|---|---|---|---|---|---|---|
| HZ18+ | No. of Hot Zones | 147.50 | 0.71 | 0.00 | 218.00 | 16.97 | 0.08 |
| HZ18+ | No. of BSUs/RPs | 420.00 | 4.24 | 0.01 | 912.00 | 45.25 | 0.05 |
| HZ18+ | No. of Crashes | 6373.50 | 153.44 | 0.02 | 10320.00 | 395.98 | 0.04 |
| HZ18+ | Length of Hot Zones | 41.55 | 0.61 | 0.01 | 86.86 | 4.16 | 0.05 |
| HZ24+ | No. of Hot Zones | 74.50 | 2.12 | 0.03 | 133.50 | 20.51 | 0.15 |
| HZ24+ | No. of BSUs/RPs | 184.00 | 9.90 | 0.05 | 501.00 | 31.11 | 0.06 |
| HZ24+ | No. of Crashes | 3469.00 | 155.56 | 0.04 | 6843.00 | 284.26 | 0.04 |
| HZ24+ | Length of Hot Zones | 18.32 | 1.00 | 0.05 | 48.77 | 3.10 | 0.06 |
| HZ30+ | No. of Hot Zones | 37.00 | 1.41 | 0.04 | 73.00 | 1.41 | 0.02 |
| HZ30+ | No. of BSUs/RPs | 90.00 | 2.83 | 0.03 | 272.50 | 12.02 | 0.04 |
| HZ30+ | No. of Crashes | 2032.00 | 73.54 | 0.04 | 4374.50 | 106.77 | 0.02 |
| HZ30+ | Length of Hot Zones | 8.96 | 0.28 | 0.03 | 26.86 | 0.52 | 0.02 |
In summary, the stability of the two approaches depends on the threshold
value. As the event-based approach can identify more hot zones no matter
which threshold value is used, one may employ the event-based approach to
obtain hazardous road locations in order to avoid “false negative” locations.
However, one should be very careful in dealing with those hot zones with
crash intensity less than the critical value.
6.2.2 Monte Carlo Simulation on Crash Frequency
Similar to the numerical-definition results, the number of hot zones
identified by the event-based approach was also considerably higher than that
identified by the link-attribute approach when the Monte Carlo method was used
to define the threshold value. As shown in Table 6.11, the average number of
95% hot zones (HZ95%) identified by the event-based approach was 343.5 with a
total length of 126.7 km, whereas the number of hot zones detected by the
link-attribute approach was only 245.5 with a total length of 79.7 km.
However, if one examines the variability of the hot zones, one may observe
that the values of CV for link-attribute hot zones were smaller than those for
event-based hot zones regardless of the significance level. This suggests that
the performance of the link-attribute approach is more stable than that of the
event-based approach. In this sense, the link-attribute approach is preferable
if one would like to identify hot zones by using the Monte Carlo approach to
define the threshold value. Recall that the results on the hypothetical road
network demonstrate that the link-attribute Monte Carlo simulation is more
stable on a grid network. As a great number of road crashes were located in
urban areas with roads resembling the grid structure, the findings on the
empirical road network are consistent with those on the hypothetical road
network.
Table 6.11 Statistics of link-attribute and event-based hot zones for two periods by significance
level
| Hot Zone Type | Indicator | L Mean | L Standard deviation | L CV | E Mean | E Standard deviation | E CV |
|---|---|---|---|---|---|---|---|
| HZ95% | No. of Hot Zones | 245.50 | 2.12 | 0.01 | 343.50 | 27.58 | 0.08 |
| HZ95% | No. of BSUs/RPs | 810.50 | 12.02 | 0.01 | 1325.50 | 78.49 | 0.06 |
| HZ95% | No. of Crashes | 10120.00 | 56.57 | 0.01 | 13477.00 | 692.96 | 0.05 |
| HZ95% | Length of Hot Zones | 79.65 | 0.86 | 0.01 | 126.74 | 7.13 | 0.06 |
| HZ99% | No. of Hot Zones | 184.50 | 0.71 | 0.00 | 278.50 | 24.75 | 0.09 |
| HZ99% | No. of BSUs/RPs | 564.00 | 1.41 | 0.00 | 956.50 | 58.69 | 0.06 |
| HZ99% | No. of Crashes | 7905.50 | 133.64 | 0.02 | 10963.50 | 632.86 | 0.06 |
| HZ99% | Length of Hot Zones | 55.62 | 0.45 | 0.01 | 92.51 | 5.66 | 0.06 |
| HZ99.9% | No. of Hot Zones | 124.00 | 1.41 | 0.01 | 230.00 | 32.53 | 0.14 |
| HZ99.9% | No. of BSUs/RPs | 359.00 | 5.66 | 0.02 | 731.50 | 60.10 | 0.08 |
| HZ99.9% | No. of Crashes | 5694.00 | 11.31 | 0.00 | 9170.00 | 543.06 | 0.06 |
| HZ99.9% | Length of Hot Zones | 35.55 | 0.43 | 0.01 | 71.28 | 5.56 | 0.08 |
6.2.3 Incorporation of Road Environmental Variables
As presented in Chapter 4, the link-attribute method introduces road
environmental variables by using a “potential crash reduction” approach to
define the threshold value. The event-based method, however, incorporates
environmental factors by modifying the Monte Carlo procedure, as outlined in
the previous section. This sub-section first compares the results of the
link-attribute and event-based approaches for the identification of hot zones
by taking into consideration the traffic exposure, and then discusses the two
methods in identifying hot zones by incorporating more environmental
variables.
6.2.3.1 Incorporation of traffic exposure
As the results in Chapter 3 show that the performance of Model LA
is more stable than that of Model LA’, the link-attribute hot zones in the
comparison are identified with Model LA rather than Model LA’. Table 6.12
summarizes the hot zones identified by the two approaches in the period from
2002 to 2004. The link-attribute approach identified 483 hot zones and the
event-based approach detected 492, 423 and 367 hot zones with the 95%, 99% and
99.9% significance levels respectively. The following analysis compares the
link-attribute hot zones with the 95% event-based hot zones because the two
sets are similar in number and length. Firstly, the two approaches are
compared in terms of computing time. Table 6.12 records the computing time for
generating hot zones. The computing time of the link-attribute approach for
the identification of hot zones includes the time for geo-validation of road
crashes, calculation of crash intensity, running the negative binomial model
to calculate the threshold value, and modeling of the spatial pattern of BSUs.
The computing time for the event-based approach equals the sum of the time
spent on geo-validation of road crashes, running 1,000 replications to
calculate the threshold values, and modeling of spatial patterns. It can be
observed that the computing time of the event-based approach (90 hours) was
nearly double that of the link-attribute approach (50 hours). Hence, one may
choose the link-attribute approach to identify hot zones in order to save
time. In addition, the relationship between the traffic exposure and the
occurrence of road crashes may not be linear. With statistical models such as
the negative binomial model, the relationship can be better examined. In these
respects, the link-attribute approach is superior to the event-based approach.
Table 6.12 Statistics of link-attribute and event-based hot zones (incorporation of traffic exposure)
| Hot Zone Type | L | E 95% | E 99% | E 99.9% |
|---|---|---|---|---|
| Number of Hot Zones | 483 | 492 | 423 | 367 |
| Number of BSUs/RPs | 1,859 | 2,045 | 1,632 | 1,327 |
| Crash | 17,027 | 16,024 | 13,887 | 12,338 |
| Length (km) | 179.05 | 190.5 | 153.13 | 125.27 |
| Computing time (h) | 50 | 90 | 90 | 90 |
Next, the two approaches are compared in terms of stability. From Table
6.13, one may observe that the performance of the event-based approach was
more stable than that of the link-attribute approach in terms of the number of
BSUs/RPs and the length of hot zones, but was similar to that of the
link-attribute approach in terms of the numbers of hot zones and crashes.
Table 6.13 Statistics of link-attribute and event-based hot zones for two periods
| Indicator | L Mean | L Standard deviation | L CV | E95% Mean | E95% Standard deviation | E95% CV |
|---|---|---|---|---|---|---|
| No. of Hot Zones | 480.50 | 3.54 | 0.01 | 489.50 | 3.54 | 0.01 |
| No. of BSUs/RPs | 1945.50 | 122.33 | 0.06 | 2077.50 | 45.96 | 0.02 |
| No. of Crashes | 17219.00 | 271.53 | 0.02 | 15834.50 | 267.99 | 0.02 |
| Length of Hot Zones | 187.74 | 12.29 | 0.07 | 193.60 | 4.38 | 0.02 |
6.2.3.2 Incorporation of other environmental variables
The link-attribute method can introduce many independent variables into
statistical models so as to incorporate more environmental variables such as
road type and road junctions. In this thesis, negative binomial regression is
used to introduce these variables (see Chapter 4). However, for the
event-based approach, neither negative binomial nor Poisson models are
appropriate for incorporating environmental variables because of the double
counting inherent in the event-based approach. Even though the search distance
of a reference point is defined as half of the interval of reference points,
double counting still exists, especially around road junctions. Hence, it is
not suitable to use the crash intensities of reference points as samples for
modeling the spatial distribution of road crashes. Moreover, it is more
difficult for the event-based approach to determine environmental attributes
for reference points. For instance, reference point A, as shown in Figure 6.5,
is located at the intersection of Road No. 1 (a main road) and Road No. 2 (a
secondary road). An engineer who aims to take into consideration the
contributory factor “road type” may have to assign an attribute value of the
road type to the reference point before any further investigation. But which
road is the reference point located on? Which would be more appropriate as an
attribute value, secondary road or main road? It is quite difficult to tell.
Hence, as the link-attribute approach has significant strengths in
incorporating environmental variables, one may choose the link-attribute
approach to identify hazardous road locations if one would like to incorporate
environmental variables.
Figure 6.5 Illustration of determining an attribute value for a reference point
6.3 Summary
This chapter introduces the event-based method for the identification of
hazardous road locations. The hot zones are identified with the threshold
values defined by an arbitrary number and by Monte Carlo simulation. When an
arbitrary number was used, there were a great number of hot zones with crash
intensity less than the critical value. Hence, the arbitrary-number approach is
not appropriate for identifying hazardous road locations if one is concerned
with the critical value of a hot zone. When the Monte Carlo method was used,
the CR-based method was found to be more stable than the CF-based method. In
this light, the CR-based method is preferable if one is concerned with the
stability of the performance of the approach. However, as some sectors such as
hospitals are more likely to be interested in the absolute number of road
crashes, the CF-based hot zones may also be important to them for addressing
problems such as allocating medical resources. In addition, the results
demonstrate that most CF-based hot zones could also be identified by the
CR-based method in some districts. In order to save effort, one may use only
the CR-based approach to identify hazardous road locations even if one is
interested in both CF-based and CR-based hot zones.
The event-based approach is compared with the link-attribute approach
in analyzing road crashes. The event-based approach is less likely to cause
false negative problems, whereas the link-attribute approach makes it more
convenient to incorporate a set of road environmental variables.
CHAPTER 7
EVENT-BASED ANALYSIS FOR ROAD CASUALTIES
Crash intensity of a reference point in event-based analysis can be
determined not only by crash frequency but also by other methods such as the
casualty-weighted method. The first section presents the way in which hot
zones are identified in a casualty-weighted and event-based manner. The
spatial distribution of crashes involving pedestrian casualties in Kwun Tong
District will be analyzed by using simple ranking method to determine the
threshold value. After discussing the event-based results in Section 7.1.3, a
comparison with link-attribute analysis is made in Section 7.2.
7.1 District-Wide Identification of Hazardous Road Locations for Pedestrians
7.1.1 Data Description
The spatial distribution of pedestrian casualties occurring on the entire
road network (147 km) of Kwun Tong District during the periods of 2002-2004
and 2005-2007 is analyzed. There were altogether 944 pedestrian casualties
(22 fatalities, 213 serious injuries and 709 slight injuries) in the period
from 2002 to 2004 and 940 pedestrian casualties (21 fatalities, 242 serious
injuries and 677 slight injuries) in the period from 2005 to 2007. Details of
the pedestrian casualty and road network data can be found in Chapter 5. The
interval of reference points is defined as 100 meters. Using the dissolved
road network, 1,890 reference points are created.
7.1.2 Data Analysis
The pedestrian casualties are analyzed by two casualty-weighted methods,
namely unweighted and cost-weighted approaches.
7.1.2.1 Unweighted
With unweighted denotation, crash intensity ECIUW is calculated by:
$$
\mathrm{ECIUW}_i = \sum_{j=1}^{m} f_{ij} \qquad (7.1)
$$
$$
f_{ij} =
\begin{cases}
1 & \text{if } d_{ij} \le h \\
0 & \text{otherwise}
\end{cases}
\qquad (7.2)
$$
where m denotes the number of pedestrian casualties; d_ij is the network
distance between RP i and casualty j; and h is the search distance from RP i.
The simple ranking method is used to define the threshold value. In this
chapter, the 90%, 95% and 99% percentiles of ECIUW are used to determine
the threshold values for sensitivity analysis.
7.1.2.2 Cost-weighted
As discussed in Chapter 5, the values of preventing a serious injury and a
slight injury are around 16% and 1% of the value of a fatality respectively.
Hence, with cost-weighted denotation, the crash intensity ECICW is calculated
by:
$$
\mathrm{ECICW}_i = \sum_{j=1}^{m} f_{ij} \qquad (7.3)
$$
$$
f_{ij} =
\begin{cases}
100 & \text{if } d_{ij} \le h \text{ and casualty } j \text{ is fatal} \\
16 & \text{if } d_{ij} \le h \text{ and casualty } j \text{ is seriously injured} \\
1 & \text{if } d_{ij} \le h \text{ and casualty } j \text{ is slightly injured} \\
0 & \text{otherwise}
\end{cases}
\qquad (7.4)
$$
where m denotes the number of pedestrian casualties; d_ij is the network
distance between RP i and casualty j; and h is the search distance from RP i.
Similar to the unweighted analysis, the 90%, 95% and 99% percentiles of
ECICW are used to determine the threshold values.
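The two intensity measures can be sketched in Python for a single reference
point. This is an illustrative sketch only: the function names, the input
layout (a list of network distance and severity pairs), the inclusive
boundary d_ij <= h, the search distance of 50 m and the sample casualties are
all assumptions made for the example.

```python
# Cost weights of Equation 7.4 (fatal : serious : slight = 100 : 16 : 1).
COST_WEIGHTS = {"fatal": 100, "serious": 16, "slight": 1}


def eci_unweighted(casualties, h):
    """Equations 7.1 and 7.2: count casualties within search distance h of the RP."""
    return sum(1 for distance, _ in casualties if distance <= h)


def eci_cost_weighted(casualties, h):
    """Equations 7.3 and 7.4: sum the severity weights of casualties within h."""
    return sum(
        COST_WEIGHTS[severity]
        for distance, severity in casualties
        if distance <= h
    )


if __name__ == "__main__":
    # (network distance to the RP in metres, severity); made-up values.
    casualties = [(12.0, "slight"), (35.5, "serious"), (48.0, "fatal"), (80.0, "slight")]
    print(eci_unweighted(casualties, h=50))     # 3
    print(eci_cost_weighted(casualties, h=50))  # 117
```

The example shows why the two measures can rank locations differently: a
single fatality within the search distance outweighs many slight injuries
under the cost-weighted measure but counts only once under the unweighted one.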
7.1.3 Results
Table 7.1 summarizes the results on event-based pedestrian hot zones. If
the two casualty-based approaches are compared, the statistics in the table
suggest that the unweighted method identified hot zones with a greater
number of casualties of all types, whereas the cost-weighted method identified
those locations with a high concentration of fatal or seriously injured
victims. For instance, when the 90% percentile was used, the unweighted method
detected fewer hot zones than the cost-weighted method in terms of both the
number and the length of hot zones for both periods (31 unweighted hot zones
with 16.59 km vs. 38 cost-weighted hot zones with 17.82 km in the period from
2002 to 2004; and 44 unweighted hot zones with 18.38 km vs. 55 cost-weighted
hot zones with 20.99 km in the period from 2005 to 2007). However, the number
of all pedestrian casualties or of slightly injured pedestrian casualties
happening on unweighted hot zones was larger than that on cost-weighted hot
zones (576 pedestrian casualties and 437 slight pedestrian injuries on
unweighted hot zones vs. 548 pedestrian casualties and 383 slight pedestrian
injuries on cost-weighted hot zones in 2002-2004; and 542 pedestrian
casualties and 390 slight pedestrian injuries on unweighted hot zones vs. 537
pedestrian casualties and 342 slight pedestrian injuries on cost-weighted hot
zones in 2005-2007). The difference is more obvious with increasing threshold
values. It demonstrates that the results of pedestrian hot zones are sensitive
to the weighting method, in accordance with the findings in Chapter 5.
Table 7.1 Statistics on event-based pedestrian hot zones with threshold value defined by the
simple ranking method

2002-2004
Hot Zone Type                                  90%             95%             99%
                                            UW      CW      UW      CW      UW      CW
No. of HZs                                  31      38      20      24       6       5
No. of RPs                                 174     177      91      83      18      11
No. of Casualties                          576     548     387     289     114      57
No. of Fatalities & Serious Injuries       139     165      90     101      19      19
Length (km)                              16.59   17.82    8.88    8.22    1.77    1.16

2005-2007
Hot Zone Type                                  90%             95%             99%
                                            UW      CW      UW      CW      UW      CW
No. of HZs                                  44      55      23      26       5       8
No. of RPs                                 187     220     106      83      15      16
No. of Casualties                          542     537     377     260      98      56
No. of Fatalities & Serious Injuries       152     195      99     107      27      25
Length (km)                              18.38   20.99   10.49    7.90    1.77    1.63

To further investigate the difference between the two types of hot zones,
the unweighted and cost-weighted hot zones are overlaid in GIS and classified
into three types: hot zones identified by both approaches (UW-cum-CW), hot
zones identified only by the unweighted approach (UW-only) and hot zones
detected only by the cost-weighted approach (CW-only). The UW-cum-CW hot
zones can be regarded as the most dangerous locations for pedestrians since
they were identified as hazards with high numbers of fatalities, serious
injuries and slight injuries. Figure 7.1 shows the locations of the three
types of hot zones identified by using the 99th percentile to determine the
threshold value. The hot zones identified by both unweighted and
cost-weighted approaches are colored in red. From the map, one can observe
that the UW-cum-CW hot zones identified in both periods were located around
the Kwun Tong Town Center, which is an activity center of Kwun Tong. High
pedestrian volumes and a lack of pedestrian protection facilities
significantly increase the chance of being involved in a traffic crash. These
locations were less likely to be false positives and could be treated with
high priority. Apart from the UW-cum-CW hot zones, there were some hot zones
identified by only one approach. Table 7.2 shows the statistics on hot zones
identified by only one approach in the period from 2002 to 2004. Consistent
with previous findings, compared with the CW-only hot zones, the UW-only hot
zones were characterized by a higher number of pedestrian casualties but a
lower number of fatal and seriously injured pedestrian casualties. Taking the
90th percentile as an example, the average number of pedestrian casualties on
UW-only hot zones was 6.33, nearly double the average pedestrian casualty
count of CW-only hot zones (3.22), whereas the mean number of fatalities and
serious injuries on CW-only hot zones (1.78) was nearly three times that on
UW-only hot zones (0.67). The choice between the UW-based and the CW-based
approach depends on the motives of the examination.
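The three-way classification can be illustrated with a short sketch. It is not
the GIS overlay procedure used in this thesis: each hot zone is represented,
hypothetically, as a set of reference-point identifiers, and two hot zones are
assumed to coincide whenever they share at least one reference point.

```python
def classify_overlay(uw_zones, cw_zones):
    """Label every hot zone as UW-cum-CW, UW-only or CW-only.

    uw_zones and cw_zones map a hot-zone name to the set of reference-point
    IDs it covers (a hypothetical representation; names must be unique).
    """
    labels = {}
    for name, zone in uw_zones.items():
        shared = any(zone & other for other in cw_zones.values())
        labels[name] = "UW-cum-CW" if shared else "UW-only"
    for name, zone in cw_zones.items():
        shared = any(zone & other for other in uw_zones.values())
        labels[name] = "UW-cum-CW" if shared else "CW-only"
    return labels
```

A stricter overlap rule, such as a minimum shared length, would change the
split between the three classes, so the rule itself deserves to be reported
alongside the results.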
Figure 7.1 Hot zones of three types using the 99th percentile in (a) 2002-2004 and (b) 2005-2007
Table 7.2 Event-based hot zones identified only by unweighted or cost-weighted method
(2002-2004)

                                 90%                 95%                 99%
                           UW-only   CW-only   UW-only   CW-only   UW-only   CW-only
No. of Hot Zones                 3         9         7        10         4         3
No. of RPs                       7        25        18        26        10         6
Casualties
  Total                         19        29        77        43        71        22
  Minimum                        3         2         5         2        13         5
  Maximum                       11         7        19         6        30         9
  Mean                        6.33      3.22        11       4.3     17.75      7.33
  Standard deviation          3.40      1.55      4.57       1.1      7.08      1.70
  CV                          0.54      0.48      0.42      0.26      0.40      0.23
Fatalities & Serious Injuries
  Total                          2        16        14        22         9        12
  Minimum                        0         1         1         1         0         2
  Maximum                        2         5         5         4         5         7
  Mean                        0.67      1.78         2       2.2      2.25         4
  Standard deviation          0.94      1.23      1.31      0.88      1.92      2.16
  CV                          1.40      0.69      0.66      0.40      0.85      0.54
Length (km)
  Total                       0.88      2.70      1.90      2.55      1.04      0.61
  Minimum                     0.27      0.18      0.20      0.16      0.18      0.18
  Maximum                     0.31      0.66      0.37      0.53      0.41      0.22
  Mean                        0.29      0.30      0.27      0.25      0.26      0.20
  Standard deviation          0.02      0.14      0.06      0.13      0.09      0.02
  CV                          0.07      0.47      0.22      0.52      0.35      0.10

In order to test the stability of the performance of the two approaches, the
hot zones of the two periods are compared in terms of the number and length
of hot zones, the number of reference points and the crash intensity,
represented as the number of pedestrian casualties for the unweighted approach
and the total costs for the cost-weighted approach. Table 7.3 shows the mean,
standard deviation and coefficient of variation (CV) of the 2002-2004 and
2005-2007 hot zones. When the 90th and 99th percentiles were used to define
the threshold values, the CVs of unweighted hot zones were smaller than those
of cost-weighted hot zones, but when the 95th percentile was used, the CVs of
unweighted hot zones were larger except for the indicator of crash intensity.
To further examine the variability, the hot zones of the two periods are
overlaid. Table 7.4 shows the shares of hot zones identified in both periods
after the overlay. It can be observed that the unweighted method is more
stable than the cost-weighted method. Around 50% of unweighted hot zones
detected in 2002-2004 could also be detected in 2005-2007 regardless of the
definition of threshold values. Although 59.1% of cost-weighted hot zones
identified in 2002-2004 could also be detected in 2005-2007 when the 90th
percentile was used, the shares of hot zones identified in both periods
decreased sharply with increasing threshold values. For example, when the
99th percentile was used to define the threshold value, only 20% of
cost-weighted hot zones identified in 2002-2004 could be detected in
2005-2007.
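The stability indicators in Table 7.3 summarize only two observations per
indicator, one for each period. A minimal sketch of the calculation is given
below. It assumes the sample standard deviation (n - 1 denominator), an
assumption that reproduces the tabulated values for the number of unweighted
hot zones at the 90th percentile (31 in 2002-2004 and 44 in 2005-2007); the
function name is hypothetical.

```python
from statistics import mean, stdev


def period_variation(values):
    """Return (mean, sample standard deviation, CV) of per-period values."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation, n - 1 denominator
    return m, s, s / m


# Number of unweighted hot zones at the 90th percentile in the two periods.
m, s, cv = period_variation([31, 44])
print(round(m, 2), round(s, 2), round(cv, 2))
```

Because each CV rests on two periods only, small differences between CVs
should be read as indicative rather than conclusive.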
Table 7.3 Variation of hot zones between two periods

                                    UW                            CW
                           Mean  Std. dev.      CV       Mean  Std. dev.      CV
90%
  No. of Hot Zones        37.50      9.19     0.25      46.50     12.02     0.26
  No. of RPs             180.50      9.19     0.05     198.50     30.41     0.15
  Crash Intensity        559.00     24.04     0.04    4502.50    666.80     0.15
  Length of Hot Zones     17.49      1.27     0.07      19.41      2.24     0.12
95%
  No. of Hot Zones        21.50      2.12     0.10      25.00      1.41     0.06
  No. of BSUs/RPs         98.50     10.61     0.11      83.00      0.00     0.00
  Crash Intensity        382.00      7.07     0.02    2926.50    399.52     0.14
  Length of Hot Zones      9.69      1.14     0.12       8.06      0.23     0.03
99%
  No. of Hot Zones         5.50      0.71     0.13       6.50      2.12     0.33
  No. of BSUs/RPs         16.50      2.12     0.13      13.50      3.54     0.26
  Crash Intensity        106.00     11.31     0.11    1016.50    478.71     0.47
  Length of Hot Zones      1.77      0.00     0.00       1.40      0.33     0.24

Table 7.4 Numbers and percentages of hot zones identified in both periods

                                   Number    % of HZ24    % of HZ57
Unweighted
  HZ24(90%)-cum-HZ57(90%)              18        58.1%        50.0%
  HZ24(95%)-cum-HZ57(95%)              10        50.0%       47.83%
  HZ24(99%)-cum-HZ57(99%)               3        50.0%        60.0%
Cost-weighted
  HZ24(90%)-cum-HZ57(90%)              26        59.1%        54.5%
  HZ24(95%)-cum-HZ57(95%)               9        37.5%        50.0%
  HZ24(99%)-cum-HZ57(99%)               1        20.0%        12.5%

Notes:
1. "% of HZ24" gives the percentages of HZ24 in the HZ24-cum-HZ57 category
2. "% of HZ57" gives the percentages of HZ57 in the HZ24-cum-HZ57 category
7.2 Comparison with Link-attribute Approach
In Chapter 5, hazardous road locations for pedestrians were identified with
the threshold value defined by the simple ranking method and by a “potential
crash reduction” approach that takes into consideration the surrounding
environment of pedestrian road crashes. The previous section of this chapter
applied the simple ranking method to the event-based identification of
hazardous road locations for pedestrians. The following comparative analysis
focuses on the hot zones identified with the threshold value defined by the
simple ranking method.
7.2.1 Simple Ranking
Table 7.5 summarizes the unweighted and cost-weighted pedestrian hot zones of
both periods with the threshold value defined by the simple ranking method.
Consistent with previous findings, the event-based approach detected more
hot zones than the link-attribute approach regardless of the percentile and
the casualty-weighting approach. The difference between the link-attribute
and the event-based approaches is more obvious for hot zones identified by
the cost-weighted approach. Taking the 90th percentile as an example, the
link-attribute approach detected on average 33.5 unweighted hot zones
(14.01 km) and 32.5 cost-weighted hot zones (10.56 km) per period, whereas the
event-based approach identified 37.5 unweighted hot zones (17.49 km) and 46.5
cost-weighted hot zones (19.41 km). Focusing on unweighted hot zones, the
variability of link-attribute hot zones was smaller than that of event-based
hot zones when the 90th and 95th percentiles were used to define the
threshold values. For the cost-weighted definition, most CV values of
link-attribute hot zones were greater than those of event-based hot zones,
indicating greater variability with link-attribute hot zones. Comparatively
speaking, the performance of the link-attribute approach was more stable when
the unweighted method was used, whereas the event-based approach performed
more stably when the cost-weighted method was employed.
Table 7.5 Statistics of link-attribute and event-based hot zones for two periods (unweighted and cost-weighted)

Unweighted
                                   L90%                          E90%
                           Mean  Std. dev.      CV       Mean  Std. dev.      CV
No. of Hot Zones          33.50      0.71     0.02      37.50      9.19     0.25
No. of BSUs/RPs          143.50      2.12     0.01     180.50      9.19     0.05
No. of Casualties        534.50     41.72     0.08     559.00     24.04     0.04
Cost                          -         -        -          -         -        -
Length of HZs (km)        14.01      0.20     0.01      17.49      1.27     0.07

                                   L95%                          E95%
                           Mean  Std. dev.      CV       Mean  Std. dev.      CV
No. of Hot Zones          18.50      0.71     0.04      21.50      2.12     0.10
No. of BSUs/RPs           68.00      0.00     0.00      98.50     10.61     0.11
No. of Casualties        347.00     28.28     0.08     382.00      7.07     0.02
Cost                          -         -        -          -         -        -
Length of HZs (km)         6.64      0.01     0.00       9.69      1.14     0.12

                                   L99%                          E99%
                           Mean  Std. dev.      CV       Mean  Std. dev.      CV
No. of Hot Zones           4.50      3.54     0.79       5.50      0.71     0.13
No. of BSUs/RPs           12.00     11.31     0.94      16.50      2.12     0.13
No. of Casualties         88.00     82.02     0.93     106.00     11.31     0.11
Cost                          -         -        -          -         -        -
Length of HZs (km)         1.19      1.11     0.94       1.77      0.00     0.00

Cost-weighted
                                   L90%                          E90%
                           Mean  Std. dev.      CV       Mean  Std. dev.      CV
No. of Hot Zones          32.50      6.36     0.20      46.50     12.02     0.26
No. of BSUs/RPs          109.50      0.71     0.01     198.50     30.41     0.15
No. of Casualties        402.00     45.25     0.11     542.50      7.78     0.01
Cost                    3862.50    419.31     0.11    4502.50    666.80     0.15
Length of HZs (km)        10.56      0.11     0.01      19.41      2.24     0.12

                                   L95%                          E95%
                           Mean  Std. dev.      CV       Mean  Std. dev.      CV
No. of Hot Zones          18.00      1.41     0.08      25.00      1.41     0.06
No. of BSUs/RPs           55.50      4.95     0.09      83.00      0.00     0.00
No. of Casualties        264.00      1.41     0.01     274.50     20.51     0.07
Cost                    2475.00    540.23     0.22    2926.50    399.52     0.14
Length of HZs (km)         5.43      0.49     0.09       8.06      0.23     0.03

                                   L99%                          E99%
                           Mean  Std. dev.      CV       Mean  Std. dev.      CV
No. of Hot Zones           1.00      1.41     1.41       6.50      2.12     0.33
No. of BSUs/RPs            2.00      2.83     1.41      13.50      3.54     0.26
No. of Casualties         15.00     21.21     1.41      56.50      0.71     0.01
Cost                     300.00    424.26     1.41    1016.50    478.71     0.47
Length of HZs (km)         0.19      0.26     1.41       1.40      0.33     0.24

If the link-attribute and event-based hot zones are overlaid, the
pedestrian hot zones which were detected only by the link-attribute (L-only)
or only by the event-based (E-only) approach can be identified. Table 7.6
summarizes L-only and E-only hot zones with the threshold value determined by
the 99th percentile for the period from 2002 to 2004. There were only two
small L-only hot zones (0.20 km each) with an average crash intensity
(represented as the number of pedestrian casualties) equal to 15. The
event-based approach detected 6 additional hot zones with length ranging from
0.18 km to 0.53 km and crash intensity ranging from 13 to 30, indicating that
the event-based approach detected more hot zones with varied length and crash
intensity. For the cost-weighted hot zone type, the link-attribute approach
detected no extra hot zones but the event-based approach identified five.
These hot zones are further plotted onto maps for closer examination.
Figure 7.2 delineates the spatial locations of unweighted L-only and E-only
hot zones and Figure 7.3 depicts the locations of cost-weighted E-only hot
zones. Neither of the two unweighted L-only hot zones identified in 2002-2004
could be detected in 2005-2007, but three out of six unweighted and one of
five cost-weighted E-only hot zones could be identified in both periods.
These four hot zones, which persisted across both periods and are therefore
unlikely to be false positives, are located around Kwun Tong Town Center. The
area is a landmark activity center of Kwun Tong and a major drop-off point in
the district (see the minibus terminal in Figure 7.4a as an example). It is
not uncommon for pedestrians to share the same thoroughfares with on-road
vehicular traffic in the area, and the area lacks facilities such as
guardrails to protect pedestrians (see Figure 7.4b). These locations require
immediate countermeasures to improve the safety of pedestrians. Failing to
identify these locations may cause more deaths and serious injuries. In this
sense, the event-based approach might be chosen to identify hazardous road
locations in order to avoid false negative locations. However, one may notice
that three of six unweighted and four of five cost-weighted E-only hot zones
could not be identified in the second period, which suggests that the
event-based approach may cause more serious false positive problems.
Table 7.6 Summary on 99% L-only and E-only hot zones in 2002-2004

                                Unweighted           Cost-weighted
Hot Zone Type                L-only    E-only      L-only    E-only
Number of Hot Zones               2         6           0         5
Crash intensity
  Total                          30       114           0       678
  Minimum                        14        13           0       114
  Maximum                        16        30           0       165
  Mean                           15        19           0     135.6
  Standard deviation           1.00       7.4           0     18.23
  CV                           0.07      0.39           0      0.13
Length (km)
  Total                        0.40      1.77           0      1.16
  Minimum                      0.20      0.18           0      0.18
  Maximum                      0.20      0.53           0      0.37
  Mean                         0.20      0.29           0      0.23
  Standard deviation           0.00      0.13           0      0.07
  CV                           0.00      0.45           0      0.30

Figure 7.2 Locations of unweighted pedestrian hot zones of (a) L-only and (b) E-only
Figure 7.3 Locations of cost-weighted E-only pedestrian hot zones
Figure 7.4 Site Review using Google Street View (From the crossing of Fu Yan Street and Wut
Wah Street)
7.2.2 Incorporation of Surrounding Environmental Variables
The link-attribute analysis in Chapter 5 employed the negative binomial
model to incorporate a range of surrounding environmental variables. It also
applied the regression models to the estimation of the long-term mean of the
pedestrian casualty count (EB estimate). However, the event-based approach is
not appropriate for performing such analysis because, as discussed in
Section 6.2, it is limited in modeling the relationship between environmental
variables and the number of pedestrian casualties.
7.3 Summary
This chapter analyzes pedestrian casualties in an event-based manner.
When the simple ranking method is used to define the threshold value, the
performance of the unweighted method is generally more stable than that of
the cost-weighted method. In this sense, one may choose the unweighted method
to identify hazardous road locations for pedestrians. However, although most
hot zones could be identified by both unweighted and cost-weighted
approaches, there existed some hot zones which could be identified by only
one approach. While the unweighted approach targets locations with a high
concentration of pedestrian casualties regardless of injury type, the
cost-weighted approach identifies those with a great number of fatalities and
serious injuries. The choice of the casualty-weighting method in identifying
hazardous road locations for pedestrians depends on the targeted injury type
of pedestrian casualties.
The event-based approach is compared with the link-attribute approach
in analyzing pedestrian casualties. While the link-attribute approach is less
likely to cause false positive problems, the event-based approach is less likely
to cause false negative problems. Compared with the link-attribute approach,
it is difficult for the event-based approach to incorporate a set of road
environmental variables.
CHAPTER 8
CONCLUSION
8.1 Summary of Findings
This research explores the general procedures of the link-attribute and the
event-based approaches in identifying crash hot zones as hazardous road
locations, and investigates the characteristics of the two approaches by
conducting a range of sensitivity analyses on the definition of the threshold
value and the determination of crash intensity with both simulated and
empirical data. This section summarizes the main findings of the research.
8.1.1 Link-Attribute Approach
This research employed the link-attribute approach to identify hazardous
road locations with both raw-link-node and dissolved road systems. Since
using a dissolved road network can detect more hazardous road locations and
the performance is more stable than using the raw-link-node road network, it
is better to dissolve the road network first before taking any further steps.
The numerical definition and simple ranking approaches can be used to
identify hazardous road locations with the link-attribute approach. Using a
numerical definition to identify hot zones is a simple and effort-saving
approach, but one should be careful in choosing an appropriate value as the
threshold value.
Monte Carlo simulation can be employed to avoid selection bias in
choosing an appropriate number as the threshold value, but it is rather
time-consuming due to the large number of realizations.
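To make the idea of a simulation-based threshold concrete, one possible scheme
is sketched below. It is an illustrative assumption rather than the procedure
used in this thesis: crashes are allocated uniformly at random to
equal-length BSUs, the allocation is repeated many times, and the threshold is
taken as a high quantile of all simulated BSU counts. The function name, its
parameters and the nearest-rank quantile rule are hypothetical.

```python
import math
import random


def monte_carlo_threshold(n_crashes, n_units, n_runs, quantile, seed=0):
    """Return a crash-count threshold from random allocations (assumed scheme).

    Each run places n_crashes independently and uniformly on n_units BSUs;
    the threshold is the nearest-rank quantile (0 < quantile <= 1) of the
    pooled per-BSU counts over all runs.
    """
    rng = random.Random(seed)  # seeded so that results are reproducible
    simulated = []
    for _ in range(n_runs):
        counts = [0] * n_units
        for _ in range(n_crashes):
            counts[rng.randrange(n_units)] += 1
        simulated.extend(counts)
    simulated.sort()
    # Small tolerance guards against floating-point error in quantile * n.
    rank = max(1, math.ceil(quantile * len(simulated) - 1e-9))
    return simulated[rank - 1]
```

The nested loops show why the method is time-consuming: the cost grows with
the product of the number of realizations and the number of crashes, before
any network distance calculation is taken into account.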
The link-attribute approach can employ statistical models to incorporate
environmental variables. The hazardous road locations identified by different
models may differ significantly; therefore, one should be very careful in
choosing confounding variables.
The EB technique can be employed to define the crash intensity instead of
an observed-count (OC) approach. The difference between the EB-based and
OC-based hot zones is significant, and the former approach has a great
advantage in the stability of its performance when the threshold value is
relatively high. This implies that the use of an EB estimate may exert more
profound impacts on the rank of the hazardous road locations, which is beyond
the scope of this thesis but worthy of further investigation in future
studies.
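For readers unfamiliar with the EB technique, a commonly used formulation
combines the model prediction with the observed count through a weight derived
from the negative binomial overdispersion parameter. The sketch below uses that
textbook form, with variance assumed to be mu + dispersion * mu^2; whether
Chapter 5 uses exactly this parameterization is an assumption, and the
function and argument names are hypothetical.

```python
def eb_estimate(observed, predicted, dispersion):
    """Empirical Bayes estimate of the long-term mean count at one site.

    observed   -- observed count (OC) at the site
    predicted  -- count predicted by the regression model for similar sites
    dispersion -- negative binomial overdispersion (variance = mu + k * mu**2)
    """
    weight = 1.0 / (1.0 + dispersion * predicted)
    # The estimate is pulled from the observed count towards the prediction.
    return weight * predicted + (1.0 - weight) * observed
```

With an observed count of 10, a predicted count of 4 and a dispersion of 0.5,
the weight is 1/3 and the estimate is 8, which lies between the prediction and
the observation. This shrinkage is what dampens period-to-period fluctuation
in the crash intensity.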
The length of BSU impacts the results. As the variability is smaller among
100-meter hot zones, the segmentation length is better defined as 100 meters.
8.1.2 Event-Based Approach
Numerical and simple-ranking approaches can be used to identify
hazardous road locations with the event-based approach, while the former can
identify a great number of hot zones with crash intensity less than the critical
value of a hot zone.
Monte Carlo simulation can be used to identify hazardous road locations.
This method can also incorporate traffic volume by modifying the simulation
approach and its performance is more stable when the traffic volume is taken
into consideration.
The interval of reference points impacts the results. As the variability is
smaller among 100-meter hot zones, the interval of reference points is better
defined as 100 meters.
8.1.3 Advantages and Drawbacks of the Two Approaches
Based on the comparative analysis, the advantages and drawbacks of the
two approaches are as follows.
8.1.3.1 Advantages
Link-attribute approach
The link-attribute approach is less likely to cause false positive problems,
especially around road junctions. A typical example is the territory-wide
identification of hazardous road locations when the threshold value is
determined by an arbitrary number. The event-based approach detected a
large number of crash hot zones around road junctions with crash intensity
less than the critical value while the link-attribute approach did not.
It is convenient for the link-attribute approach to incorporate a set of road
environmental indicators. These variables can be easily incorporated by
establishing statistical models such as negative binomial models, which are
important in defining crash intensity (EB estimate) and threshold value
(expected number of road crashes or pedestrian casualty counts). One may
select the link-attribute approach without hesitation if environmental
variables should be taken into consideration.
Performing the link-attribute approach can also save time. No matter
which method was used to define the threshold value, the link-attribute
approach took less computing time than the event-based approach.
Event-based approach
The event-based approach is less likely to cause false negative problems.
As shown in the identification of hazardous road locations for pedestrians, the
event-based approach detected a large number of hot zones around road
junctions that could not be identified by the link-attribute approach. Although
some of them were more likely to be false positive, some did require further
investigation and countermeasures. Hence, one may consider the event-based
approach if they focus more on false negative problems.
8.1.3.2 Drawbacks
Link-attribute approach
While the link-attribute approach is less likely to identify false positive
locations, it is more likely to cause false negative problems. A typical example
is the comparison of the empirical results on pedestrian casualties, in which
the link-attribute approach might fail to detect hazardous road locations which
could be identified by the event-based approach.
Event-based approach
The event-based approach is more likely to cause false positive problems,
especially around road junctions. For instance, when an arbitrary number was
used to determine the threshold value, a large share of hot zones, most of
which were located around road junctions, were found with crash intensity
less than the critical value.
It is difficult for the event-based approach to incorporate a set of road
environmental variables by a statistical regression model. First, it is
difficult for the event-based approach to obtain appropriate samples; second,
assigning a suitable environmental attribute to a reference point is not an
easy task.
Performing the event-based approach requires much more effort.
Regardless of the definition of the threshold value, it took the event-based
approach more time to complete the whole procedure in order to identify
crash hot zones.
8.2 Importance of the Study
The identification of hazardous road locations plays a key role in reducing
road crashes. This research explores the methodological issues on the
link-attribute approach such as segmentation method and definition of
threshold values in identifying crash hot zones as hazardous road locations. In
particular, it employs a model-based approach to define the threshold value by
taking into consideration a set of environmental variables, which has not been
applied to the hot zone identification before. It also develops an event-based
network statistic which can be used for the identification of crash hot zones.
By investigating the hot zone methodology, this study can enrich the
theoretical knowledge of the identification of hazardous road locations and
practically provide policy-makers with more information on identifying road
hazards.
In addition, the improvement of road safety requires joint efforts from
different disciplines. This thesis can supplement the multi-disciplinary
knowledge of road safety by combining both road safety and spatial analytical
methods in the identification of hazardous road locations through a GIS
platform. For instance, in order to reduce short BSUs, this research develops
a GIS-based dissolving algorithm to dissolve the road network before
segmentation; the location of the first reference point is determined by a
newly developed GIS-based tool that can select the reference points in a
random manner.
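The random placement of reference points can be illustrated on a single road
link. The sketch below is one possible reading of the idea and not the GIS
tool developed in this research: the first reference point is drawn uniformly
within the first interval, and subsequent points follow at a fixed spacing.
The function name and its parameters are hypothetical, and positions are
measured in meters from the start of the link.

```python
import random


def reference_points(link_length, interval=100.0, rng=None):
    """Return reference-point positions along one link.

    The first point is placed at a random offset in [0, interval]; later
    points follow every `interval` meters until the end of the link.
    """
    rng = rng or random.Random()
    first = rng.uniform(0.0, interval)
    points = []
    k = 0
    while first + k * interval <= link_length:
        points.append(first + k * interval)
        k += 1
    return points
```

Passing a seeded generator, for example `random.Random(42)`, makes a given
realization reproducible, which is useful when hot zones from repeated runs
need to be compared.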
8.3 Limitations of the Study
The limitations of the study are listed as follows:
False positive and false negative
This research examines the stability of the performance of the two
approaches through the comparison of the results for the 2002-2004 and
2005-2007 time periods. The logic behind the comparison is that if a site can
be identified as hazardous in both periods, it is less likely to be a false
positive hazard; however, whether the site is truly positive still requires
further examination. For instance, to find out why hot zones in earlier
years (2002-2004) were not hot zones in later years (2005-2007), more
investigations should be conducted to answer questions like “Were the
hazards removed because roads were no longer in service or because additional
infrastructures were installed?”
Simulations on a hypothetical road network
In Chapter 3, when simulating road crashes on a hypothetical road
network, the characteristics of spatial distribution of road crashes are roughly
categorized into random, dispersed and concentrated patterns. For the
concentrated pattern, this research only regards 200 meters as the length of a
hot zone with 25 road crashes as the crash intensity. However, the
concentration of road crashes may differ by clustering extent (such as crash
intensity on each hot zone) and scale (the length of a hot zone), which may
have impacts on the performance of the two approaches such as the choice of
BSU length and RP interval.
In this research, the road structure is categorized into three types, namely
a grid pattern with 24 road junctions, a limited access pattern with 12 road
junctions and an organic pattern with 6 road junctions. Although the three
instances can give some basic ideas on the impacts of road structures, more
realizations, such as different junction densities within one particular type
of road structure, may provide more insights into the performance of the two
approaches.
Weighting method
This research implements the event-based methodology for the identification
of hazardous road locations by modifying the local K-function approach,
which treats road crashes equally within the search distance of a reference
point. However, different weighting methods may have different impacts on
the results. The hot zones identified by the unweighted method, such as the
event-based approach introduced in this thesis, may be different from those
identified by a distance-decay method, such as the kernel density estimation
approach. The characteristics of the event-based methodology for the
identification of hot zones may not be fully explored if only the unweighted
approach is investigated.
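The contrast between equal weighting and distance decay can be shown with a
small sketch. The quartic kernel used here is only one common choice in kernel
density estimation and is an assumption, as are the function names; the
network distances from a reference point to each crash are taken as given.

```python
def uniform_intensity(distances, h):
    """Count crashes within search distance h, each weighted equally."""
    return sum(1 for d in distances if d <= h)


def quartic_kernel_intensity(distances, h):
    """Sum unnormalized quartic kernel weights (1 - (d/h)^2)^2 for d <= h."""
    return sum((1 - (d / h) ** 2) ** 2 for d in distances if d <= h)
```

For crashes at 0 m, 50 m, 100 m and 150 m and a search distance of 100 m, the
equal-weight count is 3, whereas the kernel sum is 1 + 0.5625 + 0 = 1.5625. A
crash on the edge of the search distance counts fully under equal weighting
but not at all under the kernel, so the two methods can rank the same
reference points differently.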
This research assigns different weights to casualties of different injury
types according to the records of other countries. It does not examine the
sensitivity of hot zones to other weighting schemes.
Incorporation of “time” dimension
This research focuses on the location of road crashes without considering
the “time” dimension, which may have impacts on the performance of the two
approaches.
Ranking of hot zones
This research focuses on the identification of hot zones and treats
every hot zone equally regardless of whether it has the minimum or the maximum
value of crash intensity. However, as resources are limited, it is impractical
to treat every hot zone. In this research, when the Monte Carlo simulation
method was used to define the threshold value, the link-attribute approach
detected 124 hot zones and the event-based approach identified 230 even
though the 99.9% significance level was used. This raises an important
question: which hot zone should be treated with top priority? This is very
important to policy-makers, who may have to choose only 10 out of 230 hot
zones for treatment. To answer this question, one should not only identify
hot zones, but also rank the locations based on some criterion.
8.4 Further Research Directions
Investigation of false positive and false negative problems
More criteria for identifying false positive and false negative problems
should be explored so as to further examine the robustness of the
link-attribute and event-based hot zones in the identification of hazardous
road locations. This may also require further investigation of the
regression-to-the-mean problem. More importantly, field investigation of hot
zones is necessary, as it can help correctly identify “false positive” and
“false negative” hot zones.
More realizations of simulated crash patterns and road structures
As the performance of the two approaches may be influenced by the crash
pattern and road structure, more realizations of simulated crash patterns on
more instances of hypothetical road networks will be established for further
exploration on the two approaches.
Sensitivity analysis on weighting methods
Distance-decay methods, such as kernel density estimation, may be
employed to identify hazardous road locations. This effort can enrich the
knowledge of event-based methodology in the identification of hazardous road
locations.
The sensitivity of hot zones for casualties to weighting methods will be
further investigated by assigning different weights in cost-weighted analysis.
Incorporation of “time” dimension
The ways in which the “time” dimension is introduced into the two
approaches will be explored. A 3D-GIS environment will be used for
developing the methods.
Ranking of hot zones
The rank of hot zones may differ by criteria. To answer questions such as
“Which is the most dangerous road location, the one with the highest value of
crash intensity or the one with the longest length?”, a series of sensitivity
analyses can be conducted to investigate the ranking method.
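How the ranking depends on the criterion can be seen from a toy example. The
three hot zones below are hypothetical values chosen for illustration, not
results from this research, and the intensity-per-kilometer criterion is an
added assumption.

```python
# Hypothetical hot zones (illustrative values only).
HOT_ZONES = [
    {"id": "A", "intensity": 30, "length_km": 0.18},
    {"id": "B", "intensity": 13, "length_km": 0.53},
    {"id": "C", "intensity": 19, "length_km": 0.29},
]


def rank_by(zones, key):
    """Return hot-zone IDs sorted in descending order of the given attribute."""
    return [z["id"] for z in sorted(zones, key=lambda z: z[key], reverse=True)]


def rank_by_density(zones):
    """Return hot-zone IDs sorted by crash intensity per kilometer."""
    ordered = sorted(
        zones, key=lambda z: z["intensity"] / z["length_km"], reverse=True
    )
    return [z["id"] for z in ordered]
```

Ranking by crash intensity places zone A first and zone B last, whereas
ranking by length reverses the order, so a treatment list limited to the top
few sites would differ by criterion.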
Development of a software package
A software package for both link-attribute and event-based approaches in
identifying crash hot zones will be developed in order that other researchers
and practitioners can conveniently apply these two approaches to the
identification of hazardous road locations.
References
Abbas, K. A. (2004). Traffic safety assessment and development of predictive
models for accidents on rural roads in Egypt. Accident Analysis and
Prevention, 36(2), 149-163.
Abbess, C., Jarret, D., & Wright, C. C. (1981). Accidents at blackspots:
Estimating the effectiveness of remedial treatment, with special reference
to the “Regression-to-the-Mean” Effect. Traffic Engineering and Control
22 (10), 535-542.
Abdalla, I. M., Raeside, R., Barker, D., & McGuigan, D. R. D. (1997). An
investigation into the relationships between area social characteristics and
road accident casualties. Accident Analysis and Prevention, 29 (5),
583-593.
Abdel-Aty, M. A., & Radwan, A. E. (2000). Modeling traffic accident
occurrence and involvement. Accident Analysis & Prevention, 32 (5),
633-642.
Afukaar, F. K., & Damsere-Derry, J. (2010). Evaluation of speed humps on
pedestrian injuries in Ghana. Injury Prevention, 16 (Suppl 1), A205-A206.
Aguero-Valverde, J., & Jovanils, P. P. (2009). Bayesian Multivariate Poisson
Lognormal models for crash severity modeling and site ranking.
Transportation Research Record(2136), 82-91.
Aguero-Valverde, J., & Jovanis, P. P. (2006). Spatial analysis of fatal and injury
crashes in Pennsylvania. Accident Analysis and Prevention, 38 (3),
618-625.
Anderson, T. K. (2009). Kernel density estimation and K-means clustering to
profile road accident hotspots. Accident Analysis and Prevention, 41 (3),
359-364.
Ashenfelter, O. (2006). Measuring the Value of a Statistical Life: Problems and
Prospects. The Economic Journal, 116 (510), C10-C23.
237
Bailey, T. (2004). Statistical analysis of spatial point patterns. Second edition.
International Journal of Geographical Information Science, 18 (1),
105-106.
Black, W. R. (1991). Highway accidents: A spatial and temporal analysis.
Transport Research Record, 1318 , 75-82.
Black, W. R. (1992). Network autocorrelation in transport network and flow
systems. Geographical Analysis, 24 (3), 207-222.
Black, W. R., & Thomas, I. (1998). Accidents on Belgium's motorway: A
network autocorrelation analysis. Journal of Transport Geography, 6 (1),
23-31.
Blazquez, C. A., & Celis, M. S. (2012). A spatial and temporal analysis of child
pedestrian crashes in Santiago, Chile. Accident Analysis & Prevention .
Blower, D., Campbell, K. L., & Green, P. E. (1993). Accident rates for heavy
truck-tractors in Michigan. Accident Analysis & Prevention, 25 (3),
307-321.
Brijs, T., Karlis, D., & Wets, G. (2008). Studying the effect of weather
conditions on daily crash counts using a discrete time-series model.
Accident Analysis & Prevention, 40 (3), 1180-1190.
Brüde, U., & Larsson, J. (1993). Models for predicting accidents at junctions
where pedestrians and cyclists are involved. How well do they fit?
Accident Analysis & Prevention, 25 (5), 499-509.
Cameron, A. C., & Trivedi, P. K. (1998). Regression analysis of count data:
Cambridge University Press.
Census and Statistics Department (2002). Hong Kong 2001 Population Census.
HKSAR: Census and Statistics Department.
Census and Statistics Department (2007). Hong Kong 2006 Population
By-census. HKSAR.
Map Publications Centre (2005). Digital Topographic Map 2004. HKSAR: Survey and
Mapping Office.
Chen, C., Lin, H. Y., & Loo, B. P. Y. (2012). Exploring the impacts of safety
culture on immigrants' vulnerability in non-motorized crashes: A
cross-sectional study. Journal of Urban Health-Bulletin of the New York
Academy of Medicine, 89 (1), 138-152.
Cheng, W., & Washington, S. (2008). New criteria for evaluating methods of
identifying hot spots. Transportation Research Record(2083), 76-85.
Cheng, W., & Washington, S. P. (2005). Experimental evaluation of hotspot
identification methods. Accident Analysis and Prevention, 37 (5), 870-881.
Cottrill, C. D., & Thakuriah, P. V. (2010). Evaluating pedestrian crashes in
areas with high low-income or minority populations. Accident Analysis &
Prevention, 42 (6), 1718-1728.
De Blaeij, A., Florax, R. J. G. M., Rietveld, P., & Verhoef, E. (2003). The value
of statistical life in road safety: a meta-analysis. Accident Analysis &
Prevention, 35 (6), 973-986.
Delmelle, E. C., & Thill, J.-C. (2008). Urban bicyclists: Spatial analysis of adult
and youth traffic hazard intensity. Transportation Research Record:
Journal of the Transportation Research Board, 2074 , 31-39.
Department for Transport (2007). Highways Economics Note No. 1
Department for Transport, UK.
Dissanayake, D., Aryaija, J., & Wedagama, D. (2009). Modelling the effects of
land use and temporal factors on child pedestrian casualties. Accident
Analysis & Prevention, 41 (5), 1016-1024.
Eckley, D. C., & Curtin, K. M. (2012). Evaluating the spatiotemporal clustering
of traffic incidents. Computers, Environment and Urban Systems.
Elvik, R. (1997). Evaluations of road accident blackspot treatment: A case of
the iron law of evaluation studies? Accident Analysis and Prevention,
29(2), 191-199.
Elvik, R. (2006). New approach to accident analysis for hazardous road
locations. Safety Data, Analysis, and Evaluation(1953), 50-55.
Elvik, R. (2007). State-of-the-art approach to road accident black spot
management and safety analysis of road networks . Oslo: Institute of
Transport Economics.
Elvik, R. (2008). A survey of operational definitions of hazardous road
locations in some European countries. Accident Analysis and Prevention,
40(6), 1830-1835.
Elvik, R. (2012). Speed limits, enforcement, and health consequences. Annual
Review of Public Health, 33 , 225-238.
Elvik, R., Høye, A., Vaa, T., & Sørensen, M. (2009). The handbook of road
safety measures.
Erdogan, S. (2009). Explorative spatial analysis of traffic accident statistics and
road mortality among the provinces of Turkey. Journal of Safety Research,
40(5), 341-351.
Erdogan, S., Yilmaz, I., Baybura, T., & Gullu, M. (2008). Geographical
information systems aided traffic accident analysis system case study: city
of Afyonkarahisar. Accident Analysis and Prevention, 40 (1), 174-181.
Flahaut, B. (2004). Impact of infrastructure and local environment on road
unsafety - Logistic modeling with spatial autocorrelation. Accident
Analysis and Prevention, 36 (6), 1055-1066.
Flahaut, B., Mouchart, M., San Martin, E., & Thomas, I. (2003). The local
spatial autocorrelation and the Kernel method for identifying black zones
- A comparative approach. Accident Analysis and Prevention, 35 (6),
991-1004.
Fotheringham, A., Brunsdon, C., & Charlton, M. (2000). Quantitative
Geography: Perspectives on Spatial Data Analysis. London: Sage
Publications.
Gårder, P. E. (2004). The impact of speed and other variables on pedestrian
safety in Maine. Accident Analysis & Prevention, 36 (4), 533-542.
Geurts, K. (2006). Ranking and Profiling Dangerous Accident Locations Using
Data Mining and Statistical Techniques. Unpublished Doctoral
Dissertation, Hasselt University, Hasselt.
GmbH, H. C. (2008). Unfallkostenrechnung Straße 2007 unter
Berücksichtigung des menschlichen Leids (Willingness to Pay).
Graham, D. J., & Glaister, S. (2003). Spatial variation in road pedestrian
casualties: The role of urban scale, density and land-use mix. Urban
Studies, 40(8), 1591-1607.
Graham, D. J., Glaister, S., & Anderson, R. (2005). The effects of area
deprivation on the incidence of child and adult pedestrian casualties in
England. Accident Analysis and Prevention, 37 , 125-135.
Graham, D. J., & Stephens, D. A. (2008). Decomposing the impact of
deprivation on child pedestrian casualties in England. Accident Analysis
& Prevention, 40 (4), 1351-1364.
Green, J., Muir, H., & Maher, M. (2011). Child pedestrian casualties and
deprivation. Accident Analysis & Prevention, 43 (3), 714-723.
Guo, X., & Sheng, Y. (2009). Gong Lu Jiao Tong Shi Gu Hei Dian Wen Xi Ji
Shu. Nanjing: Southeast University Press.
Hadayeghi, A., Shalaby, A. S., & Persaud, B. N. (2007). Safety prediction
models: Proactive tool for safety evaluation in urban transportation
planning applications. Transportation Research Record: Journal of the
Transportation Research Board, 2019 (-1), 225-236.
Harwood, D. W., Bauer, K. M., Richard, K. R., Gilmore, D. K., Graham, J. L.,
Potts, I. B., et al. (2008). Pedestrian Safety Prediction Methodology .
Hauer, E. (1997). Observational before-after studies in road safety.
Oxford: Pergamon.
Hauer, E., Harwood, D. W., Council, F. M., & Griffith, M. S. (2002). Estimating
safety by the empirical Bayes method - A tutorial. Statistical Methodology:
Applications to Design, Data Analysis, and Evaluation(1784), 126-131.
Hauer, E., Ng, J. C. N., & Lovell, J. (1988). Estimation of safety at signalized
intersections. Transportation Research Record, 1185, 48-61.
Higle, J. L., & Hecht, M. B. (1989). A Comparison of Techniques for the
Identification of Hazardous Locations. Transportation Research Record
1238, 10-19.
Hilbe, J. M. (2011). Negative binomial regression: Cambridge University Press.
Hu, G., Wen, M., Baker, T., & Baker, S. (2008). Road-traffic deaths in China,
1985–2005: threat and opportunity. Injury Prevention, 14 (3), 149-153.
Huang, H. L., Abdel-Aty, M. A., & Darwiche, A. L. (2010). County-Level Crash
Risk Analysis in Florida. Transportation Research Record: Journal of the
Transportation Research Board, 2148 (-1), 27-37.
Huang, H. L., Chin, H. C., & Haque, M. M. (2009). Empirical evaluation of
alternative approaches in identifying crash hot spots: Naive Ranking,
Empirical Bayes, and Full Bayes methods. Transportation Research
Record(2103), 32-41.
Hummel, T. (2001). Land use planning in safer transportation network
planning: safety principles, planning framework, and library information.
Leidschendam: SWOV Institute for Road Safety Research.
iRAP. (2007). The True Cost of Road Crashes: Valuing Life and the Cost of a
Serious Injury: International Road Assessment Programme.
Jegede, F. (1988). Spatio-temporal analysis of road traffic accidents in Oyo
State, Nigeria. Accident Analysis & Prevention, 20 (3), 227-243.
Jones, A. P., Langford, I. H., & Bentham, G. (1996). The application of
K-function analysis to the geographical distribution of road traffic
accident outcomes in Norfolk, England. Social Science & Medicine, 42 (6),
879-885.
Joshua, S. C., & Garber, N. J. (1990). Estimating truck accident rate and
involvements using linear and Poisson regression models. Transportation
Planning and Technology, 15 (1), 41-58.
Keay, K., & Simmonds, I. (2005). The association of rainfall and other weather
variables with road traffic volume in Melbourne, Australia. Accident
Analysis & Prevention, 37 (1), 109-124.
Kim, K., Brunner, I. M., & Yamashita, E. Y. (2006). Influence of land use,
population, employment, and economic activity on accidents.
Transportation Research Record: Journal of the Transportation Research
Board, 1953(-1), 56-64.
Kmet, L., & Macarthur, C. (2006). Urban-rural differences in motor vehicle
crash fatality and hospitalization rates among children and youth.
Accident Analysis and Prevention, 38 (1), 122-127.
Lan, B., & Persaud, B. (2011). Fully Bayesian Approach to Investigate and
Evaluate Ranking Criteria for Black Spot Identification. Transportation
Research Record: Journal of the Transportation Research Board, 2237 (-1),
117-125.
Lands Department, Hong Kong. (2004). Implementation of Data Alignment
Measures for the Alignment of Planning, Lands and Public Works Data.
Ladron de Guevara, F., Washington, S. P., & Oh, J. (2004). Forecasting crashes
at the planning level: simultaneous negative binomial crash model applied
in Tucson, Arizona. Transportation Research Record: Journal of the
Transportation Research Board, 1897 (-1), 191-199.
LaScala, E. A., Gerber, D., & Gruenewald, P. J. (2000). Demographic and
environmental correlates of pedestrian injury collisions: a spatial analysis.
Accident Analysis and Prevention, 32 (5), 651.
LaScala, E. A., Gruenewald, P. J., & Johnson, F. W. (2004). An ecological study
of the locations of schools and child pedestrian injury collisions. Accident
Analysis and Prevention, 36 (4), 569-576.
Laughlin, J. C., Hauer, L. E., Hall, J. W., & Clough, D. R. (1975). NCHRP
Report 162: methods for evaluating highway safety improvements.
Washington, D.C.: National Research Council.
Law, T. H., Noland, R. B., & Evans, A. W. (2009). Factors associated with the
relationship between motorcycle deaths and economic growth. Accident
Analysis & Prevention, 41 (2), 234-240.
Le, H., Geldermalsen, T. v., Lim, W. L., & Murphy, P. (2011). Deriving
accident costs using Willingness-to-Pay Approaches - A case study for
Singapore. Paper presented at the Australasian Transport Research Forum
2011, Adelaide, Australia.
Leden, L. (2002). Pedestrian risk decrease with pedestrian flow. A case study
based on data from signalized intersections in Hamilton, Ontario.
Accident Analysis & Prevention, 34 (4), 457-464.
Lee, C., & Abdel-Aty, M. (2005). Comprehensive analysis of
vehicle–pedestrian crashes at intersections in Florida. Accident Analysis
& Prevention, 37 (4), 775-786.
Lee, S. C. (1989). Road traffic monitoring in Hong Kong. Proceedings of the
Second International Conference on Road Traffic Monitoring .
Levine, N., Kim, K. E., & Nitz, L. H. (1995). Spatial analysis of Honolulu motor
vehicle crashes: II. Zonal generators. Accident Analysis and Prevention,
27(5), 675-685.
Li, L., Zhu, L., & Sui, D. Z. (2007). A GIS-based Bayesian approach for
analyzing spatial–temporal patterns of intra-city motor vehicle crashes.
Journal of Transport Geography, 15 (4), 274-285.
Ljung Aust, M., Fagerlind, H., & Sagberg, F. (2011). Fatal intersection crashes
in Norway: Patterns in contributing factors and data collection challenges.
Accident Analysis & Prevention.
Loo, B. P. Y. (2006). Validating crash locations for quantitative spatial analysis:
A GIS-based approach. Accident Analysis and Prevention, 38 (5), 879-886.
Loo, B. P. Y. (2009). The identification of hazardous road locations: A
comparison of the blacksite and hot zone methodologies in Hong Kong.
International Journal of Sustainable Transportation, 3 (3), 187-202.
Loo, B. P. Y., Cheung, W. S., & Yao, S. (2011). The Rural-Urban Divide in
Road Safety: The Case of China. The Open Transportation Journal, 5 ,
9-20.
Loo, B. P. Y., & Tsui, M. K. (2005). Temporal and spatial patterns of
vehicle-pedestrian crashes in busy commercial and shopping areas: A case
study of Hong Kong. Asian Geographer, 24 (1-2), 113-128.
Loo, B. P. Y., Wong, S., Hung, W., & Lo, H. K. (2007). A review of the road
safety strategy in Hong Kong. Journal of Advanced Transportation, 41 (1),
3-37.
Loo, B. P. Y., & Yao, S. (2010). The impact of area deprivation on traffic
casualties in Hong Kong. Paper presented at the 5th HKSTS
International Conference, Hong Kong.
Loo, B. P. Y., & Yao, S. (2012). Geographic information systems. In G. Li & S.
Baker (Eds.), Injury Research: Theories, Methods, and Approaches (pp.
447-463). New York: Springer.
Loo, B. P. Y., Yao, S., & Wu, J. (2011). Spatial point analysis of road crashes in
Shanghai: a GIS-based network kernel density method. Proceedings of the
International Conference on GeoInformatics, 2011 .
Loo, B. P. Y., Yao, S., Wu, J., Yu, B., & Zhong, H. (2011). Identification
method of road hot zone based on GIS. Journal of Traffic and
Transportation Engineering, 11 (4), 97-103.
Lovegrove, G. R., & Sayed, T. (2006). Macro-level collision prediction models
for evaluating neighbourhood traffic safety. Canadian Journal of Civil
Engineering, 33 (5), 609-621.
Lyon, C., & Persaud, B. (2002). Pedestrian collision prediction models for
urban intersections. Transportation Research Record: Journal of the
Transportation Research Board, 1818 (-1), 102-107.
Marshall, W. E., & Garrick, N. W. (2011). Does street network design affect
traffic safety? Accident Analysis & Prevention, 43 (3), 769-781.
Miranda-Moreno, L. E., & Fu, L. (2007). Traffic safety study: Empirical Bayes
or Full Bayes? Paper presented at the 86th Annual Meeting of the
Transportation Research Board.
McGuigan, D. (1981). The use of relationships between road accidents and
traffic flow in "black-spot" identification. Traffic Engineering and Control,
22(HS-032 669).
McMahon, P. J., Duncan, C., Stewart, J. R., Zegeer, C. V., & Khattak, A. J.
(1999). Analysis of factors contributing to “walking along roadway”
crashes. Journal of the Transportation Research Board, 1999 , 41-48.
Miaou, S. P. (1994). The relationship between truck accidents and geometric
design of road sections: Poisson versus negative binomial regressions.
Accident Analysis & Prevention, 26 (4), 471-482.
Miaou, S. P., & Lord, D. (2003). Modeling traffic crash flow relationships for
intersections - Dispersion parameter, functional form, and Bayes versus
empirical Bayes methods. Statistical Methods and Modeling and Safety
Data, Analysis, and Evaluation(1840), 31-40.
Miller, T. R. (2000). Variations between countries in values of statistical life.
Journal of Transport Economics and Policy, 34 , 169-188.
Ministry of Transport. (2010). The Social Cost of Road Crashes and Injuries
June 2010 update: Ministry of Transport.
Montella, A. (2010). A comparative analysis of hotspot identification methods.
Accident Analysis and Prevention, 42 (2), 571-581.
Moons, E., Brijs, T., & Wets, G. (2009a). Identifying hazardous road locations:
Hot spots versus hot zones. In M. L. Gavrilova & C. J. K. Tan (Eds.),
Transactions on Computational Science VI (pp. 288-300). Berlin,
Heidelberg: Springer-Verlag Berlin Heidelberg.
Moons, E., Brijs, T., & Wets, G. (2009b). Improving Moran's Index to identify
hot spots in traffic safety. In B. Murgante, G. Borruso & A. Lapucci (Eds.),
Geocomputation and Urban Planning (pp. 117–132). Heidelberg: Springer.
Moran, P. (1948). The interpretation of statistical maps. Journal of the Royal
Statistical Society, 10b, 243-251.
Mueller, B. A., Rivara, F. P., & Bergman, A. B. (1988). Urban-rural location
and the risk of dying in a pedestrian-vehicle collision. The Journal of
Trauma, 28(1), 91-94.
O'Sullivan, D., & Unwin, D. J. (2003). Geographic Information Analysis.
Hoboken, New Jersey: John Wiley & Sons, Inc.
Okabe, A., Okunuki, K., & Shiode, S. (2006a). The SANET Toolbox: New
method for network spatial analysis. Transactions in GIS, 10 (4), 535-550.
Okabe, A., Okunuki, K., & Shiode, S. (2006b). SANET: A toolbox for spatial
analysis on a network. Geographical Analysis, 38(1), 57-66.
Okabe, A., Satoh, T., & Sugihara, K. (2009). A kernel density estimation
method for networks, its computational method and a GIS-based tool.
International Journal of Geographical Information Science, 23 (1), 7-32.
Okabe, A., & Yamada, I. (2001). The K-function method on a network and its
computational implementation. Geographical Analysis, 33 (3), 271-290.
Okabe, A., Yomono, H., & Kitamura, M. (1995). Statistical analysis of the
distribution of points on a Network. Geographical Analysis, 27(2),
152-175.
Openshaw, S., Charlton, M., Wymer, C., & Craft, A. (1987). Developing a mark
1 geographical analysis machine for the automated analysis of point data
sets. International Journal of Geographical Information Systems, 1 ,
335-358.
Payne, G., Payne, J., & Hyde, M. (1996). 'Refuse of All Classes'? Social
Indicators and Social Deprivation. Sociological Research Online, 1 .
Pei, X., Wong, S., & Sze, N. (2011). A joint-probability approach to crash
prediction models. Accident Analysis & Prevention, 43 (3), 1160-1166.
Pei, X., Wong, S., & Sze, N. (2012). The roles of exposure and speed in road
safety analysis. Accident Analysis & Prevention.
Peden, M., Scurfield, R., Sleet, D., Mohan, D., Hyder, A. A., Jarawan, E., et al.
(2004). World report on road traffic injury prevention . Geneva:
World Health Organization (WHO).
Peeters, D., & Thomas, I. (2009). Network Autocorrelation. Geographical
Analysis, 41(4), 436-443.
Persaud, B. (1991). Estimating accident potential of Ontario road sections.
Transportation Research Record, 1327, 47-54.
Persaud, B., Lan, B., Lyon, C., & Bhim, R. (2010). Comparison of empirical
Bayes and full Bayes approaches for before–after road safety evaluations.
Accident Analysis & Prevention, 42 (1), 38-43.
Persaud, B., Lyon, C., & Nguyen, T. (1999). Empirical Bayes procedure for
ranking sites for safety investigation by potential for safety improvement.
Transportation Research Record, 1665 .
Planning Department (2001). 2001 Tertiary Planning Unit and Street
Block/Village Cluster (TPU&SB/VC) Boundaries. HKSAR.
Planning Department (2006). 2006 Tertiary Planning Unit and Street
Block/Village Cluster (TPU&SB/VC) Boundaries. HKSAR.
Planning Department (2007). Land Utilization in Hong Kong 2006. HKSAR.
Pulugurtha, S. S., & Sambhara, V. R. (2011). Pedestrian crash estimation
models for signalized intersections. Accident Analysis & Prevention, 43 (1),
439-446.
Plug, C., Xia, J. C., & Caulfield, C. (2011). Spatial and temporal visualisation
techniques for crash analysis. Accident Analysis & Prevention, 43 (6),
1937-1946.
Pulugurtha, S. S., Krishnakumar, V. K., & Nambisan, S. S. (2007). New
methods to identify and rank high pedestrian crash zones: An illustration.
Accident Analysis and Prevention, 39 (4).
Qin, X., Ivan, J. N., & Ravishanker, N. (2004). Selecting exposure measures in
crash rate prediction for two-lane highway segments. Accident Analysis
and Prevention, 36 (2), 183-191.
Qin, X., Ivan, J. N., Ravishanker, N., Liu, J., & Tepas, D. (2006). Bayesian
estimation of hourly exposure functions by crash type and time of day.
Accident Analysis & Prevention, 38 (6), 1071-1080.
Quddus, M. A. (2008). Modelling area-wide count outcomes with spatial
correlation and heterogeneity: An analysis of London crash data. Accident
Analysis & Prevention, 40 (4), 1486-1497.
Rasouli, M. R., Nouri, M., Zarei, M. R., Saadat, S., & Rahimi-Movaghar, V.
(2008). Comparison of road traffic fatalities and injuries in Iran with other
countries. Chinese Journal of Traumatology (English Edition), 11 (3),
131-134.
Ripley, B. D. (1987). Stochastic Simulation. Wiley & Sons.
Saccomanno, F. F., & Buyco, C. (1988). Generalized loglinear models of truck
accident rates.
Sawilowsky, S. S., & Fahoome, G. C. (2003). Statistics via Monte Carlo
Simulation with Fortran. Rochester Hills, MI: JMASM.
Schneider, R. J., Ryznar, R. M., & Khattak, A. J. (2004). An accident waiting to
happen: a spatial approach to proactive pedestrian planning. Accident
Analysis and Prevention, 36 (2), 193-211.
Shiode, S. (2011). Street‐level Spatial Scan Statistic and STAC for Analysing
Street Crime Concentrations. Transactions in GIS, 15 (3), 365-383.
Song, J. J., Ghosh, A., Miaou, S., & Mallick, B. (2006). Bayesian multivariate
spatial models for roadway traffic crash mapping. Journal of Multivariate
Analysis, 97(1), 246-273.
Spoerri, A., Egger, M., & von Elm, E. (2011). Mortality from road traffic
accidents in Switzerland: Longitudinal and spatial analyses. Accident
Analysis & Prevention, 43 (1), 40-48.
Steenberghen, T., Aerts, K., & Thomas, I. (2010). Spatial clustering of events on
a network. Journal of Transport Geography, 18 (3), 411-418.
Steenberghen, T., Dufays, T., Thomas, I., & Flahaut, B. (2004). Intra-urban
location and clustering of road accidents using GIS: a Belgian example.
International Journal of Geographical Information Science, 18 (2),
169-181.
Szabat, J., & Knapp, L. (2009). Treatment of the Economic Value of a Statistical
Life in Departmental Analyses – 2009 Annual Revision. Washington, D.C.:
Office of the Secretary of Transportation.
Transport Department (2001-2010). Road Traffic Accident Statistics. HKSAR:
Transport Department.
Transport Department (2002-2007). Hong Kong Annual Traffic Census.
HKSAR: Transport Department.
Tsui, M. K. (2006). Pedestrian Crashes in Commercial and Business Areas: A
Case Study of Hong Kong. Unpublished MPhil Thesis.
Tsui, K., So, F., Sze, N., Wong, S., & Leung, T. (2009). Misclassification of
injury severity among road casualties in police reports. Accident Analysis
& Prevention, 41 (1), 84-89.
Tunaru, R. (1999). Hierarchical Bayesian models for road accident data. Traffic
engineering & control, 40 (6), 318-324.
Van den Bossche, F., Wets, G., & Brijs, T. (2005). Role of exposure in analysis
of road accidents: a Belgian case study. Transportation Research Record:
Journal of the Transportation Research Board, 1908 (-1), 96-103.
Waller, L., & Gotway, C. (2004). Applied spatial statistics for public health data.
New York: Wiley.
Wang, C., Quddus, M. A., & Ison, S. G. (2011). Predicting accident frequency
at their severity levels and its application in site ranking using a two-stage
mixed multivariate model. Accident Analysis & Prevention, 43 (6),
1979-1990.
Wier, M., Weintraub, J., Humphreys, E. H., Seto, E., & Bhatia, R. (2009). An
area-level model of vehicle-pedestrian injury collisions with implications
for land use and transportation planning. Accident Analysis & Prevention,
41(1), 137-145.
Wong, S. C., Sze, N. N., & Li, Y. C. (2007). Contributory factors to traffic
crashes at signalized intersections in Hong Kong. Accident Analysis and
Prevention, 39 (6), 1107-1113.
World Health Organization (2004). World Health Day: Road safety is no
accident! Retrieved from
http://www.who.int/mediacentre/news/releases/2004/pr24/en/index.html
World Health Organization (2008a). The Global Burden of Disease: 2004
update.
World Health Organization (2008b). World health statistics 2008 .
World Health Organization (2009). Global Status Report on Road Safety .
Xie, Z. X., & Yan, J. (2008). Kernel Density Estimation of traffic accidents in a
network space. Computers Environment and Urban Systems, 32 (5),
396-406.
Yang, J., & Otte, D. (2007). A comparison study on vehicle traffic accident and
injuries of vulnerable road users in China and Germany. Paper presented
at the Proceedings 20th International Technical conference on the
Enhanced Safety of Vehicles. Lyon, France. Paper.
Yamada, I., & Thill, J.-C. (2004). Comparison of planar and network
K-functions in traffic accident analysis. Journal of Transport Geography,
12, 149-158.
Yamada, I., & Thill, J. C. (2007). Local indicators of network-constrained
clusters in spatial point patterns. Geographical Analysis, 39 (3), 268-292.
Yamada, I., & Thill, J. C. (2010). Local indicators of network-constrained
clusters in spatial patterns represented by a link attribute. Annals of the
Association of American Geographers, 100 (2), 269-285.