Empirical Project

Iryna Bedovska
Kristina Klatz
Anna Nikolova
ECO 300: Econometrics
Prof. G. Kalchev
AUBG
04/28/2015
Is the gender of the driver among the main factors that contribute to a high probability of a car accident?
Content
Introduction
Data
Methodology
Empirical Results
Conclusion
Introduction
Driving a motor vehicle is extremely common and useful nowadays. Whether people drive to work, on a business trip, or on vacation, automobile transportation has become a general necessity in almost every part of the world. Unfortunately, thousands of people die in car accidents every year. This project presents an econometric analysis of the main factors that influence the probability of a car accident. The main question the paper answers is whether the gender of the driver matters when an accident is caused by the driver. This is a controversial question that has been discussed in various studies. For example, an article on the Washington Post website states that men are worse drivers than women (Zauzmer, 2014). It also notes that, according to federal data, men and women in the USA hold driver's licenses in about equal numbers.
The article also identifies driver distraction in all its forms (reading while driving, smoking, talking on the phone, or distraction caused by passengers in the car) as another major contributor to the probability of a car accident.
According to another source, Mayo Road Safety in Ireland, gender is not an important factor in road safety; instead, it points out that speed, weather, and road conditions are major contributors.
Michael Pines also offers an opinion on the problem in an article on the Drivers.com website, where he names drunk driving, speeding, and distracted driving as the top three causes of car accidents in America.
What and who causes car accidents is indeed a controversial topic. In this project we analyze the problem using a cross-sectional dataset with several variables describing traffic accidents in England in 2011.
Data
The data on traffic accidents was published by the United Kingdom Department of Transport under the Open Government Licence (Department of Transport, 2012). The dataset consists of 3995 observations collected in Great Britain during 2011 and records the main characteristics of each accident. It includes a variety of factors that can contribute to a traffic accident, among them the weather conditions, the purpose of the trip, the age of the driver and of the vehicle, the type of vehicle, the day of the week, the light conditions, and the Breath Alcohol Test results. The data also indicates whether the driver was male or female, whether the accident happened in a rural or urban area or close to a junction, and whether the accident was caused by the driver, a passenger, or a pedestrian. The light conditions are scaled from 1 to 7, where 1 is the lightest and 7 the darkest. The Breath Alcohol Test (BAT) result is measured in micrograms per 100 ml of blood. Apart from the driver's age, the age of the vehicle, the light conditions, and the BAT result, all factors are coded as dummy variables equal to 1 if the factor was present in the case and 0 otherwise.
The descriptive statistics for each variable are shown in Table 1. The means of the first three variables indicate that around 60% of the accidents were caused by the driver, 15% by a passenger, and around 25% by a pedestrian involved in the accident. The mean of Female indicates that a female driver was involved in about 33% of the accidents, which is reasonable: according to the British Department of Transport, more men than women apply for a driving licence, so a larger share of male drivers on the road should correspond to a larger share of men involved in crashes. Whether gender affects the probability of a car accident is examined later in the paper. For both male and female drivers the average age was about 33 years, and the oldest driver was 75. Meanwhile, the youngest driver involved in a traffic accident was 6 years old, which is abnormal for the sample and very likely contributed to the accident in that particular case.

Table 1: Descriptive Statistics

Variable      Obs    Mean       Std. Dev.  Min  Max
causedbydr~r  3995   .6010013   .4897538   0    1
causedbypa~r  3995   0.151189   0.3582778  0    1
causedbype~n  3995   .2478098   .4317948   0    1
Ageofdrver    3697   32.77144   13.89976   6    75
Female        3995   .3319149   .4709596   0    1
Weekend       3995   .2145181   .410539    0    1
Rotary        3995   .0235294   .1515966   0    1
Onewaystreet  3995   .0565707   .2310494   0    1
Closetojun~n  3995   .8187735   .385254    0    1
Light_Cond~s  3995   1.926408   1.395174   1    7
Fogormist     3995   .0002503   .0158213   0    1
Rain          3995   .0876095   .2827616   0    1
Snow          3995   .0005006   .0223719   0    1
Wetroad       3995   .1409262   .3479889   0    1
Frostice      3995   .0015019   .0387298   0    1
Badroadmai~e  3995   .0180225   .1330494   0    1
BreathAlco~l  3995   32.81377   41.06407   0    945
Was_Vehicl~e  3995   .0020025   .0447101   0    1
Apurposeas~k  3995   .2225282   .415996    0    1
Bpurposeco~k  3995   .0347935   .1832793   0    1
Cpurposeta~o  3995   .0022528   .0474163   0    1
Dpurposepu~s  3995   .0002503   .0158213   0    1
Fnotknownp~e  3995   .7401752   .4385932   0    1
Age_of_Veh~e  2499   6.489396   4.154852   1    39
Car           3995   .5041302   .5000455   0    1
Motocycle     3995   .3246558   .4683047   0    1
Goodsvehic~s  3995   .089612    .2856609   0    1
busminibus    3995   .0748436   .2631717   0    1
othertypes~s  3995   .0067584   .0819418   0    1
The table also indicates that most accidents (about 79%) happen on weekdays rather than on Saturday or Sunday, and that most happen close to a junction (around 80%) or in urban areas. Most crashes happened in daylight: on the scale from 1 to 7, the average light condition is about 1.9, with a standard deviation of about 1.4. As for the weather conditions, most of these variables do not seem to contribute much, which we analyze later in the paper.
The Breath Alcohol Test is quite an interesting variable here. According to the U.S. National Library of Medicine, an alcohol level of 500 micrograms per 100 ml of blood is highly abnormal and dangerous; therefore, the maximum result of 945 micrograms is not only abnormal but potentially lethal. However, the average result of about 33 micrograms is reasonable, and with a standard deviation of 41 micrograms, the majority of the sample appears to fall between 0 and 50 micrograms per 100 ml.
The purpose of driving was unknown in most cases (about 74%), which makes it difficult to estimate whether the purpose actually contributes to the accident; this is one of the shortcomings of the data. The data on the age of the vehicle indicates that most vehicles were between 2 and 10 years old; however, this variable is missing for more than a third of the sample (only 2499 of the 3995 observations are available). The final rows show that about 50% of the vehicles involved in an accident were cars, about 30% were motorcycles, and the remaining 20% were goods vehicles, buses or minibuses, and other types of vehicles.
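Table 1 appears to follow the layout of Stata's summarize output. The Python sketch below shows how those columns are computed, under the assumption that missing values are stored as None. The function name summarize and the toy data are hypothetical and are not the authors' code. Missing entries are skipped when counting observations, which is why Ageofdrver and Age_of_Veh~e have fewer than 3995 observations. The sketch also shows that the mean of a 0/1 dummy is simply the share of ones.

```python
import math
from typing import Dict, Optional, Sequence


def summarize(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Return Obs, Mean, Std. Dev., Min and Max, ignoring missing (None) values."""
    observed = [v for v in values if v is not None]
    n = len(observed)
    mean = sum(observed) / n
    # Sample standard deviation (n - 1 in the denominator)
    sd = math.sqrt(sum((v - mean) ** 2 for v in observed) / (n - 1))
    return {"obs": n, "mean": mean, "sd": sd, "min": min(observed), "max": max(observed)}


# Hypothetical toy data, not taken from the dataset
female = [1, 0, 0, 1, 0, 0]                 # 0/1 dummy
age_of_vehicle = [3, None, 7, 10, None, 2]  # two missing values

print(summarize(female))          # the mean equals the share of female drivers (1/3)
print(summarize(age_of_vehicle))  # obs counts only the 4 non-missing values

# Share of Age_of_Veh~e values missing in Table 1
print(1 - 2499 / 3995)            # about 0.374, i.e. more than a third
```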
Empirical Results
We estimated many different specifications suited to our dataset, but for the sake of brevity we report only the most informative ones. First, we regressed causedbydriver, the dummy indicating that the accident was caused by the driver, on all of the candidate explanatory variables, including the driver's age, gender, level of intoxication (if any), weather and light conditions, location, and the purpose of driving. The regression produced the following results:
Testing the gender as a factor for the higher possibility of the car accidents
Dependent variable: causedbydriver
(Standard errors in parentheses)

Independent variables   (1)
Ageofdrver              -0.0039177
                        (0.0006984*)
Female                  -0.3445226
                        (0.0200852*)
Weekend                 -0.006749
                        (0.23405*)
Rotary                  0.0575422
                        (0.0629283*)
Onewaystreet            0.062074
                        (0.0389585*)
Closetojun~n            -0.0487564
                        (0.0241277*)
Light_Cond~s            0.0202728
                        (0.069643*)
Fogormist               0.6049933
                        (0.4869581*)
Rain                    0.0720242
                        (0.0504925*)
Snow                    0.3814812
                        (0.4525281*)
Wetroad                 -0.0333851
                        (0.0417999*)
Frostice                0.4404122
                        (0.261792*)
Badroadmai~e            0.0101642
                        (0.0666238*)
BreathAlco~l            0.0013927
                        (0.0002318*)
Apurposeas~k            0.407608
                        (0.1856447)
Bpurposeco~k            0.3969768
                        (0.192715*)
Age_of_Veh~e            -0.0009889
                        (0.0022916*)
Car                     0.0289897
                        (0.033572*)
Motocycle               0.008398
                        (0.379719*)
Goodsvehic~s            0
busminibus              0.0354432
                        (0.0440995*)
othertypes~s            0
Observations            2312
R-squared               0.1486
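The regression above is a linear probability model: causedbydriver is a 0/1 variable regressed by ordinary least squares on the explanatory variables. The sketch below shows that computation in plain Python, assuming the regressors are already arranged in a design matrix with a leading column of ones. It solves the normal equations (X'X)b = X'y. The function name ols and the toy numbers are hypothetical.

The singularity check also illustrates a likely reason that at least one vehicle-type dummy is reported with a coefficient of 0. The means of Car, Motocycle, Goodsvehic~s, busminibus and othertypes~s in Table 1 sum to 1, which suggests that every vehicle falls into exactly one type. In that case, including all of them together with an intercept is perfectly collinear, and one of them must be dropped.

```python
from typing import List, Sequence


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """OLS coefficients from the normal equations (X'X) b = X'y.

    X must already contain a column of ones for the intercept.
    Solved by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        if abs(p) < 1e-12:
            raise ValueError("X'X is singular: the regressors are perfectly collinear")
        A[col] = [v / p for v in A[col]]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical toy data: columns are intercept, female, driver age.
# y is built to lie exactly on a plane, so OLS should recover the coefficients.
X = [[1, 0, 20], [1, 1, 35], [1, 0, 50], [1, 1, 28], [1, 0, 41]]
y = [0.8 - 0.3 * f - 0.002 * a for _, f, a in X]
print(ols(X, y))  # approximately [0.8, -0.3, -0.002]
```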
After the regression, we conducted the Breusch-Pagan test for heteroskedasticity so that the regression would not produce misleading standard errors and test statistics. The test did not reject homoskedasticity, so we did not use robust standard errors at this stage. The regression shows that some variables are much more significant than others. For instance, the age of the driver, light conditions, gender, BAT, location, and the purpose of driving are among the significant ones. Of the weather conditions, none is significant except frost and ice, which we decided to include in our next regressions as well. At this stage, we are not yet considering the impact of the variables on the probability that the driver caused the crash; instead, we are making sure that we have included all of the significant variables.
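The Breusch-Pagan test regresses the squared OLS residuals on the explanatory variables and asks whether they help explain the residual variance. The sketch below does this for a single regressor. It uses the n·R² (Koenker) form of the statistic, which is chi-squared with one degree of freedom under homoskedasticity. This variant may differ numerically from the version Stata's estat hettest reports by default. The function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of a regression of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R-squared of a regression of y on a constant and x."""
    b0, b1 = simple_ols(x, y)
    ybar = sum(y) / len(y)
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ssr / sst


def breusch_pagan(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """LM statistic n * R^2 from regressing squared residuals on x, and its p-value."""
    b0, b1 = simple_ols(x, y)
    u2: List[float] = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    lm = max(0.0, len(x) * r_squared(x, u2))
    # Survival function of chi-squared(1): P(Z^2 > lm) = erfc(sqrt(lm / 2))
    return lm, math.erfc(math.sqrt(lm / 2))


# Hypothetical toy data: x is a female dummy, y a 0/1 "caused by driver" outcome.
# Among x = 0 every y is 1 (zero residual variance); among x = 1 y varies.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 1, 1, 1, 1, 0, 1, 0]
print(breusch_pagan(x, y))  # LM = 8.0 here, with a p-value below 0.01
```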
The results for the weather conditions seem suspicious. Therefore, we tested fog or mist, snow, and wet road for joint significance using an F-test. The F-test in Stata does not reject the null hypothesis that all of their coefficients are zero, so these variables can be excluded from the regression.
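For reference, the F-statistic for this joint test compares the sum of squared residuals of the restricted model (weather variables dropped) with that of the unrestricted model. In the formula below, q is the number of restrictions. It equals three if the tested variables are Fogormist, Snow, and Wetroad, which is an assumption based on the description above. The symbol n is the number of observations and k is the number of regressors in the unrestricted model. The labels SSR_r and SSR_ur are our own notation.

```latex
H_0:\ \beta_{Fogormist} = \beta_{Snow} = \beta_{Wetroad} = 0

F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} \sim F_{q,\; n-k-1} \quad \text{under } H_0
```

Failing to reject H_0 means that the computed F falls below the critical value of the F distribution with q and n - k - 1 degrees of freedom at the chosen significance level.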
As part of our empirical results, we estimated three specifications consisting of the statistically significant independent variables from the previous regression. The dependent variable in each specification is causedbydriver, a dummy variable equal to 1 if the accident was caused by the driver of the vehicle. In this linear probability model, its fitted value is the probability that an accident was caused by the driver.
Testing the gender as a factor for the higher possibility of the car accidents
Dependent variable: causedbydriver
(Standard errors in parentheses)

Independent variables          (1)           (2)           (3)
Female                         -0.3566255    -0.3618095    -0.3605597
                               (0.0154588)*  (0.0157997)*  (0.0157842)*
BreathAlcoholLevelmicrog100ml  -             0.0009082     0.0010463
                                             (0.0001875)*  (0.0001925)*
Ageofdriver                    -             -0.0033131    -0.003418
                                             (0.0005542)*  (0.000555)*
Closetojunction                -             -             -0.05024
                                                           (0.0193117)*
Light_Conditions               -             -             0.01577
                                                           (0.0054466)*
Apurpuseaspartofwork           -             -             -0.0134018
                                                           (0.0178304)*
Frostice                       -             -             0.2357548
                                                           (0.2017408)*
Intercept                      0.7193706     0.8181745     0.830215
                               (0.0089061)*  (0.0195961)*  (0.0275091)*
Observations                   3995          3697          3697
R-squared                      0.1176        0.1384        0.1406
In the first specification, we regressed causedbydriver on female, which gives the equation causedbydriver = α + β1*female + u. Estimating it yields:
causedbydriver = 0.7193706 - 0.3566255*female,
where female, according to a t-test, is a highly significant dummy variable with a p-value below 1%.
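With a single dummy regressor, the OLS estimates have a simple interpretation. The intercept α is the share of accidents caused by the driver among male drivers, and α + β1 is that share among female drivers, so β1 is the difference between the two shares. The sketch below checks, using hypothetical toy data, that the group-mean calculation and the usual slope formula agree.

```python
from typing import Sequence, Tuple


def group_means(y: Sequence[float], d: Sequence[int]) -> Tuple[float, float]:
    """alpha = mean of y when d == 0; beta = mean of y when d == 1 minus alpha."""
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    y1 = [yi for yi, di in zip(y, d) if di == 1]
    alpha = sum(y0) / len(y0)
    return alpha, sum(y1) / len(y1) - alpha


def ols_simple(y: Sequence[float], d: Sequence[int]) -> Tuple[float, float]:
    """Textbook OLS intercept and slope for y = alpha + beta * d + u."""
    n = len(y)
    dbar, ybar = sum(d) / n, sum(y) / n
    beta = sum((di - dbar) * (yi - ybar) for di, yi in zip(d, y)) / sum(
        (di - dbar) ** 2 for di in d
    )
    return ybar - beta * dbar, beta


# Hypothetical toy data: 5 male drivers (d = 0) and 5 female drivers (d = 1)
caused_by_driver = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1]
female = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print(group_means(caused_by_driver, female))  # approximately (0.8, -0.4)
print(ols_simple(caused_by_driver, female))   # the same values, up to rounding
```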
To make sure the regression is reliable, we tested it for heteroskedasticity. According to the Breusch-Pagan test, the regression is heteroskedastic (Prob > chi2 = 0.0081). This is expected in a linear probability model, since the variance of a binary dependent variable, p(1 - p), changes with the regressors. Heteroskedasticity makes the usual standard errors invalid and may lead to misleading inference, so we re-estimated the same specification with heteroskedasticity-robust standard errors.
OLS Estimation of the causedbydriver Equation

Independent variables   With nonrobust     With robust
                        standard errors    standard errors
female                  -0.3566255         -0.3566255
                        (0.0154588)        (0.018143)
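The sketch below illustrates what the robust correction does. For a regression on a single dummy, it computes both the conventional standard error and a heteroskedasticity-robust one. We assume the HC1 form, which is the White (HC0) sandwich estimator scaled by n/(n - 2). This is the small-sample adjustment that Stata's regress applies with the vce(robust) option. The function name and the toy data are hypothetical. As in the table above, the coefficient is identical in both cases, and only the standard error changes.

```python
import math
from typing import Sequence, Tuple


def slope_with_standard_errors(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float]:
    """Slope, conventional SE, and HC1 robust SE for y = b0 + b1 * x + u."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional SE assumes one common error variance for all observations
    sigma2 = sum(ui ** 2 for ui in u) / (n - 2)
    se_conventional = math.sqrt(sigma2 / sxx)
    # White (HC0) variance of the slope, then the n / (n - 2) HC1 scaling
    hc0 = sum((xi - xbar) ** 2 * ui ** 2 for xi, ui in zip(x, u)) / sxx ** 2
    se_robust = math.sqrt(hc0 * n / (n - 2))
    return b1, se_conventional, se_robust


# Hypothetical toy data: 6 male drivers who all caused the accident (no variation)
# and 2 female drivers with mixed outcomes, so the error variance differs by group.
female = [0, 0, 0, 0, 0, 0, 1, 1]
caused_by_driver = [1, 1, 1, 1, 1, 1, 1, 0]
print(slope_with_standard_errors(female, caused_by_driver))
# slope -0.5, conventional SE about 0.236, robust SE about 0.408
```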
After the correction for heteroskedasticity, the coefficient on female remains highly significant. The negative sign of β1 implies that a female driver is about 36 percentage points less likely than a male driver to have caused the accident. This result agrees with the article by Julie Zauzmer.
In the second specification, we added two more independent variables to reduce possible omitted-variable bias in the coefficient on female. The equation we estimate is:
causedbydriver = α + β1*female + β2*BreathAlcoholLevelmicrog100ml + β3*Ageofdriver + u
The estimated equation is:
causedbydriver = 0.8181745 - 0.3618095*female + 0.0009082*BreathAlcoholLevelmicrog100ml - 0.0033131*Ageofdriver
With robust standard errors, all coefficients are highly significant, with p-values below 1%. The coefficient on female barely changes (from -0.3566 to -0.3618), which suggests that omitting alcohol level and driver age did not noticeably bias it. Per unit, these two variables have only a small effect on the probability that the driver caused the accident. One explanation is that a driver who causes an accident was not necessarily drunk or very young; however, a driver who is drunk or very young is more likely to cause one.
OLS Estimation of the causedbydriver Equation

Independent variables          With nonrobust     With robust
                               standard errors    standard errors
female                         -0.3618095         -0.3618095
                               (0.0157997)        (0.0164824)
BreathAlcoholLevelmicrog100ml  0.0009082          0.0009082
                               (0.0001875)*       (0.0002519)
Ageofdriver                    -0.0033131         -0.0033131
                               (0.0005542)        (0.0006065)
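Because the model is linear, the coefficient on female gives the same gap in predicted probability for every combination of alcohol level and age. The sketch below plugs the reported second-specification estimates into the fitted equation for a few hypothetical driver profiles. The function name and the profiles are ours.

The last profile shows a known limitation of the linear probability model. With an extreme alcohol reading such as the sample maximum of 945, the fitted "probability" exceeds 1. Probit or logit models are the usual alternatives that keep predictions between 0 and 1.

```python
# Estimates from the second specification (the robust and nonrobust fits share them)
INTERCEPT = 0.8181745
B_FEMALE = -0.3618095
B_ALCOHOL = 0.0009082  # BreathAlcoholLevelmicrog100ml
B_AGE = -0.0033131     # Ageofdriver


def predicted_probability(female: int, alcohol: float, age: float) -> float:
    """Fitted value of causedbydriver for one hypothetical driver profile."""
    return INTERCEPT + B_FEMALE * female + B_ALCOHOL * alcohol + B_AGE * age


male_30 = predicted_probability(female=0, alcohol=0.0, age=30)    # about 0.719
female_30 = predicted_probability(female=1, alcohol=0.0, age=30)  # about 0.357
print(male_30 - female_30)  # 0.3618095 (up to rounding) for any alcohol level and age

# Extreme profile: sample-maximum alcohol reading, 18-year-old male driver
print(predicted_probability(female=0, alcohol=945.0, age=18))  # about 1.617, above 1
```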
In the last specification, we included all the significant variables from the general regression. That leads to the equation:
causedbydriver = α + β1*female + β2*BreathAlcoholLevelmicrog100ml + β3*Ageofdriver + β4*Closetojunction + β5*Light_Conditions + β6*Apurpuseaspartofwork + β7*Frostice + u
The estimated equation is:
causedbydriver = 0.830215 - 0.3605597*female + 0.0010463*BreathAlcoholLevelmicrog100ml - 0.003418*Ageofdriver - 0.05024*Closetojunction + 0.01577*Light_Conditions - 0.0134018*Apurpuseaspartofwork + 0.2357548*Frostice
As before, we also computed heteroskedasticity-robust standard errors, which keep the inference valid in the presence of heteroskedasticity.
OLS Estimation of the causedbydriver Equation

Independent variables          With nonrobust     With robust
                               standard errors    standard errors
female                         -0.3605597         -0.3605597
                               (0.0157842)        (0.0165358)
BreathAlcoholLevelmicrog100ml  0.0010463          0.0010463
                               (0.0001925)        (0.0002791)
Ageofdriver                    -0.003418          -0.003418
                               (0.000555)         (0.0006115)
Closetojunction                -0.05024           -0.05024
                               (0.0193117)        (0.0191075)
Light_Conditions               0.01577            0.01577
                               (0.0054466)        (0.0056518)
Apurpuseaspartofwork           -0.0134018         -0.0134018
                               (0.0178304)        (0.0178519)
Frostice                       0.2357548          0.2357548
                               (0.2017408)        (0.2472195)
The t-statistics show that all independent variables remain highly significant except Apurpuseaspartofwork and Frostice, which have p-values of 0.453 and 0.340, respectively. We consider the last regression the most reliable, since controlling for more variables reduces the omitted-variable bias that may affect the coefficients in the second specification. The results show that even after reducing the possible correlation of female with the error term, female remains highly significant: holding the other variables fixed, a female driver is about 36 percentage points less likely to have caused the accident. BreathAlcoholLevelmicrog100ml, Ageofdriver, Closetojunction, and Light_Conditions have small effects per unit change, about 0.1, 0.3, 5, and 1.6 percentage points respectively (the effects of age and of being close to a junction are negative), but they remain statistically significant at the 1% level.
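The p-values quoted for Apurpuseaspartofwork and Frostice can be reproduced from the robust estimates in the table above. The sketch below assumes a standard normal approximation to the t distribution, which is reasonable with 3697 observations. It computes t = coefficient / standard error and the two-sided p-value 2(1 - Φ(|t|)) = erfc(|t|/√2).

```python
import math
from typing import Tuple


def t_and_p(coef: float, se: float) -> Tuple[float, float]:
    """t-statistic and two-sided p-value under a standard normal approximation."""
    t = coef / se
    return t, math.erfc(abs(t) / math.sqrt(2))


# Coefficients and robust standard errors from the last specification
estimates = {
    "Apurpuseaspartofwork": (-0.0134018, 0.0178519),
    "Frostice": (0.2357548, 0.2472195),
    "Closetojunction": (-0.05024, 0.0191075),
}
for name, (coef, se) in estimates.items():
    t, p = t_and_p(coef, se)
    print(f"{name}: t = {t:.2f}, p = {p:.3f}")
```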
We explored the influence of gender further by checking whether women are more often distracted on the road than men. To do so, we ran a new regression with another dependent variable, causedbypassenger, using the model
causedbypassenger = α + β1*female + u
OLS Estimation of the causedbydriver Equation

Independent variables   With nonrobust     With robust
                        standard errors    standard errors
female                                     -0.3605597
                        (0.0157842)        (0.0165358)
                        0.0010463          0.0010463
                        (0.0001925)        (0.0002791)
References

Zauzmer, Julie. "Men Are Worse Drivers, Reading Causes More Crashes than Eating, and 6 Other Facts about D.C. Accidents." Washington Post, July 23, 2014. Accessed April 28, 2015. http://www.washingtonpost.com/blogs/dr-gridlock/wp/2014/07/23/men-are-worse-drivers-reading-causes-more-crashes-than-eating-and-6-other-facts-about-d-c-accidents/.

"Road Safety." Road Safety. Accessed April 28, 2015. http://www.roadsafetymayo.ie/CausesofAccidents/.

Pines, Michael. "Top 3 Causes of Car Accidents in America." Drivers.com, February 19, 2013. Accessed April 28, 2015. http://www.drivers.com/article/1173/.

"Publications." GOV.UK, January 1, 2011. Accessed April 30, 2015. https://www.gov.uk/government/publications.

"Breath Alcohol Test: US National Medical Association." U.S. National Library of Medicine. Accessed April 30, 2015. http://www.nlm.nih.gov/medlineplus/ency/article/003632.htm.