Morio1
There is one age-old question that has plagued us all since the playground era: is one gender really superior to the other? We have all sung the “girls go to college to get more knowledge, and boys go to Jupiter to get more stupider” song or some variation of it, and this carries on into our older years, when certain stigmas become attached to our genders. One of the most common of these for women is the widespread assumption that we are poor drivers. As someone who was raised never to let other people’s assumptions define her, I find this offensive. While I know quite a few terrible drivers, not all of them are women; in fact, I know just as many male drivers whose cars I might not get into. The question also interests me because after graduating from St. John Fisher I will head to law school and eventually become an attorney. I am not sure what sort of law I would like to practice, but if I end up in any area pertaining to auto accidents or related injuries, it might be interesting to see whether my client base skews toward one gender. I am hoping to use the dataset in Table 1 to see whether women really are worse drivers than men, using the number of accidents to define “worse.” The data were collected from 100 students at Hope College, in Holland, Michigan, who were asked to report how many accidents they had been in during their last 10 years behind the wheel (VanderLaan, Ratliff, and Bredow). The responses were then divided by gender to compare male and female students. After analyzing the descriptive statistics of the data, I want to use the data to test the hypothesis that men really are worse drivers than women. In this case, the mean number of accidents for each population, men and women, will be used to define “bad driving.”
When analyzing any data set, it is important to look at descriptive statistics such as the mean, median, mode, variance, and standard deviation. The mean, median, mode, and range for the women’s data set can be found in Table 2, and the standard deviation and variance in Table 3. For men, the mean, median, mode, and range can be found in Table 4, and the standard deviation and variance in Table 5. For both men and women the mode is 0. Recalling that the responses give the number of accidents over the last 10 years, this is not surprising: many students have not been driving for 10 years and so have had less time to get into accidents. The mean for women, given in Table 2, is .66, and the median is .5. The mean is sensitive to outliers, while the median is more resistant to them, so it is worth checking the data for outliers.
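As a quick check on Tables 2 and 4, the measures of center can be recomputed directly from the frequency counts in Table 1. The sketch below is a minimal Python example using only the standard library’s `statistics` module; the dictionary layout and the helper names `expand` and `center_summary` are illustrative choices, not part of the original analysis.

```python
from statistics import mean, median, mode

# Frequency tables from Table 1: {number of accidents: number of students}
FEMALE = {0: 25, 1: 18, 2: 6, 3: 1}
MALE = {0: 20, 1: 16, 2: 8, 3: 4, 4: 2}


def expand(freq):
    """Turn a frequency table into a sorted list of individual responses."""
    return sorted(value for value, count in freq.items() for _ in range(count))


def center_summary(freq):
    """Sample size, mean, median, mode, and range for one group."""
    data = expand(freq)
    return {
        "n": len(data),
        "mean": mean(data),
        "median": median(data),
        "mode": mode(data),
        "range": max(data) - min(data),
    }


for label, freq in (("Women", FEMALE), ("Men", MALE)):
    print(label, center_summary(freq))
```

Expanding the counts into individual responses keeps the code close to the textbook definitions; with only 50 responses per group this costs nothing.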
Outliers can be identified using the fences Upper Fence = Q3 + 1.5*IQR and Lower Fence = Q1 - 1.5*IQR, where the quartiles come from the five-number summaries in Tables 6 (women) and 7 (men); for now we look only at Table 6. Applying these formulas to the women’s data (Table 8) gives an upper fence of 2.5, so the single response of 3 accidents in Table 1 is an outlier. Table 3 lists the variance and standard deviation, which are .596 and .77 respectively. These are two important measures of spread: the variance is the average squared distance of the values from the mean (dividing by n - 1 for a sample), and the standard deviation is its square root, which expresses the spread in the original units (accidents). The larger the standard deviation, the more spread out the data. Outliers can also inflate the standard deviation, and we have already found one in the women’s data. A variance of .596 is low, meaning the individual values do not differ much from the mean, and a standard deviation of .77 indicates relatively low variability, with the data clustered close together. Finally, it is important to note unusual values in the data. Usual values lie within two standard deviations of the mean; any value outside this range is considered unusual. The limits are: minimum usual accepted value = (mean) - 2*σ and maximum usual accepted value = (mean) + 2*σ. Using the values in Table 10, the range of usual values is -.88 to 2.2. Table 1 shows one student who reported three accidents, which lies above 2.2, and intuitively three accidents within ten years is a high number, in other words, unusual.
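The fence calculation can also be sketched in Python. Quartile conventions differ between textbooks and software, so this minimal sketch assumes the median-of-halves method, which reproduces the quartiles in Tables 6 and 7 (Q1 = 0 for both groups, Q3 = 1 for women and 2 for men). The helper names are hypothetical. With Q1 = 0, the lower fences come out to -1.5 for women and -3 for men; since no one can report fewer than 0 accidents, only the upper fences (2.5 and 5) can flag an outlier.

```python
from statistics import median

# Frequency tables from Table 1: {number of accidents: number of students}
FEMALE = {0: 25, 1: 18, 2: 6, 3: 1}
MALE = {0: 20, 1: 16, 2: 8, 3: 4, 4: 2}


def expand(freq):
    """Turn a frequency table into a sorted list of individual responses."""
    return sorted(value for value, count in freq.items() for _ in range(count))


def five_number_summary(data):
    """(min, Q1, median, Q3, max) for sorted data.

    Q1 and Q3 are the medians of the lower and upper halves; when n is odd
    the overall median is left out of both halves (assumed convention).
    """
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + n % 2:]
    return min(data), median(lower), median(data), median(upper), max(data)


def fences(data):
    """Lower and upper fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


for label, freq in (("Women", FEMALE), ("Men", MALE)):
    data = expand(freq)
    low, high = fences(data)
    outliers = [x for x in data if x < low or x > high]
    print(label, five_number_summary(data), "fences:", (low, high), "outliers:", outliers)
```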
Moving on to the men, we have already seen that they share a mode of 0 with the women. The mean for men is 1.04 and the median is 1, as shown in Table 4; both are higher than the corresponding values for women. As discussed, outliers can influence the mean (and, to a lesser extent, the median), and Table 1 shows that two men reported being in 4 accidents. Using Upper Fence = Q3 + 1.5*IQR and Lower Fence = Q1 - 1.5*IQR with the five-number summary in Table 7 (Q1 = 0, Q3 = 2, IQR = 2), Table 9 gives an Upper Fence of 5 and a Lower Fence of -3. Unlike the women’s data, the men’s data contain no outliers. However, even without outliers, the set might contain unusual values. Again, the limits for usual values are: minimum usual accepted value = (mean) - 2*σ and maximum usual accepted value = (mean) + 2*σ. Table 11 shows that for the men’s data these formulas give a range of -1.2 to 3.28. Anything outside this range is considered unusual, and two men reported being in four accidents; common sense also suggests that four accidents within a 10-year period is unusual. Table 5 gives the variance and standard deviation for men as 1.26 and 1.12 respectively, both noticeably greater than the corresponding values for women. These larger values are not caused by the higher mean itself; they reflect responses that are more spread out, including the two responses of 4. Using the definitions of variance and standard deviation from above, we can say that the individual values in the men’s set differ more from their mean than the women’s values do, and that the men’s data are more spread out, with more variability. For a visual representation of the median, variance, and standard deviation for women, refer to Graph 1, and refer to Graph 2 for the same information regarding men. Graph 3 compares the two sets side by side and visually confirms that these statistics are higher for men.
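The spread measures and the usual-value ranges for both groups can be checked the same way. This sketch assumes the sample variance and standard deviation (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute and which matches the .596 and 1.26 in Tables 3 and 5. Because it uses the unrounded standard deviations, the limits come out as about -.884 to 2.204 for women and -1.208 to 3.288 for men, close to the values in Tables 10 and 11, which were computed from the rounded standard deviations. The helper names are illustrative.

```python
from statistics import mean, stdev, variance

# Frequency tables from Table 1: {number of accidents: number of students}
FEMALE = {0: 25, 1: 18, 2: 6, 3: 1}
MALE = {0: 20, 1: 16, 2: 8, 3: 4, 4: 2}


def expand(freq):
    """Turn a frequency table into a sorted list of individual responses."""
    return sorted(value for value, count in freq.items() for _ in range(count))


def spread_summary(data):
    """Sample variance/SD (n - 1 denominator) and the mean +/- 2 SD usual range."""
    m = mean(data)
    s = stdev(data)
    return {
        "variance": variance(data),
        "sd": s,
        "usual_min": m - 2 * s,
        "usual_max": m + 2 * s,
    }


for label, freq in (("Women", FEMALE), ("Men", MALE)):
    data = expand(freq)
    summary = spread_summary(data)
    # Values outside mean +/- 2 SD are flagged as unusual
    unusual = [x for x in data if not summary["usual_min"] <= x <= summary["usual_max"]]
    rounded = {key: round(value, 3) for key, value in summary.items()}
    print(label, rounded, "unusual:", unusual)
```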
After analyzing all of the above statistics, we can now test a specific hypothesis about the data. As mentioned earlier, I want to see whether men get into more accidents than women do, using the number of accidents as a measure of how “bad” a driver is. So I want to test the claim that the mean number of accidents for women is less than that for men. In this hypothesis test, women are Population 1 (P1) and men are Population 2 (P2). To find the critical value we use the t distribution with n - 1 degrees of freedom (DOF). Both samples contain 50 responses, so the DOF is 49, and we test at the .01 significance level. As Table 12 shows, the critical value is 2.412 in magnitude; because the alternative hypothesis is left-tailed, the rejection region is t < -2.412. The null and alternative hypotheses are H0: µ1 = µ2 and Ha: µ1 < µ2. The test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

which evaluates to -1.98; the computation is shown in Table 13. Although it is not a precisely drawn curve, Figure 1 shows where these numbers fall and helps us decide whether to reject the null hypothesis. To reject H0 in favor of the claim, the test statistic would need to fall to the left of, or be less than, -2.412. Since -1.98 is greater than -2.412, we fail to reject the null hypothesis: there is not enough evidence to support the claim that women’s mean number of accidents is less than men’s, that is, that men get into more accidents. Finally, a 99% confidence interval for µ1 - µ2, whose formula and computation are shown in Table 14, is -.9 < µ1 - µ2 < .14. In other words, we are 99% confident that the true difference between the mean numbers of accidents lies between -.9 and .14. Because this interval contains 0, the data do not show a significant difference between the number of accidents men and women get into.
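The test statistic and confidence interval in Tables 13 and 14 can be reproduced from the rounded summary statistics. This is a minimal sketch: it uses the unpooled standard error √(s1²/n1 + s2²/n2) and simply copies the critical values 2.412 and 2.690 from Tables 12 and 14 rather than computing them, since Python’s standard library has no t distribution. (Exact t quantiles for 49 degrees of freedom are slightly smaller, roughly 2.40 and 2.68, which would not change the conclusion.) Variable and function names are illustrative.

```python
from math import sqrt

# Summary statistics as reported in Tables 2-5 (rounded values)
X1, S1, N1 = 0.66, 0.77, 50   # Population 1: women
X2, S2, N2 = 1.04, 1.12, 50   # Population 2: men

# Critical values copied from Tables 12 and 14 (not computed here)
T_CRIT_ONE_TAIL = 2.412   # alpha = .01, one tail
T_CRIT_TWO_TAIL = 2.690   # alpha/2 = .005, for the 99% confidence interval


def standard_error(s1, n1, s2, n2):
    """Unpooled standard error of the difference in sample means."""
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def t_statistic(x1, s1, n1, x2, s2, n2, hypothesized_diff=0.0):
    """((x1 - x2) - (mu1 - mu2)) / SE, with mu1 - mu2 = 0 under H0."""
    return ((x1 - x2) - hypothesized_diff) / standard_error(s1, n1, s2, n2)


def confidence_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """(x1 - x2) - E < mu1 - mu2 < (x1 - x2) + E, where E = t_crit * SE."""
    e = t_crit * standard_error(s1, n1, s2, n2)
    diff = x1 - x2
    return diff - e, diff + e


t = t_statistic(X1, S1, N1, X2, S2, N2)
# Left-tailed test (Ha: mu1 < mu2): reject H0 only if t <= -t_crit
reject = t <= -T_CRIT_ONE_TAIL
low, high = confidence_interval(X1, S1, N1, X2, S2, N2, T_CRIT_TWO_TAIL)
print(f"t = {t:.2f}, reject H0: {reject}")
print(f"99% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```

The decision rule `t <= -T_CRIT_ONE_TAIL` reflects the left-tailed alternative Ha: µ1 < µ2; a right-tailed alternative would compare against +2.412 instead.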
Despite these findings, as with any data set, there are possible limitations that are outside the control of whoever analyzes the data. The data used here were collected in 2003, so driving habits may have changed in the ten years since (VanderLaan, Ratliff, and Bredow). The data were also collected by other students whom I have never met, at a college I have never visited. As far as I know, they could have made up the numbers or falsified some of the responses they received, which could skew the results so that men appear to get into more accidents than women, or vice versa. Finally, the biggest limitation is that we do not know the ages of the students surveyed. A reasonable assumption is that most college students have not had their licenses for 10 years, even though they were asked how many accidents they had been in during the last 10 years (VanderLaan, Ratliff, and Bredow). Because the time each student has spent driving is unknown and probably varies, the counts are not directly comparable and some statistics could be misleading: one student might have had three accidents in two years of driving, while another had the same number over a full 10 years.
Unfortunately, we failed to reject the null hypothesis, so the data do not support the claim that women get into fewer accidents than men do. This was somewhat disappointing, but on the bright side, our confidence interval, -.9 < µ1 - µ2 < .14, did not show a significant difference in the number of accidents men and women get into. As far as I am concerned, this should help ease the stigma that women are worse drivers than men: even though our hypothesis test gave unfavorable results, our confidence interval showed that the difference in accidents between men and women is not large enough to be statistically significant. No one may be able to erase entirely a stigma attached to certain genders, religions, or ethnicities, but that should make us want to strive to better ourselves and prove these stigmas wrong. At the very least, I know that according to this data set, if I choose to practice law related to auto injuries, I can expect a client base that is roughly equal parts men and women.
Appendix of Tables/Graphs
Table 1
Female
No. of Accidents    Frequency
0                   25
1                   18
2                   6
3                   1

Male
No. of Accidents    Frequency
0                   20
1                   16
2                   8
3                   4
4                   2

Table 2 (Women)
Mean      .66
Median    .5
Mode      0
Range     3

Table 3 (Women)
Variance              .596
Standard Deviation    .77

Table 4 (Men)
Mean      1.04
Median    1
Mode      0
Range     4

Table 5 (Men)
Variance              1.26
Standard Deviation    1.12
Table 6 (Women)
Five-Number Summary
Min            0
Q1             0
Q2 (Median)    .5
Q3             1
Max            3
Interquartile Range (IQR = Q3 - Q1)    1

Table 7 (Men)
Five-Number Summary
Min            0
Q1             0
Q2 (Median)    1
Q3             2
Max            4
Interquartile Range (IQR = Q3 - Q1)    2

Table 8 (Women)
Formula and Value for Outlying Data (Upper/Lower)
Upper Fence = Q3 + 1.5*IQR = 1 + 1.5*1 = 2.5
Lower Fence = Q1 - 1.5*IQR = 0 - 1.5*1 = -1.5
Table 9 (Men)
Formula and Value for Outlying Data (Upper/Lower)
Upper Fence = Q3 + 1.5*IQR = 2 + 1.5*2 = 5
Lower Fence = Q1 - 1.5*IQR = 0 - 1.5*2 = -3

Table 10 (Women)
Usual Value Formulas/Accepted Range
Minimum usual accepted value = (mean) - 2*σ = (.66) - 2(.77) = -.88
Maximum usual accepted value = (mean) + 2*σ = (.66) + 2(.77) = 2.2

Table 11 (Men)
Usual Value Formulas/Accepted Range
Minimum usual accepted value = (mean) - 2*σ = (1.04) - 2(1.12) = -1.2
Maximum usual accepted value = (mean) + 2*σ = (1.04) + 2(1.12) = 3.28
Table 12
Degrees of Freedom (DOF) = n - 1 = 49
Significance level: α = .01
Critical value: $t_{\alpha,\,\mathrm{dof}} = t_{.01,\,49} = 2.412$
Table 13
Test Statistic Formula and Actual Statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{(.66 - 1.04) - 0}{\sqrt{.77^2/50 + 1.12^2/50}} \approx -1.98$$
Figure 1: t distribution marking the test statistic (-1.98) and the critical value (2.412)
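Table 14
Confidence Interval Formula and Values
$$E = t_{\alpha/2,\,\mathrm{dof}} \sqrt{s_1^2/n_1 + s_2^2/n_2} = 2.690 \sqrt{.77^2/50 + 1.12^2/50} \approx .517054108$$
$$(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$$
$$(.66 - 1.04) - .517054108 < \mu_1 - \mu_2 < (.66 - 1.04) + .517054108$$
$$-.9 < \mu_1 - \mu_2 < .14$$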
Graph 1
Graph 2
Graph 3
Works Cited
VanderLaan, Tim, Pat Ratliff, and Andrew Bredow. (2003). The number of accidents
Hope College students have been in during the last ten years they were drivers
[Survey]. Retrieved from http://www.math.hope.edu/swanson/data/accidents.txt