Morio

One age-old question has plagued us all since the playground era: is one gender really superior to the other? We have all sung “girls go to college to get more knowledge, and boys go to Jupiter to get more stupider” or some variation of it, and the rivalry carries on into our older years, when certain stigmas become attached to our genders. One of the most common for women is the widespread assumption that we are poor drivers. As someone who was brought up never to let another person’s assumptions define me, I find this offensive. I know quite a few terrible drivers, but not all of them are women; in fact, I know just as many male drivers whose cars I would not get into. The question also interests me because after I graduate from St. John Fisher I will head to law school and eventually become an attorney. I am not sure what sort of law I would like to practice, but if I go into any area involving auto accidents or related injuries, it would be interesting to know whether my client base might be skewed toward one gender.

I am hoping to use the data set in Table 1 to see whether women really are worse drivers than men, using the number of accidents to define “worse”. The data were collected from 100 students at Hope College in Holland, Michigan, who were asked to report how many accidents they had been in during their last 10 years behind the wheel (VanderLaan, Ratliff, and Bredow). The responses were then divided by gender so that male and female students could be compared. After analyzing the descriptive statistics, I want to use the data to test the hypothesis that men are actually worse drivers than women. In this case we will use the mean number of accidents for each group to define “bad driving”.

When analyzing any data set, it is important to look at descriptive statistics such as the mean, mode, median, variance, and standard deviation.
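As a rough check on the descriptive statistics discussed in this paper, the short Python sketch below recomputes them from the frequency counts in Table 1. It is only an illustration, not the method used to produce the tables: the names `women_freq`, `men_freq`, `expand`, and `describe` are made up for this example, and the variance and standard deviation are assumed to be the sample versions (dividing by n - 1).

```python
import statistics

# Frequency counts from Table 1: {number of accidents: number of students}.
women_freq = {0: 25, 1: 18, 2: 6, 3: 1}
men_freq = {0: 20, 1: 16, 2: 8, 3: 4, 4: 2}


def expand(freq):
    """Turn a frequency table into a sorted list of individual responses."""
    return sorted(value for value, count in freq.items() for _ in range(count))


def describe(freq):
    """Return the descriptive statistics discussed in the paper."""
    data = expand(freq)
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (assumed)
        "stdev": statistics.stdev(data),        # sample standard deviation
    }


women = describe(women_freq)
men = describe(men_freq)
print("women:", women)
print("men:  ", men)
```

Rounded, the results agree with the values reported in Tables 2 through 5.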
For women, the mean, median, mode, and range can be found in Table 2, and the standard deviation and variance in Table 3. For men, the mean, median, mode, and range can be found in Table 4, and the standard deviation and variance in Table 5. For both men and women the mode is 0. Recall that students reported their accidents over the last 10 years; since many students have not been driving for 10 years, they have had less time to get into accidents, so this does not seem odd.

The mean for women, as given in Table 2, is .66, and the median is .5. The mean in particular can be pulled upward by outliers, while the median is more resistant to them. Outliers are identified with the formulas Upper Fence = Q3 + 1.5*IQR and Lower Fence = Q1 - 1.5*IQR, using the five-number summaries in Tables 6 and 7; for now we are looking only at Table 6, which covers the women. Applying these formulas, as in Table 8, gives an upper fence of 2.5, so the value of 3 in Table 1 is an outlier.

Table 3 lists the variance and standard deviation, which are .596 and .77 respectively. Both measure spread: the variance is roughly the average squared distance of the values from the mean, and the standard deviation is its square root, expressed in the same units as the data. The larger the standard deviation, the more spread out the data are. Outliers, which we have already found in the women's data, can also inflate the standard deviation. Here the variance of .596 and the standard deviation of .77 are both low, so the women's responses are clustered close together. Finally, it is important to note unusual values in the data.
Usual values, which would be considered typical for the set, lie within two standard deviations of the mean; any value outside this range is considered unusual. To find the range we use the formulas: minimum usual accepted value = (mean) - 2*σ and maximum usual accepted value = (mean) + 2*σ. Using the values in Table 10, the range of usual values for women runs from -.88 to 2.2. Table 1 shows one student who reported three accidents, which falls above this range; common sense agrees that three accidents within ten years is a lot, so the value is unusual.

Moving on to the men, we have already seen that they share a mode of 0 with the women. Their mean is 1.04 and their median is 1, as shown in Table 4, so the men's values are noticeably higher than the women's. As discussed, outliers can strongly influence the mean, and Table 1 shows that two men reported 4 accidents. Using the formulas Upper Fence = Q3 + 1.5*IQR and Lower Fence = Q1 - 1.5*IQR with the five-number summary in Table 7, we can find the range outside of which a value is an outlier. With Q1 = 0, Q3 = 2, and IQR = 2, Table 9 gives an Upper Fence of 5 and a Lower Fence of -3. Unlike the women's data, this data set contains no outliers. Even so, we might suspect that the set contains unusual values. Again, the formulas are: minimum usual accepted value = (mean) - 2*σ and maximum usual accepted value = (mean) + 2*σ. Table 11 shows that for the men the range of usual values is -1.2 to 3.28.
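The outlier and unusual-value checks described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the individual responses are rebuilt from the counts in Table 1, and the quartiles come from `statistics.quantiles` with its default method, which is only one of several quartile conventions but gives the same quartiles as Tables 6 and 7 for this data. The lower fence is computed from Q1, as the formula states.

```python
import statistics

# Individual responses rebuilt from the frequency counts in Table 1.
women = [0] * 25 + [1] * 18 + [2] * 6 + [3] * 1
men = [0] * 20 + [1] * 16 + [2] * 8 + [3] * 4 + [4] * 2


def fences(data):
    """Outlier fences: (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile convention assumed
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def usual_range(data):
    """Usual-value range: mean minus and plus two sample standard deviations."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return mean - 2 * sd, mean + 2 * sd


def flagged(data, low, high):
    """Distinct values in data that fall outside the interval [low, high]."""
    return sorted({x for x in data if x < low or x > high})


for label, data in (("women", women), ("men", men)):
    low_f, high_f = fences(data)
    low_u, high_u = usual_range(data)
    print(label, "fences:", (low_f, high_f),
          "outliers:", flagged(data, low_f, high_f))
    print(label, "usual range:", (round(low_u, 2), round(high_u, 2)),
          "unusual values:", flagged(data, low_u, high_u))
```

Because the sketch uses the unrounded standard deviations, the endpoints of the usual ranges can differ in the second decimal place from those computed with .77 and 1.12; the values flagged are the same.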
Anything outside the men's usual range of -1.2 to 3.28 is considered unusual, and we can recall that two men stated they had been in four accidents. Once again, common sense tells us that this is an unusual number of accidents to be in within a 10-year time frame. Table 5 gives the variance and standard deviation for men, which are 1.26 and 1.12. Both are considerably greater than the corresponding values for the women (.596 and .77). Using the definitions of variance and standard deviation, we can say that the individual values in the men's data differ from one another more than the women's do; the men's data are more spread out and show more variability.

For a visual representation of the median, variance, and standard deviation for women, refer to Graph 1, and refer to Graph 2 for the same information for men. To confirm visually that these statistics are in fact higher for men, and to compare the two sets side by side, refer to Graph 3.

Having analyzed the statistics above, we would now like to test a specific hypothesis about the data. Recall that I want to see whether men get into more accidents than women do, since we are using the number of accidents as a measure of how “bad” a driver is. So I want to test the claim that the mean number of accidents for women is less than that for men. In my hypothesis test women will be Population 1 (P1) and men will be Population 2 (P2). To find our critical value we use a table of critical t values with n - 1 degrees of freedom (DOF). Both of our samples contain 50 responses, so our DOF is 49, and we are testing at a .01 significance level.
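The hypothesis test set up here can be worked through numerically with the Python sketch below. It is an illustration under stated assumptions rather than a definitive implementation: the variable names are invented for the example, the summary statistics are the rounded values from Tables 2 through 5, the standard error does not pool the two variances, and the critical values 2.412 and 2.690 are simply copied from Tables 12 and 14 rather than computed.

```python
import math

# Rounded summary statistics from Tables 2-5 (women = sample 1, men = sample 2).
n1, mean1, sd1 = 50, 0.66, 0.77
n2, mean2, sd2 = 50, 1.04, 1.12

# Critical t values copied from Tables 12 and 14 (inputs, not computed here).
t_crit_one_tail = 2.412  # alpha = .01 in one tail
t_crit_two_tail = 2.690  # alpha / 2 = .005 in each tail, for a 99% interval

# Standard error of the difference in sample means (variances not pooled).
std_err = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Test statistic under H0: mu1 - mu2 = 0.
t_stat = ((mean1 - mean2) - 0) / std_err

# Left-tailed test (Ha: mu1 < mu2): reject H0 only if t is below -t_crit.
reject_h0 = t_stat < -t_crit_one_tail

# 99% confidence interval for mu1 - mu2.
margin = t_crit_two_tail * std_err
ci_low = (mean1 - mean2) - margin
ci_high = (mean1 - mean2) + margin

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"99% CI for mu1 - mu2: ({ci_low:.2f}, {ci_high:.2f})")
```

Treating the critical values as inputs keeps the sketch within Python's standard library; a statistics package could compute them from the t distribution instead.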
As Table 12 shows, our critical value is 2.412. Our null and alternative hypotheses are H0: µ1 = µ2 and Ha: µ1 < µ2, respectively. Because the alternative hypothesis points to the left, this is a left-tailed test, and the rejection region lies below -2.412. To find our test statistic we use the formula $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$; its value is -1.98, and the computation can be found in Table 13. Although its curve is not drawn exactly to scale, Figure 1 gives us a visual of where these numbers fall and helps us decide whether to reject the null hypothesis. To reject the null hypothesis, our test statistic would need to fall to the left of, or be less than, -2.412. Since -1.98 is greater than -2.412, we fail to reject the null hypothesis: at the .01 level there is not enough evidence to support the claim that the mean number of accidents for women is less than that for men, that is, that men get into more accidents.

Finally, the 99% confidence interval, whose formula and computation can be found in Table 14, is -.9 < µ1 - µ2 < .14. In other words, we are 99% confident that the true difference between the two population means lies between -.9 and .14. Because this interval contains 0, there is no significant difference between the number of accidents men and women get into.

Despite our findings, as with any data set, there are possible limitations that are out of the control of the person analyzing the data. Our data were collected in 2003, so driving habits could have changed in the last 10 years (VanderLaan, Ratliff, and Bredow). The data were also collected by students I have never met, at a college I have never been to. For all I know, they could have made up the numbers or misreported some of the responses they received, which could skew the results toward men having more accidents than women, or vice versa.
Finally, the biggest limitation is that we do not know the ages of the students surveyed. It is reasonable to assume that a majority of people attending college have not had their licenses for 10 years, even though they were asked how many accidents they had been in during the last 10 years (VanderLaan, Ratliff, and Bredow). This could make some of the statistics look higher than they should, since someone could have been in three accidents in two years of driving rather than over the 10 years the survey asked about.

Unfortunately, we failed to reject the null hypothesis, so the data do not support the claim that women get into fewer accidents than men do. This was somewhat disappointing, but the bright side is that our confidence interval, -.9 < µ1 - µ2 < .14, showed that there is no significant difference in the number of accidents men and women get into. As far as I am concerned, this should help ease the stigma that women are worse drivers than men: even though our hypothesis test did not give the result I hoped for, the confidence interval showed that the difference between men and women is not large enough to be noteworthy. No one may be able to completely erase the stigmas attached to certain genders, religions, or ethnicities, but that should make us strive to better ourselves and prove them wrong. At the very least, I know that according to this data set, if I choose to practice law related to auto injuries, I can expect a client base that is almost equally men and women.

Appendix of Tables/Graphs

Table 1

Female
No. of Accidents    Frequency
0                   25
1                   18
2                   6
3                   1

Male
No. of Accidents    Frequency
0                   20
1                   16
2                   8
3                   4
4                   2

Table 2 (Women)
Mean      .66
Median    .5
Mode      0
Range     3

Table 3 (Women)
Variance              .596
Standard Deviation    .77

Table 4 (Men)
Mean      1.04
Median    1
Mode      0
Range     4

Table 5 (Men)
Variance              1.26
Standard Deviation    1.12

Table 6 (Women)
Five Number Summary
Min                                       0
Q1                                        0
Q2 (Median)                               .5
Q3                                        1
Max                                       3
Interquartile Range (IQR) (Q3-Q1=IQR)     1

Table 7 (Men)
Five Number Summary
Min                                       0
Q1                                        0
Q2 (Median)                               1
Q3                                        2
Max                                       4
Interquartile Range (IQR) (Q3-Q1=IQR)     2

Table 8 (Women)
Formula and Value For Outlying Data (Upper/Lower)
Upper Fence = Q3 + 1.5*IQR
Upper Fence = 1 + 1.5*1         2.5
Lower Fence = Q1 - 1.5*IQR
Lower Fence = 0 - 1.5*1         -1.5

Table 9 (Men)
Formula and Value For Outlying Data (Upper/Lower)
Upper Fence = Q3 + 1.5*IQR
Upper Fence = 2 + 1.5*2         5
Lower Fence = Q1 - 1.5*IQR
Lower Fence = 0 - 1.5*2         -3

Table 10 (Women)
Usual Value Formulas/Accepted Range
minimum usual accepted value = (mean) - 2*σ
minimum usual accepted value = (.66) - 2(.77)      -.88
maximum usual accepted value = (mean) + 2*σ
maximum usual accepted value = (.66) + 2(.77)      2.2

Table 11 (Men)
Usual Value Formulas/Accepted Range
minimum usual accepted value = (mean) - 2*σ
minimum usual accepted value = (1.04) - 2(1.12)    -1.2
maximum usual accepted value = (mean) + 2*σ
maximum usual accepted value = (1.04) + 2(1.12)    3.28

Table 12
Degrees of Freedom (DOF) (N-1)              49
α                                           .01
$t_{\alpha,\,dof} = t_{.01,\,49}$           2.412

Table 13
Test Statistic Formula and Actual Statistic
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
$t = \dfrac{(.66 - 1.04) - (0)}{\sqrt{.77^2/50 + 1.12^2/50}} = -1.98$

Figure 1
[t distribution curve marked with the values -1.98 and 2.412]

Table 14
Confidence Interval Formula and Values
$E = t_{\alpha/2,\,dof} \sqrt{s_1^2/n_1 + s_2^2/n_2}$
$E = 2.690 \sqrt{.77^2/50 + 1.12^2/50} = .517054108$
$(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$
$(.66 - 1.04) - .517054108 < \mu_1 - \mu_2 < (.66 - 1.04) + .517054108$
$-.9 < \mu_1 - \mu_2 < .14$

Graph 1

Graph 2

Graph 3

Works Cited

VanderLaan, Tim, Pat Ratliff, and Andrew Bredow. (2003). The number of accidents Hope College students have been in during the last ten years they were drivers [Survey].
Retrieved from http://www.math.hope.edu/swanson/data/accidents.txt