Hypothesis Testing

Minitab Lab 3 Exercises

Datasets for these examples can be accessed at: http://personal.strath.ac.uk/david.young/SQA

Question 1

The Tennis data set contains information on the number of Facebook fans and Twitter followers of 40 randomly selected tennis players. This data was downloaded from http://fanpagelist.com/.

(i) Produce a boxplot to compare the distributions of Facebook fans and Twitter followers for these tennis players.

(ii) Perform an appropriate hypothesis test to determine whether there is evidence of a difference in the numbers of fans engaging with the players through Facebook and Twitter.

Question 2

The Actors data set shows the same information for 40 randomly selected actors, combined with the Tennis data to give 80 individual observations, with a variable Profession indicating actors or tennis players.

(i) Explain the missing values in this data.

(ii) Produce a boxplot to compare visually the total numbers of social media followers for actors and tennis players.

(iii) Perform a hypothesis test to determine whether there is a difference in the total number of social media followers between these two professions.

Question 3

The data shown in Table 1 is taken from the work of Katkici et al. [1] and shows the results of their survey of defence wounds observed during the post-mortem examination of 195 Turkish victims of all forms of stabbing. Of the 195 victims, 162 were male and 33 female. It is of interest to determine whether there is any evidence of a difference between the behaviour of males and females during the course of a fatal stabbing.

                      Gender
    Defence wounds    Male   Female   Total
    Present             57       18      75
    Absent             105       15     120
    Total              162       33     195

Table 1: Observations of defence wounds taken from 195 Turkish stabbing victims

(i) Compute the proportions of males and females with defence wounds observed at post-mortem.

(ii) Perform an appropriate hypothesis test to determine whether there is any evidence of a difference between the behaviour of male and female victims.
(iii) Write a sentence to explain the confidence interval reported with the hypothesis test output in (ii), in the context of this problem.

(iv) How does this confidence interval agree with the results of the hypothesis test in (ii)?

Question 4

The Workforce dataset shows the labour force participation rates for males and females over time for several countries in Europe and North America. The data for this example were taken from http://wdi.worldbank.org/table/2.2.

(i) Perform a hypothesis test to determine whether there has been a change in the rates of females participating in the labour force across both continents from 2000 to 2013.

(ii) Perform a hypothesis test to determine whether there is evidence of a difference in female labour force participation rates in 2013 between Europe and North America.

(iii) Comment on the significance of the difference seen in (ii).

Outline Solutions

Question 1

(i) Use Graph > Boxplot > Multiple Y’s to get the graph shown in Figure 1.

[Figure 1: Boxplot of social media followers]

It would appear from this graph that the tennis players have more Facebook fans than Twitter followers.

(ii) Since this is paired data, i.e. Facebook fans and Twitter followers both refer to the same tennis player, a paired t-test is appropriate. The output from this is shown below:

    Paired T-Test and CI: Facebook fans, Twitter followers

    Paired T for Facebook fans - Twitter followers

                        N     Mean    StDev  SE Mean
    Facebook fans      36  1982102  4052620   675437
    Twitter followers  36  1070881  1758757   293126
    Difference         36   911221  3022743   503791

    95% CI for mean difference: (-111528, 1933970)
    T-Test of mean difference = 0 (vs ≠ 0): T-Value = 1.81  P-Value = 0.079

Note that N = 36 rather than 40: players with a missing value are excluded from the paired comparison. Since p = 0.079 > 0.05, do not reject the null hypothesis (NH) and conclude that there is no evidence, at the 5% significance level, of a difference in the mean numbers of Facebook fans and Twitter followers.

Question 2

(i) The number of Twitter followers is missing for several individuals.
It would seem reasonable to assume this is because these individuals do not have Twitter accounts.

(ii) Since the data is stacked in this file, in Minitab use Graph > Boxplot > One Y and select With Groups to produce the boxplot shown in Figure 2.

[Figure 2: Boxplot of social media followers by profession]

This plot indicates that actors have more total social media followers than tennis players.

(iii) Since actors and tennis players are different groups of people, a two-sample t-test is appropriate to compare the mean numbers of social media followers. Since the data is stacked, the simplest approach in Minitab is to use the Both samples are in one column option in the command box. The output is shown below:

    Two-Sample T-Test and CI: Total, Profession

    Two-sample T for Total

    Profession         N      Mean     StDev  SE Mean
    Actor             40  32018482  24307356  3843330
    Athlete - Tennis  40   2844446   5234887   827708

    Difference = (Actor) - (Athlete - Tennis)
    Estimate for difference: 29174036
    95% CI for difference: (21240051, 37108021)
    T-Test of difference = 0 (vs ≠ 0): T-Value = 7.42  P-Value = 0.000  DF = 42

Since p < 0.001, reject NH and conclude that there is evidence to suggest that actors have significantly more social media followers than tennis players.

Question 3

(i) For males the proportion is 57/162 = 0.352, or 35.2%, and for females it is 18/33 = 0.545, i.e. 54.5%.

(ii) The appropriate test is a Z-test for two proportions. In Minitab this can be done using Stat > Basic Statistics > 2 Proportions, then selecting Summarized data from the drop-down menu. ‘Events’ are the numbers of observations of defence wounds (57 and 18) and ‘trials’ are the numbers of males and females (162 and 33 respectively).
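As a cross-check on the Minitab calculation, the two-proportion Z-test can be sketched in Python from the summarized counts alone. This is a minimal sketch, not Minitab's own implementation: it assumes the standard error is estimated from the two sample proportions separately (unpooled), which for these counts reproduces the Z-value Minitab reports, and it uses the normal approximation for both the p-value and the 95% confidence interval. The function name two_proportion_z is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95):
    """Z-test and normal-approximation CI for p1 - p2, using an unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Standard error from each sample proportion separately (assumption: not pooled).
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se
    std_normal = NormalDist()
    p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided alternative: p1 != p2
    z_crit = std_normal.inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for a 95% CI
    ci = (diff - z_crit * se, diff + z_crit * se)
    return p1, p2, diff, z, p_value, ci


# Sample 1 = males (57 of 162 with defence wounds), sample 2 = females (18 of 33).
p1, p2, diff, z, p_value, ci = two_proportion_z(57, 162, 18, 33)
print(f"Sample p: {p1:.6f} {p2:.6f}")
print(f"Estimate for difference: {diff:.6f}")
print(f"95% CI for difference: ({ci[0]:.6f}, {ci[1]:.6f})")
print(f"Z = {z:.2f}  P-Value = {p_value:.3f}")
```

Pooling the two samples to estimate a common proportion is a common alternative for the test statistic; it would give a slightly different Z, so the unpooled form is used here because it matches the output.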
In Minitab, this gives the output shown below:

    Test and CI for Two Proportions

    Sample   X    N  Sample p
    1       57  162  0.351852
    2       18   33  0.545455

    Difference = p (1) - p (2)
    Estimate for difference: -0.193603
    95% CI for difference: (-0.378722, -0.00848332)
    Test for difference = 0 (vs ≠ 0): Z = -2.05  P-Value = 0.040

Since p = 0.040 < 0.05, reject NH and conclude that there is evidence that a significantly higher proportion of females than males have defence wounds as a result of fatal stabbings.

(iii) We can be 95% confident that, in the population, the proportion of females exhibiting defence wounds as a result of a fatal stabbing exceeds the proportion of males by between 0.8 and 37.9 percentage points.

(iv) Since the confidence interval does not contain zero, zero is not a plausible value for the difference. This agrees with the hypothesis test, which indicated that a significantly higher proportion of females than males have defence wounds, i.e. that the difference is not zero.

Question 4

(i) Since the data is recorded over time, the 2000 and 2013 rates refer to the same countries, so this is paired data. The appropriate hypothesis test is a paired t-test. The Minitab output is shown below:

    Paired T-Test and CI: %female 2000, %female 2013

    Paired T for %female 2000 - %female 2013

                   N   Mean  StDev  SE Mean
    %female 2000  46  51.02   8.85     1.31
    %female 2013  46  53.48   7.65     1.13
    Difference    46 -2.457  3.816    0.563

    95% CI for mean difference: (-3.590, -1.323)
    T-Test of mean difference = 0 (vs ≠ 0): T-Value = -4.37  P-Value = 0.000

Since p < 0.001, reject NH and conclude that there is evidence of an increase in the rates of females participating in the labour force from 2000 to 2013 across both Europe and North America.

(ii) Comparing the two continents requires a two-sample t-test, since the countries in each continent form independent samples. Since the data is stacked, the appropriate option in Minitab is Both samples are in one column from the t-test drop-down menu.
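The t statistics in this question can also be sketched in Python from the summary statistics alone, which shows how the T-Value, DF and P-Value in the Minitab output arise. This is a minimal, standard-library-only sketch under stated assumptions: the two-sample version is the unequal-variances (Welch) test, with the Welch-Satterthwaite degrees of freedom truncated to an integer, which is consistent with the DF values reported for Questions 2 and 4; the t-distribution p-value is obtained by integrating the t density numerically with Simpson's rule rather than from a statistics library. The function names are hypothetical.

```python
import math


def t_two_sided_p(t: float, df: int, steps: int = 20000) -> float:
    """Two-sided p-value P(|T| >= |t|) for Student's t with df degrees of freedom.

    The t density is integrated from 0 to |t| with Simpson's rule (steps must be even).
    """
    # Log of the normalising constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)).
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    c = math.exp(log_c)

    def density(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def paired_t(mean_diff: float, sd_diff: float, n: int):
    """Paired t-test of mean difference = 0, from summary statistics of the differences."""
    se = sd_diff / math.sqrt(n)
    t = mean_diff / se
    df = n - 1
    return t, df, t_two_sided_p(t, df)


def welch_t(mean1: float, sd1: float, n1: int, mean2: float, sd2: float, n2: int):
    """Two-sample t-test without assuming equal variances (Welch), from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom, truncated to an integer
    # (an assumption, chosen because it matches the DF values Minitab reports).
    df_exact = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    df = math.floor(df_exact)
    return t, df, t_two_sided_p(t, df)


# Question 4 (i): differences %female 2000 - %female 2013 across 46 countries.
t, df, p = paired_t(-2.457, 3.816, 46)
print(f"Paired:     T-Value = {t:.2f}  DF = {df}  P-Value = {p:.3f}")

# Question 4 (ii): %female 2013, Europe (n = 36) versus North America (n = 10).
t, df, p = welch_t(52.36, 7.35, 36, 57.50, 7.71, 10)
print(f"Two-sample: T-Value = {t:.2f}  DF = {df}  P-Value = {p:.3f}")
```

Because the inputs are the rounded summaries from the output rather than the raw data, the results should agree closely with Minitab rather than exactly. Applying paired_t to the Question 1 differences (911221, 3022743, 36) gives T ≈ 1.81 with DF = 35, and applying welch_t to the Question 2 summaries gives T ≈ 7.42 with DF = 42.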
In Minitab, this gives the output shown below:

    Two-Sample T-Test and CI: %female 2013, Continent

    Two-sample T for %female 2013

    Continent       N   Mean  StDev  SE Mean
    Europe         36  52.36   7.35      1.2
    North America  10  57.50   7.71      2.4

    Difference = (Europe) - (North America)
    Estimate for difference: -5.14
    95% CI for difference: (-11.03, 0.75)
    T-Test of difference = 0 (vs ≠ 0): T-Value = -1.88  P-Value = 0.082  DF = 13

Although the mean is higher in North America than in Europe, p = 0.082 indicates that a difference of this size could plausibly have arisen by chance even if there were no real difference between the continents. There is therefore no evidence at the 5% significance level that the rates of females participating in the labour force in 2013 differ between the two continents.

(iii) The difference between North America and Europe is not significant at the 5% level, but p = 0.082 is not far above 0.05 and only 10 North American countries are included. The difference may become significant if data from more countries within each continent were included.

References

[1] Katkici, Ü., Özkök, M.S. and Örsal, M. (1994) An autopsy evaluation of defence wounds in 195 homicidal deaths due to stabbing. Journal of the Forensic Science Society, 34(4): 237-240.