Hypothesis Testing Minitab Lab 3 Exercises

Datasets for these examples can be accessed at:
http://personal.strath.ac.uk/david.young/SQA
Question 1
The Tennis data set contains information on the number of Facebook fans and Twitter
followers of 40 randomly selected tennis players. This data was downloaded from
http://fanpagelist.com/.
(i) Produce a boxplot to compare the distributions of Facebook fans and Twitter
followers for these tennis players.
(ii) Perform an appropriate hypothesis test to determine if there is evidence of a
difference in the numbers of fans engaging with the players through Facebook
and Twitter.
Question 2
The Actors data set shows the same information on 40 randomly selected actors,
combined with the Tennis data to give 80 individual observations with a variable
Profession indicating actors or tennis players.
(i) Explain the missing values in this data.
(ii) Produce a boxplot to visually compare the total numbers of social media followers
for actors and tennis players.
(iii) Perform a hypothesis test to determine if there is a difference in the total number
of social media followers between these two professions.
Question 3
The data shown in Table 1 is taken from the work of Katkici et al. [1] and shows results
of their survey of defence wounds observed during the post-mortem examination of 195
Turkish victims of all forms of stabbing. Of the 195 victims, 162 were male and 33
female.
It is of interest to determine whether there is any evidence of a difference between the
behaviour of males and females during the course of a fatal stabbing.
                      Gender
Defence wounds    Male   Female   Total
Present             57       18      75
Absent             105       15     120
Total              162       33     195

Table 1: Observations of defence wounds taken from 195 Turkish stabbing victims
(i) Compute the proportion of males and females with defence wounds observed at
post-mortem.
(ii) Perform an appropriate hypothesis test to determine if there is any evidence of
a difference between the behaviour of male and female victims.
(iii) Write a sentence to explain the confidence interval reported with the hypothesis
test output in (ii), in the context of this problem.
(iv) How does this confidence interval agree with the results of the hypothesis test in
(ii)?
Question 4
The Workforce dataset records the labour force participation rates for males and females
over time, by country, for several countries in Europe and North America. The data for
this example were taken from http://wdi.worldbank.org/table/2.2.
(i) Perform a hypothesis test to determine if there has been a change in rates of
females participating in the labour force across both continents from 2000 to
2013.
(ii) Perform a hypothesis test to determine if there is evidence of a difference in female
labour force participation rates in 2013 between Europe and North America.
(iii) Comment on the significance of the difference seen in (ii).
Outline Solutions
Question 1
(i) Use Graph > Boxplot > Multiple Y’s to get the graph shown in Figure 1.
Figure 1: Boxplot of social media followers
It would appear from this graph that the tennis players have more Facebook fans
than Twitter followers.
(ii) Since this is paired data, i.e. Facebook fans and Twitter followers both refer to
the same tennis player, a paired t-test is appropriate. The output from this is
shown below:
Paired T-Test and CI: Facebook fans, Twitter followers

Paired T for Facebook fans - Twitter followers

                    N     Mean    StDev  SE Mean
Facebook fans      36  1982102  4052620   675437
Twitter followers  36  1070881  1758757   293126
Difference         36   911221  3022743   503791

95% CI for mean difference: (-111528, 1933970)
T-Test of mean difference = 0 (vs ≠ 0): T-Value = 1.81  P-Value = 0.079

Since p = 0.079 > 0.05, do not reject the null hypothesis (NH) at the 5% significance
level and conclude that there is no evidence of a difference in the mean numbers of
Facebook fans and Twitter followers.
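
As a rough cross-check on the Minitab output for (ii), the paired t statistic can be recomputed from the summary values in the Difference row. This is only a sketch using the Python standard library; the function name paired_t_from_summary is illustrative and is not part of Minitab or of the lab materials.

```python
import math

def paired_t_from_summary(mean_diff, sd_diff, n):
    """Return (standard error, t statistic, degrees of freedom) for a paired t-test."""
    se = sd_diff / math.sqrt(n)   # standard error of the mean difference
    t = mean_diff / se            # test statistic for NH: mean difference = 0
    return se, t, n - 1

# Values copied from the "Difference" row of the Minitab output above
se, t, df = paired_t_from_summary(mean_diff=911221, sd_diff=3022743, n=36)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

The statistic should agree with the T-Value reported by Minitab. The p-value needs the t distribution, which the standard library does not provide, so it is not recomputed here.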
Question 2
(i) Several of the Twitter follower values are missing. It seems reasonable
to assume this is because these individuals do not have Twitter accounts.
(ii) Since the data is stacked in this file, in Minitab use the commands Graph >
Boxplot > One Y and select With Groups to produce the boxplot shown in Figure
2.
Figure 2: Boxplot of social media followers by profession
This plot indicates actors have more total social media followers than tennis
players.
(iii) Since actors and tennis players are different groups of people, a two-sample t-test
is appropriate to compare the mean numbers of social media followers. Since the
data is stacked, the simplest approach in Minitab is to use the Both samples
are in one column option in the command box. The output is shown below:
Two-Sample T-Test and CI: Total, Profession

Two-sample T for Total

Profession         N      Mean     StDev  SE Mean
Actor             40  32018482  24307356  3843330
Athlete - Tennis  40   2844446   5234887   827708

Difference = (Actor) - (Athlete - Tennis)
Estimate for difference: 29174036
95% CI for difference: (21240051, 37108021)
T-Test of difference = 0 (vs ≠): T-Value = 7.42  P-Value = 0.000  DF = 42

Since p < 0.001, reject NH and conclude that there is evidence to suggest that
actors have significantly more social media followers than tennis players.
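
The T-Value and DF in the output for (iii) are consistent with a two-sample t-test that does not assume equal variances (Welch's test). The sketch below recomputes both from the summary statistics; it assumes the Welch-Satterthwaite formula for the degrees of freedom and assumes that Minitab reports this value truncated to a whole number. The function name is illustrative only.

```python
import math

def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    v1 = sd1 ** 2 / n1   # squared standard error of the first sample mean
    v2 = sd2 ** 2 / n2   # squared standard error of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics copied from the Minitab output above (Actor, then Athlete - Tennis)
t_stat, df = welch_t_from_summary(32018482, 24307356, 40, 2844446, 5234887, 40)
print(f"t = {t_stat:.2f}, df = {df:.1f} (truncated: {math.floor(df)})")
```

Because the actors' standard deviation is far larger than the tennis players', the degrees of freedom are much closer to 39 than to the 78 a pooled test would use.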
Question 3
(i) For males the proportion is 57/162 = 0.352, i.e. 35.2%, and for females, 18/33 =
0.545, i.e. 54.5%.
(ii) The appropriate test is a Z-test for two proportions. In Minitab this can be done
using Stat > Basic Statistics > 2 Proportions then selecting Summarized
data from the drop-down menu. ‘Events’ are the numbers of observations of
defence wounds (57 and 18) and ‘trials’ are the numbers of males and females
(162 and 33 respectively). This gives the output shown below:
Test and CI for Two Proportions

Sample   X    N  Sample p
1       57  162  0.351852
2       18   33  0.545455

Difference = p (1) - p (2)
Estimate for difference: -0.193603
95% CI for difference: (-0.378722, -0.00848332)
Test for difference = 0 (vs ≠ 0): Z = -2.05  P-Value = 0.040

Since p = 0.040, reject the NH and conclude that there is evidence that significantly more females than males have defence wounds as a result of fatal stabbings.
(iii) We can be 95% confident that, in the population, the proportion of females
exhibiting defence wounds as a result of a fatal stabbing is between 0.8 and 37.9
percentage points higher than the proportion of males.
(iv) Since the confidence interval does not contain zero, this is not a plausible value
for the difference. This agrees with the hypothesis test which indicated that
significantly more females than males would have defence wounds, hence the
difference is not zero.
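
The figures in (i) to (iv) can be reproduced from the counts in Table 1 with a short calculation. The sketch below uses only the Python standard library and assumes the unpooled standard error for both the test statistic and the confidence interval, which is the choice that matches the Z value reported above. The function name two_proportion_z is illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2, conf=0.95):
    """Two-sided Z-test and confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided p-value
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    return diff, z, p_value, (diff - z_crit * se, diff + z_crit * se)

# Sample 1 = males (57 of 162), sample 2 = females (18 of 33)
diff, z, p_value, ci = two_proportion_z(57, 162, 18, 33)
print(f"difference = {diff:.4f}, Z = {z:.2f}, p = {p_value:.3f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

Both ends of the interval are negative, so zero lies outside it, which is the same conclusion as the p-value falling below 0.05.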
Question 4
(i) Since the data is recorded over time, the 2000 rates and 2013 rates refer to the
same country so this is paired data. The appropriate hypothesis test is a paired
t-test. The Minitab output is shown:
Paired T-Test and CI: %female 2000, %female 2013

Paired T for %female 2000 - %female 2013

               N   Mean  StDev  SE Mean
%female 2000  46  51.02   8.85     1.31
%female 2013  46  53.48   7.65     1.13
Difference    46  -2.457  3.816    0.563

95% CI for mean difference: (-3.590, -1.323)
T-Test of mean difference = 0 (vs ≠ 0): T-Value = -4.37  P-Value = 0.000

Since p < 0.001, reject NH and conclude that there is evidence of an increase
in the rates of females participating in the labour force over time across both
Europe and North America.
(ii) The comparison between the two different continents is a two-sample t-test since
these are independent samples. Since the data is stacked, the appropriate option in
Minitab is Both samples are in one column from the t-test drop-down
menu. This gives the output shown:
Two-Sample T-Test and CI: %female 2013, Continent

Two-sample T for %female 2013

Continent       N   Mean  StDev  SE Mean
Europe         36  52.36   7.35      1.2
North America  10  57.50   7.71      2.4

Difference = (Europe) - (North America)
Estimate for difference: -5.14
95% CI for difference: (-11.03, 0.75)
T-Test of difference = 0 (vs ≠): T-Value = -1.88  P-Value = 0.082  DF = 13

Although the sample mean is higher in North America than in Europe, p = 0.082 indicates that a difference of this size could plausibly have occurred by chance.
There is no evidence at the 5% significance level that the rates of females participating in the labour force in 2013 differ between the two continents.
(iii) The difference between North America and Europe may become significant if
data from more countries within each continent were included.
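
The point made in (iii) can be illustrated with a purely hypothetical calculation: if the sample means and standard deviations stayed exactly as observed while the number of countries in each continent was multiplied by k, the standard error would shrink and the t statistic would grow by a factor of the square root of k. The sketch below makes that assumption explicit. The scaling factors are invented for illustration and do not correspond to any real data.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic without assuming equal variances."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se

# Observed summary statistics for 2013: Europe (n = 36), North America (n = 10).
# k is a hypothetical multiplier applied to both sample sizes.
t_by_k = {}
for k in (1, 2, 4):
    t_by_k[k] = welch_t(52.36, 7.35, 36 * k, 57.50, 7.71, 10 * k)
    print(f"sample sizes x{k}: t = {t_by_k[k]:.2f}")
```

With k = 1 this reproduces the T-Value in the output for (ii). Real additional data would of course change the means and standard deviations as well, so this shows only the direction of the effect.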
References
[1] Katkici, Ü., Özkök, M.S. and Örsal, M. (1994) An autopsy evaluation of defence
wounds in 195 homicidal deaths due to stabbing. Journal of the Forensic Science
Society, 34(4): 237-240.