ACTIVITY SET 13

Hypothesis Testing – Comparing Two Groups - Solutions
1 For each of the following research questions, does the situation involve
independent samples or paired data?
a. Twenty-five people have their cholesterol measured before eating a Big Mac and again after
eating a Big Mac. On average, does eating a Big Mac increase cholesterol?
Paired data – the measurements will be taken twice on the same 25 subjects
b. What is the difference in average ages at which teachers and plumbers retire?
Two independent samples
c. What is the difference in average salaries for high school graduates and college graduates?
Two independent samples
d. In fifty married couples, the husband and wife each separately take the same test of marital
satisfaction. Is there a difference, on average, between the scores of husbands and wives?
Paired data – spousal data is often analyzed as paired data.
2 In the Datasets folder open the GSS Dataset. The data are from the 2006 General Social
Survey, a federally funded national survey done every other year by the University of Chicago.
The variable marital indicates whether the respondent is presently married or not. We’ll compare
the mean amount of television watching per typical day (tvhours is the variable) for those who
are married versus those who are not.
a. In words, write a null hypothesis for this situation. We’re comparing two means (television
watching for married people versus unmarried people).
Null: no difference in mean television watching for married people and unmarried people
b. Using statistical notation for means write null and alternative hypotheses for this problem.
H0: μ1 – μ2 = 0 or equivalently H0: μ1 = μ2
Ha: μ1 – μ2 ≠ 0 or equivalently Ha: μ1 ≠ μ2
c. Recall from the lecture notes that when doing a two-sample t-test one consideration is whether
the two standard deviations (or variances) are equal. To check, use software to find the standard
deviation for tvhours for the two categories of marital status. In Minitab this can be done by
going to Stat > Basic Statistics > Display Descriptive Statistics. Select tvhours for the
Variables and then for the By Variable enter marital status.
i. What are the two standard deviations?
Std.dev for married: 1.8241
Std.dev for not married: 2.693
ii. Is the larger standard deviation more than twice the smaller standard deviation?
No, 2.693 is not more than twice 1.8241
iii. If your answer to part ii is “Yes” then we will use the unpooled method for calculating
the standard error. If your answer was “No” then we can use the pooled method. Which method
should we use? Pooled
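The rule of thumb in parts ii and iii can be checked with a few lines of code. This is only a sketch of the decision rule as stated in this activity, using the two standard deviations reported in part i; the variable names are illustrative.

```python
# Standard deviations of tvhours reported in part i.
sd_married = 1.8241
sd_not_married = 2.693

larger = max(sd_married, sd_not_married)
smaller = min(sd_married, sd_not_married)
ratio = larger / smaller

# Rule of thumb from this activity: use the pooled method unless the
# larger standard deviation is more than twice the smaller one.
method = "unpooled" if ratio > 2 else "pooled"
print(f"ratio = {ratio:.2f}, method = {method}")
```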
d. The two-sample t-test is used to compare means when the data are from two independent
samples (as they are here). Use Minitab to conduct a two-sample t-test by going to
Stat > Basic Statistics > 2-Sample t. Enter tvhours in the Samples column and marital status for Subscripts. If
your answer to part ii above is “No”, use the pooled method by clicking the box for “Assume Equal
Variances”. Click Options and make sure the Alternative is the one you used in writing
your alternative hypothesis in part b. Read the output to find the values of the t-statistic
and the p-value.
t = 4.19
p-value = 0.000
e. State a conclusion about the hypotheses and about the “real world” situation.
Reject the null. Conclude that the population mean TV hours differ for the two groups. It looks like
the mean is higher for unmarried people.
f. The formula for the pooled t-statistic is
$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$.
Give values for each of the elements in the formula.
$\bar{x}_1 = 3.28$
$\bar{x}_2 = 2.61$
$s_p = 2.3524$
$n_1 = 496$
$n_2 = 387$
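As a check on part f, the pooled t-statistic can be recomputed from the five summary values. This is a sketch, not Minitab's own calculation: because the two means are rounded to two decimals, the result comes out near 4.20 rather than exactly matching Minitab's 4.19.

```python
import math

# Summary values from part f (group 1 has the higher mean).
x1, x2 = 3.28, 2.61   # sample means of tvhours
sp = 2.3524           # pooled standard deviation
n1, n2 = 496, 387     # sample sizes

# Pooled standard error of the difference between means.
se = sp * math.sqrt(1 / n1 + 1 / n2)

# Pooled t-statistic for H0: mu1 - mu2 = 0.
t = (x1 - x2) / se
print(f"standard error = {se:.4f}, t = {t:.2f}")
```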
g. The output includes a 95% confidence interval for the difference between means. Write a
sentence that interprets this interval in terms of how much difference there is in mean television
watching for the two groups.
We are 95% confident that the difference in means for the two groups is somewhere between 0.36
and 0.98 hours per day (with not married having a higher mean).
h. Refer again to the 95% confidence interval of the previous part. Explain why it is evidence
that makes it reasonable to conclude that the population means differ.
The interval does not include 0 so we can reject no difference as a possibility.
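The interval in part g can be approximated from the same summary values as the t-statistic. The sketch below assumes a normal multiplier (about 1.96) in place of the t multiplier Minitab would use; with 881 degrees of freedom the two are nearly identical, so the rounded bounds agree with the output.

```python
import math
from statistics import NormalDist

# Summary values from part f.
diff = 3.28 - 2.61                               # difference in sample means
se = 2.3524 * math.sqrt(1 / 496 + 1 / 387)       # pooled standard error

# Assumption: normal multiplier as an approximation to the t multiplier.
z_star = NormalDist().inv_cdf(0.975)

lower = diff - z_star * se
upper = diff + z_star * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")

# If 0 lies outside the interval, "no difference" is not a plausible value.
print("interval excludes 0:", not (lower <= 0 <= upper))
```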
3 In a national survey of 12th graders, 254 of 1356 boys said they never or rarely wear a seatbelt
when driving. Among 1168 girls, 97 said they never or rarely wear a seatbelt when driving.
a. Let p1 = population proportion that never or rarely wears a seatbelt for boys and p2 = the
corresponding proportion for girls. Write null and alternative hypotheses about p1 and p2 for
testing if a difference exists.
H0: p1 – p2 = 0 or equivalently H0: p1 = p2
Ha: p1 – p2 ≠ 0 or equivalently Ha: p1 ≠ p2. It is also okay to use Ha: p1 – p2 > 0, implying we think girls
are less likely than boys to never or rarely wear a seatbelt.
b. If using Minitab (you can use the Start menu to do this), then use Stat > Basic Statistics >
2 Proportions. Click on Summarized data. Use the boys as the first sample and girls as the second
sample. “Number of trials” means sample size and “Number of events” means number rarely or
never wearing a seatbelt. Next click Options and select your alternative and enter your test
difference. If your hypothesis statements are testing that the difference is 0, then be sure to select
Use Pooled Estimate of p. Use the output to give values for the following:
For boys, sample proportion = p̂1 = .187
For girls, sample proportion = p̂2 = .083
The difference between the sample proportions is p̂1 – p̂2 = .104
Value of z-statistic = 7.55
p-value = 0.000
c. Explain whether we can say there is a difference between the population proportions in this
situation.
We can conclude that there is a difference (p-value = 0.000 is less than 0.05).
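The values in part b can be reproduced from the counts given in the problem. This is a sketch of the pooled two-proportion z-test for a two-sided alternative; the variable names are illustrative, and the p-value uses the standard normal distribution.

```python
import math
from statistics import NormalDist

# Counts from the problem statement.
x1, n1 = 254, 1356   # boys: never/rarely wear a seatbelt, sample size
x2, n2 = 97, 1168    # girls: never/rarely wear a seatbelt, sample size

p1_hat = x1 / n1
p2_hat = x2 / n2

# Pooled estimate of the common proportion under H0: p1 = p2.
p_pooled = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z = (p1_hat - p2_hat) / se

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"p1_hat = {p1_hat:.3f}, p2_hat = {p2_hat:.3f}, z = {z:.2f}")
print("p-value below 0.05:", p_value < 0.05)
```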
4 In the Datasets folder click the link for the Class Survey. Is there a difference between how
students performed on their SATM and SATV? Since we are considering differences between
two measurements (i.e. SATM and SATV) on the same individual we can consider the data to be
paired. Use Minitab Stat > Basic Statistics > Paired t and enter SATM in First Sample and
SATV in Second Sample to conduct a Matched Paired t-test.
a. Write the null and alternative hypotheses using appropriate statistical notation.
H0: μd = 0
Ha: μd ≠ 0
b. Based on the confidence interval, what is your conclusion?
95% CI for mean difference: (7.4157, 31.5611)
This interval does not contain 0, so we would reject the
null hypothesis and conclude that a difference does exist in the population between SATM and
SATV scores. Since the bounds of this interval are greater than 0, we could surmise that SATM
scores are higher in the population than SATV.
c. Based on your p-value, what is your conclusion?
P-value = 0.002. The p-value is less than 0.05, so we would reject H0 and draw the same conclusion
as stated above in part b.
d. Use the data from the output to calculate the t-statistic by:
$t = \frac{\bar{x}_d - \mu_d}{s_d / \sqrt{n}} = \frac{19.4884 - 0}{89.8075 / \sqrt{215}} = 3.18$,
which matches the t-value from Minitab.
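The arithmetic in part d can be verified directly. This sketch uses only the three summary values from the Minitab output quoted above; the variable names are illustrative.

```python
import math

# Summary values from the paired t output.
mean_diff = 19.4884   # sample mean of the differences (SATM - SATV)
sd_diff = 89.8075     # sample standard deviation of the differences
n = 215               # number of pairs
mu_d_null = 0         # hypothesized mean difference under H0

# Standard error of the mean difference, then the paired t-statistic.
se = sd_diff / math.sqrt(n)
t = (mean_diff - mu_d_null) / se
print(f"standard error = {se:.4f}, t = {t:.2f}")
```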