Chapter 19 - Two sample tests on the mean

Chapter 19 – Two-Sample Problems
In Chapter 9, you studied randomized comparative experiments and the principles of sound
experimental design (randomization, comparison and repetition). Comparing random samples drawn
separately from two populations is a two-sample problem and is the focus of Chapter 19 and part of
Chapter 20.
One versus Two Samples
When testing a hypothesis about one sample, we ask whether the sample mean is significantly different from some particular
value. For example, suppose tutored students had significantly higher SAT scores than the average. Even
if this were the case, you should ask yourself some questions:
• Are there other factors that make tutored students different from the rest of the population?
Remember lurking variables. Just the fact that they signed up for tutoring suggests they are
different from other students. Can you think of reasons why this group would be different?
• In practice, you also need a control group so that you can compare the two samples.
Comparing Two Samples (Chapter 19)
Suppose you want to compare the means of two groups that are not matched pairs (there is no pairing of
individuals). In this case, the samples can be different sizes. We are comparing the means of the two
groups and we assume that:
• Each group is an SRS from its own population, and the two populations are distinct.
• Responses in each group are independent of those in the other group.
• Both populations are normally distributed, with unknown means and standard deviations. In
practice, it is enough that the distributions have similar shapes and that the
data have no strong outliers.
We will call the variable 𝑥1 in the first population and 𝑥2 in the second population because these
variables might have different distributions in the two populations.
               Variable   Mean   Standard deviation   Size
Population 1   𝑥1         𝜇1     𝜎1
Population 2   𝑥2         𝜇2     𝜎2
Sample 1                  𝑥̅1     𝑠1                   𝑛1
Sample 2                  𝑥̅2     𝑠2                   𝑛2
We will use the sample means and standard deviations to estimate the unknown parameters. We want
to compare the two population means either by giving a confidence interval for their difference 𝜇1 − 𝜇2
or by testing the hypothesis of no difference, 𝐻0 : 𝜇1 = 𝜇2 .
Goal: To estimate 𝜇1 − 𝜇2 . To do this we will use the difference between the means of the two samples,
𝑥̅1 − 𝑥̅2 .
Example (text p. 468)
People gain weight when they take in more energy from food than they expend. To investigate the
link between obesity and energy spent, twenty healthy volunteers who do not exercise are chosen.
Ten are lean and ten are mildly obese but still healthy. The following table (not reproduced here) gives data on the amount of
time (in minutes per day) that the subjects spend standing or walking, sitting, or lying down.
What are the null and alternative hypotheses?
𝐻0 : 𝜇1 = 𝜇2 (both groups have same mean standing and walking time)
𝐻𝑎 : 𝜇1 > 𝜇2 (the lean group are more active than the obese group)
Note: Have the conditions for inference been met? Since the subjects are volunteers, this is not an SRS. But
(read the text) the study did take precautions so that we can treat the two groups as independent
SRSs.
Calculating the group means and standard deviations yields the summary statistics for the two groups (values not reproduced here).
Now we need to learn the details of inference comparing two means.
Two-Sample t Procedures
Is this observed difference surprising? This depends on the spread of the observations as well as on the
two means. Widely different means can occur by chance, so we need to take variation into account. We do this by
standardizing the observed difference 𝑥̅1 − 𝑥̅2 , that is, dividing it by its standard deviation.
This standard deviation is
√( 𝜎1²/𝑛1 + 𝜎2²/𝑛2 )
But we don’t know the population standard deviations, so we use the sample standard deviations instead. The result is
called the standard error, or estimated standard deviation, of the difference in the sample means:
𝑆𝐸 = √( 𝑠1²/𝑛1 + 𝑠2²/𝑛2 )
We then standardize the estimate by dividing by the standard error. This is the two-sample t statistic:
𝑡 = (𝑥̅1 − 𝑥̅2) / √( 𝑠1²/𝑛1 + 𝑠2²/𝑛2 )
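As a minimal sketch, the standard error √( 𝑠1²/𝑛1 + 𝑠2²/𝑛2 ) and the two-sample t statistic can be computed from summary statistics in a few lines of Python. The function name two_sample_t is an illustrative choice, not something from the text; the numbers plugged in are the summary statistics of the two-school SAT problem stated later in this chapter.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (standard error, t statistic) for the difference mean1 - mean2.

    Hypothetical helper: it works from summary statistics only and tests
    the null hypothesis that the two population means are equal.
    """
    # Standard error of the difference in sample means (unpooled)
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    # Distance of the observed difference from 0, in standard errors
    t = (mean1 - mean2) / se
    return se, t

# School 1: n = 43, mean 502, sd 60.  School 2: n = 35, mean 480, sd 50.
se, t = two_sample_t(502, 60, 43, 480, 50, 35)
print(f"SE = {se:.3f}, t = {t:.3f}")
```

For these inputs the standard error is about 12.46 and t is about 1.77. The statistic alone is not a conclusion; it still has to be compared with a t distribution with suitable degrees of freedom.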
The interpretation of this is the same as for any z or t statistic: it tells us how far 𝑥̅1 − 𝑥̅2 is from 0 in
units of its standard error.
The Two-Sample t Procedures:
To test the hypothesis 𝐻0 : 𝜇1 = 𝜇2 , calculate the two-sample t statistic
𝑡 = (𝑥̅1 − 𝑥̅2) / √( 𝑠1²/𝑛1 + 𝑠2²/𝑛2 )
Find P-values from the t distribution with degrees of freedom from either Option 1 (software)
or Option 2 (the smaller of n1 − 1 and n2 − 1).
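The two degrees-of-freedom options can be sketched in Python as follows. This assumes that Option 1 refers to the Welch-Satterthwaite approximation, which is what unpooled two-sample routines in common software use; the text itself only says "software". The function names are illustrative, and the inputs are the two-school SAT summary statistics given later in this chapter.

```python
def conservative_df(n1, n2):
    """Option 2: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def welch_df(sd1, n1, sd2, n2):
    """Option 1 (assumed): Welch-Satterthwaite approximate degrees of freedom."""
    v1 = sd1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# School 1: n = 43, sd 60.  School 2: n = 35, sd 50.
df_option2 = conservative_df(43, 35)
df_option1 = welch_df(60, 43, 50, 35)
print(df_option2, round(df_option1, 2))
```

The approximate value is usually not a whole number and is never smaller than the Option 2 value, which is why Option 2 is the conservative choice: fewer degrees of freedom give larger P-values and wider intervals.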
Example: Daily Activity and Obesity (continued)
The two-sample t statistic comparing the average minutes spent walking and standing in the two groups
(lean vs. obese) is 𝑡 = 3.808.
Next we need the degrees of freedom. Because n1 − 1 = 9 and n2 − 1 = 9, Option 2 gives 9
degrees of freedom. Because 𝐻𝑎 is one-sided, the P-value is the area to the right of 𝑡 = 3.808 under
the t curve with df = 9. You can use either Table C or the calculator to compute this.
Using the table: in the df = 9 row of Table C, 𝑡 = 3.808 falls between the critical values 3.690 (upper-tail area 0.0025) and 4.297 (upper-tail area 0.001).
So 0.001 < 𝑃 < 0.0025
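The Table C bracket can be checked numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule to get the area to the right of t = 3.808 with df = 9. The helper names t_pdf and t_upper_tail are illustrative, and the numerical integration stands in for whatever method a table or calculator actually uses.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t (for t >= 0), via Simpson's rule on [0, t]."""
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the area lies to the right of 0; subtract the part between 0 and t
    return 0.5 - total * h / 3

p_value = t_upper_tail(3.808, 9)
print(f"one-sided P-value with df = 9: {p_value:.5f}")
```

The result should fall between 0.001 and 0.0025, in agreement with the table.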
Using the calculator (preferred):
You can either enter the data in Lists 1 and 2 on your calculator or use the means and standard
deviations given below the table (which were computed using the calculator).
Go to the 2-sample 𝑡 test (under Stats/Tests) and enter the means, the standard deviations, and the
number in each sample. You will be asked whether the samples are “pooled”. Pooling assumes that the 2
populations have the same variance. Since we do not know the population variances,
we cannot answer this question with confidence. “Unpooled” works whether or not the variances are equal, so
always use “unpooled”. This is discussed on page 487 in the text.
So you get 𝑡 = 3.81, 𝑝 = 8.414 × 10⁻⁴ = 0.000841. This is smaller than the Table C bound because the calculator uses the software degrees of freedom (Option 1), which are larger than the conservative 9 of Option 2.
So what does this tell us? There is very strong evidence (𝑃 = 0.0008) that lean people spend more
time walking and standing than moderately obese people.
Problem: Are the following two schools comparable in SAT scores? (or are the scores different?)
School 1: A random sample of 43 students has a mean SAT of 502 and a standard deviation of 60.
School 2: A random sample of 35 students has a mean SAT of 480 and a standard deviation of 50.
Step 1: What are the null and alternative hypotheses?
Step 2: Calculate the test statistic
Step 3: Compute the P-value
Step 4: Conclude
Step 5: Interpret the P-value
Confidence Intervals for Two Sample Means
Draw an SRS of size n1 from a large Normal population with unknown mean µ1, and draw an
independent SRS of size n2 from another large Normal population with unknown mean µ2. A level C
confidence interval for µ1 − µ2 is given by
(𝑥̅1 − 𝑥̅2) ± 𝑡* √( 𝑠1²/𝑛1 + 𝑠2²/𝑛2 )
Here t* is the critical value for confidence level C for the t distribution with degrees of freedom
from either Option 1 (software) or Option 2 (the smaller of n1 − 1 and n2 − 1).
Example 18.4 How much more active are lean people?
Give a 90% confidence interval for 𝜇1 − 𝜇2 , the difference in average daily minutes spent standing and
walking between lean and moderately obese adults.
Using 9 degrees of freedom, the critical value is 𝑡* = 1.833, either from Table C or from 𝐼𝑛𝑣𝑇 under
Distribution on the calculator: 𝐼𝑛𝑣𝑇(0.95, 9) = 1.833. Why did I use 0.95 instead of 0.90?
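One way to think about the 0.95: a 90% interval leaves 5% in each tail, so t* is the point with area 0.95 to its left. The Python sketch below assumes that InvT(area, df) returns exactly that left-tail quantile. It approximates it with a numerically integrated t distribution and bisection; t_pdf, t_cdf and inv_t are illustrative names, not calculator or library functions.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Area to the left of x, using Simpson's rule on [0, |x|] and symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = 0.5 + total * h / 3
    return area if x >= 0 else 1 - area

def inv_t(area, df, lo=-100.0, hi=100.0):
    """Find the point with the given area to its left, by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_star = inv_t(0.95, 9)
# Area between -t* and +t*: this is the confidence level
central_area = t_cdf(t_star, 9) - t_cdf(-t_star, 9)
print(round(t_star, 3), round(central_area, 3))
```

The quantile agrees with the Table C value 1.833 to three decimals, and the area between −t* and t* is 0.90, the confidence level.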
Conclusion: We are 90% confident that the actual difference in average daily minutes spent standing
and walking between lean and mildly obese individuals is between 79.09 and 225.87 minutes.
Note: This interval is quite wide because the samples are small and the variation among individuals is large.
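As a consistency check, the interval can be tied back to the earlier test statistic. A two-sample t interval has the form (difference) ± t* × SE, so the midpoint of the interval is the observed difference and the half-width divided by t* is the standard error. The Python sketch below uses only numbers quoted in this chapter (the endpoints 79.09 and 225.87, and t* = 1.833); the variable names are illustrative.

```python
# Endpoints of the 90% confidence interval and the critical value used
lower, upper = 79.09, 225.87
t_star = 1.833

# Midpoint of the interval = observed difference in sample means
diff = (lower + upper) / 2
# Half-width = margin of error = t* times the standard error
margin = (upper - lower) / 2
se = margin / t_star

# Recover the two-sample t statistic for H0: no difference
t = diff / se
print(f"difference = {diff:.2f}, SE = {se:.2f}, t = {t:.3f}")
```

The recovered t statistic is about 3.808, matching the value used in the hypothesis test, so the test and the interval rest on the same difference and standard error.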
Problem: Find a 95% confidence interval for the difference in population means between the 2 schools.
What does this confidence interval tell us?