Chapter 13 WebNotes

Chapter 13: Comparing Two Population Parameters
Section 13.1: Comparing Two Means
When we learned about designing scientific studies we had a basic format for testing two competing
claims. For example, if a pharmaceutical company comes up with a new medication for lowering
cholesterol and we want to test it, we would compare it to the current medication for lowering cholesterol
using the following basic design:
[Diagram: an SRS of volunteers with high cholesterol is split by random allocation into a group receiving the old drug and a group receiving the new drug; the cholesterol levels of the two groups are then compared.]
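The random-allocation step of this design can be sketched in Python. This is only an illustration under assumptions: the function name, the volunteer labels 1 to 10, and the fixed seed are hypothetical and do not come from the notes.

```python
import random

def randomly_allocate(volunteers, seed=None):
    """Shuffle the volunteers and split them into two groups
    (equal sized when the number of volunteers is even)."""
    rng = random.Random(seed)      # a seed only makes the example repeatable
    shuffled = list(volunteers)    # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical volunteer IDs 1..10 (the notes give no sample size here)
old_drug_group, new_drug_group = randomly_allocate(range(1, 11), seed=1)
print("Old drug:", sorted(old_drug_group))
print("New drug:", sorted(new_drug_group))
```

Every volunteer lands in exactly one group, and chance alone decides which one.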
Up to this point we know how to randomly allocate and, if necessary, how to block, but we don't know how to compare. Here is how one would compare two samples using a t-test:
Example: An educational consulting company claims that it has developed a new, more effective method of teaching AP Statistics. To test the claim, the company chooses a random sample of 40 students who are interested in taking AP Statistics. It then randomly assigns half of those students to a class using the new method and half to a class using the traditional method. At the end of the course, after the students take the AP Exam in Statistics, the scores are compared.
a) Conduct a hypothesis test (alpha = 0.05 level) to see whether the new method in fact produces better scores than the old method.
b) Construct a 95% confidence interval for the difference in mean scores.
Let's say that:
$\bar{x}_1$ = the average score of students in the new method's class = 3.7
$\bar{x}_2$ = the average score of students in the traditional method's class = 3.5
$s_1$ = standard deviation of the scores from the first sample = 0.7
$s_2$ = standard deviation of the scores from the second sample = 0.6
$n_1 = n_2 = 20$ (half of the 40 students in each class)
Part a: Hypothesis testing
Step 1: State the hypothesis
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$
Where $\mu_1$ = average score of students using the new method
$\mu_2$ = average score of students using the traditional method
Step 2: Assumptions
1) Sample is an SRS (given).
2) Since the sum of our sample sizes is 40, we don't have to worry about how normal our population distributions are. (If the sum of the sample sizes were less than 40, we would want to check that each sample came from an approximately normal population with no outliers.)
3) The samples need to be independent. They are here, because the 40 students were randomly assigned to two separate classes.
Step 3: Calculate the test statistic and p-value
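Keep in mind that our test statistic now involves the sampling distribution of the difference of the two sample means, $\bar{x}_1 - \bar{x}_2$. This new distribution has a mean of $\mu_1 - \mu_2$ and a variance of
$$\left(\frac{\sigma_1}{\sqrt{n_1}}\right)^2 + \left(\frac{\sigma_2}{\sqrt{n_2}}\right)^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
Its standard deviation is therefore
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
So the 2-sample z-statistic will be:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$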
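The standard deviation formula for the difference of two independent sample means can be checked with a small simulation. The sketch below assumes normal populations and borrows 3.7, 3.5, 0.7 and 0.6 as hypothetical population values purely for illustration; the function names, the number of repetitions and the seed are also arbitrary choices, not part of the notes.

```python
import math
import random
import statistics

def theoretical_sd(sigma1, n1, sigma2, n2):
    """Standard deviation of (x1_bar - x2_bar) for independent samples."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

def simulated_sd(mu1, sigma1, n1, mu2, sigma2, n2, reps=5000, seed=0):
    """Estimate the same standard deviation by repeated sampling."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        mean1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        mean2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(mean1 - mean2)
    return statistics.stdev(diffs)

# Hypothetical population values, chosen only for illustration
theory = theoretical_sd(0.7, 20, 0.6, 20)
simulated = simulated_sd(3.7, 0.7, 20, 3.5, 0.6, 20)
print(f"formula: {theory:.4f}, simulation: {simulated:.4f}")
```

The two printed values should agree closely, which is what the variance-addition rule for independent samples predicts.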
Since we don't know what $\sigma$ is, we will replace it with the sample standard deviation, $s$, and use the t-statistic instead of the z-statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
In our case
$$t = \frac{(3.7 - 3.5) - 0}{\sqrt{\frac{0.7^2}{20} + \frac{0.6^2}{20}}} = 0.9701$$
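The same arithmetic can be reproduced from the summary statistics alone. This is a minimal sketch; the function name `two_sample_t` and its parameter names are made up for illustration.

```python
import math

def two_sample_t(x1_bar, s1, n1, x2_bar, s2, n2, null_diff=0.0):
    """Unpooled two-sample t statistic computed from summary statistics."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((x1_bar - x2_bar) - null_diff) / standard_error

t_stat = two_sample_t(3.7, 0.7, 20, 3.5, 0.6, 20)
print(round(t_stat, 4))  # 0.9701
```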
The degrees of freedom for a two-sample t-test is the smaller of $(n_1 - 1)$ and $(n_2 - 1)$.
(Note: the calculator and computer software use a more complicated formula for calculating the degrees of freedom. Our way, though less exact, is more conservative.)
So in our case we have 19 degrees of freedom.
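For comparison, here is a sketch of the "more complicated formula". The notes do not name it, so treating it as the Welch-Satterthwaite approximation (the one many calculators and statistics packages use for the unpooled test) is an assumption; the function names are hypothetical.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def conservative_df(n1, n2):
    """The rule used in these notes: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

print(conservative_df(20, 20))               # 19
print(round(welch_df(0.7, 20, 0.6, 20), 1))
```

The approximation gives a larger value than 19 for this example. Using fewer degrees of freedom widens the t distribution's tails, which is why the simpler rule is the more conservative one.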
$P(t > 0.9701)$ with 19 d.f. is between 0.15 and 0.20.
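A t table only brackets the p-value. For a closer figure, the upper-tail area can be approximated by integrating the t density numerically. The sketch below uses only the Python standard library and Simpson's rule; the function names and the number of steps are illustrative choices, and a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    if t < 0 or steps % 2:
        raise ValueError("this sketch needs t >= 0 and an even number of steps")
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t      # half the area lies above 0 by symmetry

p_value = t_upper_tail(0.9701, 19)
print(round(p_value, 3))
```

The printed value should fall inside the 0.15 to 0.20 range read from the table.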
Step 4: Conclusion
Since this p-value is greater than 0.05, the result is not significant at the 0.05 level, and we do not reject the null hypothesis. This means that we do not have enough evidence to say that the new method is any better than the traditional method for raising AP scores.
Part b: 95% Confidence interval
Our confidence interval will take the form of:
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
The $t^*$ value for a 95% confidence interval with 19 degrees of freedom is 2.093.
Therefore the confidence interval for our situation is:
$$(3.7 - 3.5) \pm 2.093 \sqrt{\frac{0.7^2}{20} + \frac{0.6^2}{20}} = 0.2 \pm 0.4315$$
(-0.2315, 0.6315)
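The interval can be reproduced with a few lines of Python. This is a sketch from the summary statistics; the function name is hypothetical, and the critical value 2.093 is taken from the notes rather than computed.

```python
import math

def two_sample_t_interval(x1_bar, s1, n1, x2_bar, s2, n2, t_star):
    """Confidence interval for mu1 - mu2 from summary statistics."""
    margin = t_star * math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1_bar - x2_bar
    return diff - margin, diff + margin

# t_star = 2.093 is the table value quoted in the notes for 19 d.f.
low, high = two_sample_t_interval(3.7, 0.7, 20, 3.5, 0.6, 20, t_star=2.093)
print(round(low, 4), round(high, 4))  # -0.2315 0.6315
```

The interval contains 0, which agrees with the decision in Part a not to reject the null hypothesis.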
Section 13.2: Comparing Two Proportions
Just like we can use hypothesis testing to compare two population means and confidence intervals to
estimate the difference between two population means, we can do the same with the difference of two
population proportions.
Example: Thinking about the upcoming prom, Andrew Negri is pondering whether he should expand his
date opportunities and ask some girls out from Greenwich HS hoping that his chances would be better
there. He conjectures that a higher proportion of GHS seniors attended the prom last year than DHS
seniors. He takes an SRS from each school and gets the following data:
Population     Sample size    # of seniors who attended the prom last year
DHS Seniors    25             15
GHS Seniors    45             40
I) Construct a 95% confidence interval of the difference in proportion of GHS seniors who went to the
prom last year and DHS seniors who went to the prom last year.
Assumptions:
1) Samples are SRS from the designated populations (given)
2) The population is at least 10 times as large as the samples.
- For GHS a sample size of 45 is OK, but for DHS a sample size of 25 is somewhat problematic because there aren't at least 250 seniors. So we proceed with caution.
3) The counts of successes and failures in each sample are at least 5:
$n_1\hat{p}_1 \ge 5$
$n_1(1 - \hat{p}_1) \ge 5$
$n_2\hat{p}_2 \ge 5$
$n_2(1 - \hat{p}_2) \ge 5$
Check:
$25(0.6) = 15 \ge 5$
$25(1 - 0.6) = 10 \ge 5$
$45(0.89) = 40 \ge 5$
$45(1 - 0.89) = 5 \ge 5$
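The count condition in item 3 is easy to check mechanically. The sketch below works from the raw counts in the table (which avoids rounding 40/45 to 0.89); the function name and the `minimum` parameter are hypothetical.

```python
def counts_ok(successes, n, minimum=5):
    """True when both the success and failure counts reach the minimum."""
    failures = n - successes
    return successes >= minimum and failures >= minimum

print(counts_ok(15, 25))   # DHS: 15 attended, 10 did not
print(counts_ok(40, 45))   # GHS: 40 attended, 5 did not
```

The GHS sample only just passes, with exactly 5 failures.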
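Interval construction:
$$(\hat{p}_2 - \hat{p}_1) \pm z^* \cdot SE, \qquad SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
Where: $p_1$ = proportion of DHS seniors who attended the prom last year
$p_2$ = proportion of GHS seniors who attended the prom last year
and $\hat{p}_1$, $\hat{p}_2$ are the corresponding sample proportions.
$$(0.89 - 0.6) \pm 1.96 \sqrt{\frac{0.6(1 - 0.6)}{25} + \frac{0.89(1 - 0.89)}{45}}$$
$0.29 \pm 0.213$
(0.077, 0.503)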
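The interval can also be computed directly from the counts. This sketch uses `statistics.NormalDist` from the Python standard library for the critical value; the function name is hypothetical. Because it works from unrounded proportions, its endpoints can differ from the hand calculation in the third decimal place.

```python
import math
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Interval for p2 - p1 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    diff = p2_hat - p1_hat
    return diff - z_star * se, diff + z_star * se

# Sample 1 = DHS (15 of 25), sample 2 = GHS (40 of 45)
low, high = two_proportion_interval(15, 25, 40, 45)
print(round(low, 3), round(high, 3))
```

Both endpoints are positive, so the interval suggests a higher attendance proportion at GHS.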
II) Do a hypothesis test (alpha = 0.05) to see if Andrew is correct
Step 1: State the hypothesis
$H_0: p_1 = p_2$
$H_a: p_1 < p_2$
or, equivalently,
$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 < 0$
Where: $p_1$ = proportion of DHS seniors who attended the prom last year
$p_2$ = proportion of GHS seniors who attended the prom last year
Step 2: Assumptions
1) Samples are SRS from the designated populations (given)
2) The population is at least 10 times as large as the samples.
- For GHS a sample size of 45 is OK, but for DHS a sample size of 25 is somewhat problematic because there aren't at least 250 seniors. So we proceed with caution.
3) The counts of successes and failures in each sample are at least 5:
$n_1\hat{p}_1 \ge 5$
$n_1(1 - \hat{p}_1) \ge 5$
$n_2\hat{p}_2 \ge 5$
$n_2(1 - \hat{p}_2) \ge 5$
Check:
$25(0.6) = 15 \ge 5$
$25(1 - 0.6) = 10 \ge 5$
$45(0.89) = 40 \ge 5$
$45(1 - 0.89) = 5 \ge 5$
Step 3: Calculate test statistic and p-value
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where:
$\hat{p}_1$ = proportion of DHS seniors from our sample who attended prom last year
$\hat{p}_2$ = proportion of GHS seniors from our sample who attended prom last year
$\hat{p}$ = pooled proportion of seniors who attended the prom from both samples combined
$$\hat{p} = \frac{\text{count of successes in both samples combined}}{\text{count of observations in both samples combined}} = \frac{55}{70} = 0.79$$
The reason we use a pooled proportion for the standard deviation of the difference of proportions is that the null hypothesis assumes the two population proportions are equal. This essentially means that there is one population (of DHS and GHS seniors) from which a certain proportion attended the prom the year before. Our best estimate of that proportion is $\hat{p}$.
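$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.6 - 0.89}{\sqrt{(0.79)(1 - 0.79)\left(\frac{1}{25} + \frac{1}{45}\right)}} \approx \frac{-0.289}{0.1024} \approx -2.82$$
(The values 0.1024 and -2.82 come from carrying the unrounded proportions $\hat{p}_2 = 40/45$ and $\hat{p} = 55/70$ through the calculation; plugging in the rounded 0.89 and 0.79 gives a slightly different result.)
$P(Z < -2.82) = 0.0024$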
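The test statistic and p-value can be reproduced from the raw counts. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and the lower tail is used because the alternative hypothesis is $p_1 < p_2$.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # 55 / 70 in this example
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Sample 1 = DHS (15 of 25), sample 2 = GHS (40 of 45)
z = two_proportion_z(15, 25, 40, 45)
p_value = NormalDist().cdf(z)                 # lower tail, because Ha is p1 < p2
print(round(z, 2), round(p_value, 4))
```

The p-value is far below 0.05.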
Step 4: Conclusion
Since our p-value is less than our alpha level we can reject the null hypothesis and conclude that there
is enough evidence to suggest that the proportion of seniors from GHS who went to the prom last year
is greater than the proportion of seniors from DHS who went to the prom last year.