CHAPTER 24 - Village Christian School

CHAPTER 24
INFERENCE:
Comparing Means
Comparing Two Means
Two-Sample Problems
♦
The goal of this inference is to estimate or test the
difference in the means of two different
groups; we may wish to compare the
responses to two treatments or to compare
the characteristics of two populations. For
these problems, we use a two-sample t-test
or a two-sample t-interval. It is important
to note that there needs to be a separate
sample from each treatment or each
population.
Comparing Two Means
Assumptions and Conditions for Comparing
two means
♦
Independence
• Randomization: Two random samples from two distinct
populations.
• 10% Condition: Each sample is less than 10% of its
population.
♦
Normality
• Nearly Normal Condition: The data in each group come from
a population that is approximately normal.
♦
Independent Groups
• Distinct Groups: The two samples are independent of one
another; that is, there is nothing (or no one) in both
groups; also, one sample has no influence on the other.
Two-Sample t Procedures
In order to calculate the confidence interval or
the test statistic, we need to use the Standard
Error for the difference in the means. Don’t
forget: VARIANCES ADD!
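$$SD(\bar{y}_1 - \bar{y}_2) = \sqrt{\mathrm{Var}(\bar{y}_1) + \mathrm{Var}(\bar{y}_2)} = \sqrt{\left(\frac{\sigma_1}{\sqrt{n_1}}\right)^2 + \left(\frac{\sigma_2}{\sqrt{n_2}}\right)^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$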
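As a quick numerical check of the standard error formula, here is a minimal Python sketch. The function name `se_diff_means` and the sample values are illustrative assumptions, not part of the lesson.

```python
import math

def se_diff_means(s1, n1, s2, n2):
    """Standard error of (ybar1 - ybar2) for two independent samples."""
    # Variances add: Var(ybar1) + Var(ybar2) is estimated by s1^2/n1 + s2^2/n2
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical samples: s1 = 3 with n1 = 9, s2 = 4 with n2 = 16
print(round(se_diff_means(3, 9, 4, 16), 4))  # 1.4142, since 9/9 + 16/16 = 2
```

Note that the standard deviations themselves do not add: 3/√9 + 4/√16 = 2, which is larger than the correct value of about 1.4142.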
Two-Sample t Procedures
Draw an SRS of size n1 from a normal
population with unknown mean µ1, and draw
an independent SRS of size n2 from another
normal population with unknown mean µ2.
The confidence interval (CI) for µ1 - µ2 given by
$$(\bar{y}_1 - \bar{y}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
has confidence level at least C no matter what
the population standard deviations are for
either population.
Two-Sample t Procedures
For the confidence interval, t* is the upper
(1 – C) / 2 critical value for the t(k)
distribution with df = k.
To test the hypothesis H0: µ1 – µ2 = 0, compute
the two-sample t statistic
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
and use P-values or critical values for the t(k)
distribution.
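The test statistic can be computed directly from summary statistics. The sketch below is one possible way to do it; the function name `two_sample_t`, its parameter names, and the input values are hypothetical and chosen only for illustration.

```python
import math

def two_sample_t(ybar1, s1, n1, ybar2, s2, n2, null_diff=0.0):
    """Two-sample t statistic for H0: mu1 - mu2 = null_diff."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of ybar1 - ybar2
    return ((ybar1 - ybar2) - null_diff) / se

# Hypothetical summary statistics for two independent samples
t = two_sample_t(ybar1=12.0, s1=3.0, n1=9, ybar2=10.0, s2=4.0, n2=16)
print(round(t, 4))  # 1.4142, that is, 2 divided by sqrt(2)
```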
Two-Sample t Procedures
k is the degrees of freedom for the two-sample
t procedures. A conservative choice is the smaller
of (n1 – 1) and (n2 – 1). Here is the actual formula:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
In practice, most people either use the conservative
k = the smaller of (n1 – 1) and (n2 – 1) or, more
often, let the calculator evaluate this
formula.
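To see how far apart the two choices of degrees of freedom can be, here is a small Python sketch. The function names `welch_df` and `conservative_df` are assumptions made for this illustration (the formula is often called the Welch-Satterthwaite approximation, a name the lesson does not use), and the sample values are hypothetical.

```python
def welch_df(s1, n1, s2, n2):
    """Degrees of freedom from the full formula (may be fractional)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def conservative_df(n1, n2):
    """The simpler choice: the smaller of n1 - 1 and n2 - 1."""
    return min(n1, n2) - 1

# Hypothetical samples: s1 = 3 with n1 = 9, s2 = 4 with n2 = 16
print(conservative_df(9, 16))            # 8
print(round(welch_df(3, 9, 4, 16), 2))   # 20.87
```

The formula always gives a value between the conservative choice and n1 + n2 – 2, so the conservative choice yields wider intervals and larger P-values.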
Harder Working Hearts
Resting pulse rates for a random sample of 26
smokers had a mean of 80 beats per minute
(bpm) and a standard deviation of 5 bpm.
Among 32 randomly selected nonsmokers, the
mean was 74 bpm and the standard deviation
was 6 bpm. Both sets of data were roughly
symmetric and had no outliers. Is there
evidence of a difference in the mean pulse rate
between smokers and nonsmokers? If so, how
big?
Harder Working Hearts
Step 1: Identify population Parameter, state
the null and alternative Hypotheses,
determine what you are trying to do (and
determine what the question is asking).
♦ We wish to determine if there is evidence of
a difference in mean pulse rate between
smokers and nonsmokers. Let s represent
smokers and n represent nonsmokers.
• Null Hypothesis: H0: μs - μn = 0
» There is no difference in mean pulse rates.
• Alternative Hypothesis: HA: μs - μn ≠ 0
» There is a difference in mean pulse rates.
Harder Working Hearts
Step 2: Verify the Assumptions by checking
the conditions
♦ Independence:
• Randomization Condition: We are told that
both samples were random samples.
• 10% Condition: We have less than 10% of all
smokers and less than 10% of all nonsmokers.
• There is no reason to doubt independence.
♦ Normality:
• We are told that both sets of data are roughly
symmetric with no outliers, so it is safe to
assume that the sampling distributions of both
sample means are approximately normal.
♦
Independent Groups:
• Data comes from two distinct populations,
smokers and nonsmokers.
Harder Working Hearts
Step 3: If conditions are met, Name the
inference procedure, find the Test statistic,
and Obtain the p-value in carrying out the
inference:
Name the test: We will use a Two-Sample T-test
ns = 26, nn = 32
ȳs = 80, ȳn = 74
ss = 5, sn = 6
Test Statistic:
$$t = \frac{(80 - 74) - 0}{\sqrt{\frac{5^2}{26} + \frac{6^2}{32}}} \approx 4.15, \quad df \approx 56$$
Obtain the p-value: p-value ≈ 0.0001
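The calculator work for this step can be sketched in plain Python. This is only one possible approach, not the method used in the lesson: the helper names `t_pdf` and `two_sided_p` are hypothetical, and the P-value is approximated by numerically integrating the t density with Simpson's rule instead of calling a statistics library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution; df may be fractional."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density over [0, |t|] by Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # P(-|t| <= T <= |t|), by symmetry
    return 1.0 - central_area

# Summary statistics from the pulse-rate example
s_s, n_s, s_n, n_n = 5, 26, 6, 32
v_s, v_n = s_s**2 / n_s, s_n**2 / n_n
t = ((80 - 74) - 0) / math.sqrt(v_s + v_n)
df = (v_s + v_n) ** 2 / (v_s**2 / (n_s - 1) + v_n**2 / (n_n - 1))
print(round(t, 2), round(df, 1), two_sided_p(t, df))
```

Run as written, this should reproduce t ≈ 4.15 and df ≈ 56, with a two-sided P-value close to 0.0001.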
Harder Working Hearts
Step 4: Make a decision (reject or fail to reject
H0). State your conclusion in context of the
problem using the p-value – make sure you
relate your solution to the population mean!
♦
With such a small p-value (0.0001), a difference in sample means
this large would be very unlikely to arise from sampling error
alone if the null hypothesis were true, so we reject the null hypothesis.
There is strong evidence that there is a difference in
mean pulse rates between smokers and nonsmokers.
How Much More?
Determine the true difference in mean pulse
rate between smokers and nonsmokers with
99% confidence.
♦
Step 1: State what you want to know in terms of
the Parameter and determine what the question is
asking
• We want to find an interval that is likely, with
99% confidence, to contain the true difference in
mean pulse rates, μs – μn, of smokers and nonsmokers. Let s represent smokers and n
represent non-smokers.
♦
Step 2: Verify the Assumptions by checking the
conditions
All assumptions and conditions were satisfied in
the previous problem.
♦
Step 3: Name the inference, do the work, and
state the Interval:
Name the procedure: This is a Two-Sample T-Interval
Interval: (2.148, 9.852), in bpm
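The interval can be checked with a short Python sketch. The helper names `t_cdf` and `t_critical` are hypothetical, and the critical value t* is found here by bisection on a numerically integrated t distribution, which is an assumption of this sketch and not how a calculator necessarily does it.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, by Simpson's rule on the t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_critical(conf_level, df):
    """t* with central area conf_level, found by bisection on the CDF."""
    target = 1 - (1 - conf_level) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Summary statistics from the pulse-rate example
v_s, v_n = 5**2 / 26, 6**2 / 32
se = math.sqrt(v_s + v_n)
df = (v_s + v_n) ** 2 / (v_s**2 / 25 + v_n**2 / 31)
margin = t_critical(0.99, df) * se
lower, upper = (80 - 74) - margin, (80 - 74) + margin
print(round(lower, 3), round(upper, 3))
```

With df ≈ 56 the critical value is roughly 2.67, so the margin of error is about 3.85 bpm around the observed difference of 6 bpm.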
♦
Step 4: State your Conclusion in context of the
problem
• We are 99% confident that the true difference in
mean pulse rates between smokers and nonsmokers is
between 2.148 and 9.852 bpm. In other words, we are
99% confident that smokers have a mean pulse rate
between 2.148 and 9.852 bpm higher than
nonsmokers.
Pizza, Pizza!!!
Nutritional information from two different
national chains, Papa John’s and Domino’s,
was examined to determine the amount of
saturated fat (in grams) in one slice of various
pizzas. Use the data below to determine if
there is a difference between the two chains in the
mean amount of saturated fat that slices of pizza
contain. The following table shows
saturated fat (in grams) per slice of pizza:
P: 6, 6, 8, 6, 8, 7, 4, 7, 6, 9, 6, 5, 5, 7, 4.5
D: 17, 8, 12, 12, 10, 15, 8, 7, 8, 11, 10, 11, 10, 13, 5, 13, 16, 11, 16, 12
Pizza, Pizza!!!
Step 1: Identify population Parameter, state
the null and alternative Hypotheses,
determine what you are trying to do (and
determine what the question is asking).
♦ We want to know if the two pizza chains
have significantly different mean saturated
fat contents. Let P represent Papa John’s
and D represent Domino’s.
• H0: μP - μD = 0
» There is no difference in mean saturated fat content.
• HA: μP - μD ≠ 0
» There is a difference in mean saturated fat content
Pizza, Pizza!!!
Step 2: Verify the Assumptions by checking
the conditions
♦ Independence:
• Randomization Condition: We are not told if
the samples were randomly selected. We will
assume that the pizzas were representative of
the population. If not representative, our results
may not be valid.
• 10% Condition: It is safe to assume that we
have less than 10% of all pizza slices.
♦ Normality:
• Both samples are relatively small, so we look at
the sample distributions:
[Plots of the sample distributions: Papa John’s, Domino’s]
It is safe to assume normality, since both
samples are unimodal and roughly symmetric.
♦
Independent Groups:
• Data comes from two distinct populations, Papa
John’s and Domino’s.
Pizza, Pizza!!!
Step 3: If conditions are met, Name the
inference procedure, find the Test statistic,
and Obtain the p-value in carrying out the
inference:
Name the test: We will use a Two-Sample T-test
nP = 14, nD = 20
ȳP = 6.393, ȳD = 11.250
sP = 1.389, sD = 3.193
Test Statistic:
$$t = \frac{(6.393 - 11.250) - 0}{\sqrt{\frac{1.389^2}{14} + \frac{3.193^2}{20}}} \approx -6.035, \quad df \approx 28$$
Obtain the p-value: p-value ≈ 0.000001
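The arithmetic in this step can be verified from the summary statistics with a few lines of Python. The variable names are illustrative assumptions; the numbers are the ones quoted in this step.

```python
import math

ybar_p, s_p, n_p = 6.393, 1.389, 14    # Papa John's summary statistics
ybar_d, s_d, n_d = 11.250, 3.193, 20   # Domino's summary statistics

v_p, v_d = s_p**2 / n_p, s_d**2 / n_d  # estimated variances of the sample means
se = math.sqrt(v_p + v_d)              # standard error of ybar_p - ybar_d
t = ((ybar_p - ybar_d) - 0) / se
df = (v_p + v_d) ** 2 / (v_p**2 / (n_p - 1) + v_d**2 / (n_d - 1))
print(round(t, 2), round(df, 1))  # -6.04 27.7
```

The statistic is negative because the Papa John’s sample mean is the smaller one. For the two-sided alternative only its size matters, and the fractional degrees of freedom round to the 28 reported here.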
Pizza, Pizza!!!
Step 4: Make a decision (reject or fail to reject
H0). State your conclusion in context of the
problem using the p-value – make sure you
relate your solution to the population mean!
♦
The p-value is extremely small (0.000001), so we reject
the null hypothesis. There is very strong evidence
that there is a difference in mean saturated fat content
between Papa John’s and Domino’s.
To T or Not to T, That is the Question
Sometimes, you may wonder if you
should use t or z. If you know σ, use z
(this almost never happens
in the real world). Whenever
you use s to estimate σ, use t.
What about pooling?
♦
If we know that the two population variances are equal (or
are willing to assume this), we can pool the two
groups to estimate a single common variance; otherwise, do not
pool when comparing means.
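The lesson does not give the pooled formula, so the sketch below uses the standard pooled-variance estimate as an assumption: a weighted average of the two sample variances with n1 + n2 – 2 degrees of freedom. The function names `pooled_t` and `unpooled_t` are hypothetical.

```python
import math

def pooled_t(ybar1, s1, n1, ybar2, s2, n2):
    """Pooled two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (ybar1 - ybar2) / se, df

def unpooled_t(ybar1, s1, n1, ybar2, s2, n2):
    """Unpooled two-sample t statistic, for comparison."""
    return (ybar1 - ybar2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

# Pulse-rate summary statistics from the smokers example
print(pooled_t(80, 5, 26, 74, 6, 32))
print(unpooled_t(80, 5, 26, 74, 6, 32))
```

For the pulse-rate data the two statistics are close (roughly 4.08 pooled versus 4.15 unpooled) because the sample standard deviations are similar. When the spreads differ a lot, the two can disagree, which is why pooling is only appropriate under the equal-variance assumption.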
Assignment
Chapter 24
Lesson:
Comparing Means
Read:
Chapter 24
Problems:
1 - 33 (odd)