Uploaded by YiFei Wang

Module 6 Two sample tests

Module 6
Simple comparative experiments: two sample inference
Difference of two Proportions – Hypothesis Test
Fifty-seven percent of 248 sampled boys aged 15–17 have online profiles, while 70% of 256 sampled girls aged 15–17 have online profiles. If the boys and girls were selected at random, is there a statistically significant difference between the two groups? Use α = 0.05.
Framework: Is the difference between the proportions significantly different from 0?
1) Hypotheses:
$H_0: p_1 - p_2 = 0$ or $p_1 = p_2$
$H_A: p_1 - p_2 \neq 0$ or $p_1 \neq p_2$
2) Model and Assumptions:
• Random and independent data
• Each sample is less than 10% of its population
• Independent samples (yes/no)
• Enough successes and failures in each group (57% of 248 ≈ 141 and 70% of 256 ≈ 179 successes; 107 and 77 failures)
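Two-sample proportions (z) test
Sampling distribution of p̂1 − p̂2: use z (Normal) with a centre at p1 − p2 = 0 and standard error of:
$$\text{se}(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p \hat q}{n_1} + \frac{\hat p \hat q}{n_2}}$$
The test statistic is:
$$z = \frac{(\hat p_1 - \hat p_2) - 0}{\sqrt{\frac{\hat p \hat q}{n_1} + \frac{\hat p \hat q}{n_2}}}$$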
1 | ©2020 Karen Lawrence, McMaster University
where p̂ is the pooled proportion and q̂ = 1 − p̂:
$$\text{pooled } \hat p = \frac{\text{success}_1 + \text{success}_2}{n_1 + n_2} = \frac{141 + 179}{248 + 256} = 0.635$$
$H_0: p_1 = p_2$ is the "status quo". Under this assumption the two samples come from populations with the same proportion of successes, so we can pool their estimates to get a better estimate of that common proportion.
ENGTECH 2ES3/3ES3
3) Mechanics:
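p̂1 = 0.57 (boys), p̂2 = 0.70 (girls), p̂1 − p̂2 = −0.13
$$z = \frac{(-0.13) - 0}{\sqrt{\frac{(0.635)(0.365)}{248} + \frac{(0.635)(0.365)}{256}}} = -3.05$$
(The value −3.05 uses the unrounded proportions 141/248 = 0.5685 and 179/256 = 0.6992; the rounded difference −0.13 gives about −3.03.)
[Figure: Distribution Plot – Normal, Mean = 0, StDev = 1; two-tailed test with area 0.001144 in each tail beyond z = ±3.05]
P-value = 2 × 0.001144 = 0.002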
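As a check on the mechanics, here is a minimal Python sketch of the pooled two-proportion z-test. It assumes the success counts 141 (boys) and 179 (girls) from the pooled p̂ calculation and uses only the standard library (`statistics.NormalDist` supplies the Normal tail area); the function name `two_proportion_z_test` is illustrative, not part of the course materials.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test of H0: p1 = p2 against a two-sided HA.

    Returns (z, two-sided P-value).
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Under H0 the two proportions are equal, so pool the successes.
    p_pooled = (x1 + x2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    se = sqrt(p_pooled * q_pooled / n1 + p_pooled * q_pooled / n2)
    z = (p1_hat - p2_hat - 0) / se
    # Two-sided: double the area in the tail beyond |z|.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Boys: 141 of 248 have profiles; girls: 179 of 256.
z, p = two_proportion_z_test(141, 248, 179, 256)
print(f"z = {z:.2f}, P-value = {p:.4f}")
```

With the unrounded counts it should give z ≈ −3.05 and a two-sided P-value of about 0.002, matching the hand calculation.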
4) Decision and Conclusion
P-value < α: 0.002 < 0.05, therefore we reject H0.
$H_0: p_1 - p_2 = 0$ or $p_1 = p_2$
$H_A: p_1 - p_2 \neq 0$ or $p_1 \neq p_2$
Conclusion: We have sufficient evidence at the 0.05 significance level that there is a difference between the proportion of girls who have an online profile and the proportion of boys who have an online profile.
Okay, we have shown there is a difference. Now estimate how large the difference could be: calculate a C.I.
Difference of two Proportions – Confidence Intervals
Remember:
Estimate ± margin of error
Estimate ± (critical value from the model, z* or t*) × (standard error of the estimate)
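$$(\hat p_1 - \hat p_2) \pm z^* \sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}} \qquad \text{(not pooled } \hat p\text{)}$$
$$-0.13 \pm 1.96\sqrt{\frac{(0.57)(0.43)}{248} + \frac{(0.70)(0.30)}{256}} = -0.13 \pm 0.083$$
(−0.213 to −0.047)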
For a C.I., there are no hypotheses, so there is no hypothesized parameter value (such as p1 = p2) to assume; we have only the sample statistics. Therefore, we cannot pool the sample estimates.
Note that the interval calculated here does not contain 0. How does this relate to the test? It means 0 is not a plausible value for the difference in proportions at 95% confidence, so H0 would be rejected in a two-sided test at α = 0.05.
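This interval, calculated from our sample, gives a range of plausible values for the true difference p1 − p2 (not for p̂1 − p̂2). The "95%" describes the method: about 95% of intervals built this way capture the true difference.
We are 95% confident that the interval from −21.3% to −4.7% captures the true difference in proportions between boys and girls who have online profiles.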
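The unpooled interval can be sketched the same way. This is a minimal Python example, assuming the rounded proportions 0.57 and 0.70 used in the hand calculation; the name `two_proportion_ci` is hypothetical. Each group keeps its own p̂q̂, unlike the pooled test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(p1_hat, n1, p2_hat, n2, confidence=0.95):
    """Unpooled confidence interval for p1 - p2."""
    # Critical value z*: about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # No hypothesized common p, so each sample uses its own p-hat * q-hat.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    estimate = p1_hat - p2_hat
    margin = z_star * se
    return estimate - margin, estimate + margin


low, high = two_proportion_ci(0.57, 248, 0.70, 256)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
print("0 inside interval?", low <= 0 <= high)
```

It should reproduce roughly (−0.213, −0.047) and report that 0 lies outside the interval.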
Difference of two Means – Hypothesis Tests (3 tests)
An engineer is interested in determining whether adding a polymer latex emulsion during the mixing process affects the bonding strength of cement (α = 0.05). Ten samples of the original and 10 samples of the modified formulation were prepared (two treatments, or levels, of the factor formulation). The response, strength, is summarized below.
Summary Statistics (estimates of parameters) from the sample data:

                   Formulation 1 (Modified)   Formulation 2 (Unmodified)
Mean               ȳ1 = 16.76                 ȳ2 = 17.04
Variance           S1² = 0.100                S2² = 0.061
Std. deviation     S1 = 0.316                 S2 = 0.248
Sample size        n1 = 10                    n2 = 10
How the Two-Sample t-Test Works:
• Is there a difference in the sample means?
• Is this difference significantly different from 0? (H0)
• We need to test based on the sampling distribution of ȳ1 − ȳ2:
  ȳ1 − ȳ2 = 16.76 − 17.04 = −0.28
1) Hypotheses
$H_0: \mu_1 - \mu_2 = 0$ or $\mu_1 = \mu_2$
$H_A: \mu_1 - \mu_2 \neq 0$ or $\mu_1 \neq \mu_2$
2) Model and assumptions
• Random and independent data
• Each sample is less than 10% of its population
• Nearly normal populations {small samples!}
• Independent samples?* (yes/no)
• Equal population variances, σ1² = σ2²?** (yes/no)
*If the answer is no, use Test 3 (paired t-test).
**If the answer is no, use Test 2 (unpooled t-test).
1. Two-sample pooled t-test – independent samples and equal variances
For the sampling distribution of ȳ1 − ȳ2 when σ1 and σ2 are unknown, use the t-distribution with a centre at μ1 − μ2 = 0 and standard error of:
$$\text{se}(\bar y_1 - \bar y_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Because we are assuming equal population variances {σ1² = σ2²}, we "pool" the two sample standard deviations (s1 and s2) together to get one estimate of σ, called sp:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
The pooled variance is a weighted average of the two sample variances, with weights n1 − 1 and n2 − 1; the pooled standard deviation sp is its square root.
Keep this idea in your mind: it occurs repeatedly!!
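The weighted-average idea behind sp can be written in a few lines of Python. This is a sketch, not course code: `pooled_variance` is a hypothetical helper, and accepting lists of any length (so more than two groups) is an assumed generalization of the two-sample formula. The weights are the degrees of freedom, n − 1, and the square root is taken only at the end.

```python
from math import sqrt


def pooled_variance(variances, sizes):
    """Pooled variance: a weighted average of the sample variances.

    Each s_i^2 is weighted by its degrees of freedom (n_i - 1), so for
    two groups the denominator is n1 + n2 - 2.
    """
    weighted = sum((n - 1) * s2 for s2, n in zip(variances, sizes))
    total_df = sum(n - 1 for n in sizes)
    return weighted / total_df


# Cement example: S1^2 = 0.100, S2^2 = 0.061, n1 = n2 = 10.
sp2 = pooled_variance([0.100, 0.061], [10, 10])
sp = sqrt(sp2)  # work with variances first, then take the square root
print(f"Sp^2 = {sp2:.4f}, Sp = {sp:.3f}")
```

Averaging the standard deviations directly (instead of the variances) would give a slightly different, incorrect value, which is why the calculation goes through variances.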
Replace s1 and s2 with sp and the standard error then becomes:
$$\text{se}(\bar y_1 - \bar y_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
The test statistic is therefore:
$$t_0 = \frac{\bar y_1 - \bar y_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
The degrees of freedom (df) are determined by the denominator of sp²: df = n1 + n2 − 2.
3) Mechanics…using the summary statistics from the sample:
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{9(0.100) + 9(0.061)}{10 + 10 - 2} = 0.081, \qquad S_p = 0.284$$
$$t_0 = \frac{\bar y_1 - \bar y_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{16.76 - 17.04}{0.284\sqrt{\frac{1}{10} + \frac{1}{10}}} = -2.20$$
[Figure: Distribution Plot – T, df = 18; two-tailed test with area 0.02055 in each tail beyond t = ±2.2]
The two sample means are a little over two standard errors apart.
Is this a "large" difference?
P-value = 2 × 0.02055 ≈ 0.041 {two-sided test.} With the unrounded mean difference ȳ1 − ȳ2 = −0.278 (t0 ≈ −2.19), the P-value is 0.042.
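A minimal Python sketch of the whole pooled test from the summary statistics follows. Because the Python standard library has no t-distribution, the P-value comes from integrating the t density numerically with Simpson's rule; that is a stand-in for the Minitab plot or a t table, not the method used in these notes. The function names are hypothetical.

```python
from math import exp, lgamma, log, log1p, pi, sqrt


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value 2*P(T > |t|) for Student's t, via Simpson's rule."""
    def pdf(x):
        # log of the t density's normalizing constant, for numerical stability
        log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
        return exp(log_c - (df + 1) / 2 * log1p(x * x / df))

    b = abs(t)
    h = b / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * area


def pooled_t_test(mean1, var1, n1, mean2, var2, n2):
    """Pooled two-sample t-test (equal variances). Returns (t0, df, P)."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    t0 = (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))
    return t0, df, t_two_sided_p(t0, df)


t0, df, p = pooled_t_test(16.76, 0.100, 10, 17.04, 0.061, 10)
print(f"t0 = {t0:.2f}, df = {df}, P-value = {p:.3f}")
```

Because sp is not rounded to 0.284 here, t0 comes out near −2.21 rather than −2.20 and the P-value lands near 0.041; the small differences are rounding, not a different method.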
4) Decision and conclusion
p-value < α
0.042 < 0.05
Reject H0.
$H_0: \mu_1 - \mu_2 = 0$ or $\mu_1 = \mu_2$
$H_A: \mu_1 - \mu_2 \neq 0$ or $\mu_1 \neq \mu_2$
Conclusion: We have sufficient evidence at the 0.05 significance level that there is a difference in the average bonding strength of the two mixes.
2. The Two-Sample t-Test – not pooled
For independent samples when we do not assume σ1² = σ2²:
1) Hypotheses:
$H_0: \mu_1 - \mu_2 = 0$ or $\mu_1 = \mu_2$, α = 0.05
$H_A: \mu_1 - \mu_2 \neq 0$ or $\mu_1 \neq \mu_2$
2) Model and Assumptions:
• Random and independent data
• Each sample is less than 10% of its population
• Nearly normal populations {small samples!}
• Independent samples?* (yes/no)
• Equal population variances, σ1² = σ2²?** (yes/no)
Two-sample t-test
For the sampling distribution of ȳ1 − ȳ2 when σ1 and σ2 are unknown, use the t-distribution with a centre at μ1 − μ2 = 0 and standard error of:
$$\text{se}(\bar y_1 - \bar y_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Because we are not assuming equal population variances, we cannot pool the sample standard deviations. The test statistic uses the standard error as stated above.
The test statistic is therefore:
$$t = \frac{(\bar y_1 - \bar y_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
with df = yuck!! {we'll use Minitab}
3) Mechanics
$$t = \frac{(-0.278) - 0}{\sqrt{\frac{0.316^2}{10} + \frac{0.248^2}{10}}} = -2.19$$
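The "yuck" degrees of freedom are usually computed with the Welch–Satterthwaite approximation; treating that as what Minitab reports is an assumption here. Below is a minimal Python sketch of the unpooled statistic and that df, using the notes' unrounded mean difference −0.278 (the helper name `welch_t_and_df` is hypothetical).

```python
from math import sqrt


def welch_t_and_df(diff, s1, n1, s2, n2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite df.

    diff is ybar1 - ybar2; s1 and s2 are the sample standard deviations.
    """
    v1 = s1**2 / n1  # squared standard error contributed by sample 1
    v2 = s2**2 / n2  # squared standard error contributed by sample 2
    t = (diff - 0) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t, df = welch_t_and_df(-0.278, 0.316, 10, 0.248, 10)
print(f"t = {t:.2f}, df = {df:.2f}")
```

With the cement data it gives t ≈ −2.19 and df ≈ 17. The df is generally not a whole number, and software may round or truncate it before finding the P-value. When both samples have the same s and the same n, it reduces to n1 + n2 − 2.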
4) Decision and Conclusion
p-value < α: 0.043 < 0.05, therefore we reject H0.
$H_0: \mu_1 - \mu_2 = 0$ or $\mu_1 = \mu_2$
$H_A: \mu_1 - \mu_2 \neq 0$ or $\mu_1 \neq \mu_2$
Conclusion: We have sufficient evidence at the 0.05 significance level that there is a difference in the average bonding strength of the two mixes.
Difference of two Means (independent samples) – Confidence Intervals
General form of a CI: estimate ± margin of error
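The worked interval for this section does not appear in the text, so the following is only a sketch of the pooled-variance interval, estimate ± t* × sp√(1/n1 + 1/n2), applied to the cement summary statistics under the equal-variance assumption of Test 1. The critical value t* = 2.101 (df = 18, 95%) is taken from a standard t table and passed in, since the Python standard library has no t quantile function; `pooled_mean_diff_ci` is a hypothetical name.

```python
from math import sqrt


def pooled_mean_diff_ci(mean1, var1, n1, mean2, var2, n2, t_star):
    """Pooled-variance confidence interval for mu1 - mu2.

    t_star must be the t critical value with n1 + n2 - 2 df; it is passed
    in because the standard library has no t quantile function.
    """
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    estimate = mean1 - mean2
    margin = t_star * sp * sqrt(1 / n1 + 1 / n2)
    return estimate - margin, estimate + margin


# t*(0.025, 18) = 2.101 from a t table.
low, high = pooled_mean_diff_ci(16.76, 0.100, 10, 17.04, 0.061, 10, t_star=2.101)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

This gives roughly (−0.547, −0.013). The interval excludes 0, which agrees with rejecting H0 at α = 0.05 in the pooled test.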
3. Paired t-test
Two different machines were used to measure the tensile strength of synthetic fiber. Do the two machines yield the same average strength values? Eight (8) specimens of fiber are randomly selected, and one measurement is made using each machine on each specimen. The data are paired to prevent differences between specimens from affecting the comparison of the machines.
When samples are paired (dependent), we look at the difference in strength between the two machines for each individual specimen. These differences (d) are analysed with a one-sample t-test.
$\bar d = -1.38$, $s_d^2 = 7.13$ ($s_d = 2.67$)
1) Hypotheses:
$H_0: \mu_d = 0$
$H_A: \mu_d \neq 0$
α = 0.05
Framework: On average, are the individual differences significantly different from 0?
2) Model and Assumptions:
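• Random and independent data (the differences are independent of one another)
• Each sample is less than 10% of its population
• Nearly normal population of differences {small samples!}
• Independent samples? No: the samples are dependent (paired)
• Equal population variances: not required, because only the single set of differences is analysed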
Paired t-test
Sampling distribution of d̄: it follows a t-distribution with a centre at μd = 0 and standard error of:
$$\text{se}(\bar d) = \frac{s_d}{\sqrt{n}}$$
The test statistic is therefore:
$$t = \frac{\bar d}{s_d/\sqrt{n}}, \qquad df = n - 1$$
3) Mechanics
$$t = \frac{-1.38}{2.67/\sqrt{8}} = -1.46$$
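A minimal Python sketch of the paired test as a one-sample t-test on the differences, using the summary values d̄ = −1.38, s_d = 2.67 and n = 8. The P-value is found by integrating the t density with Simpson's rule (a stand-in for a table or Minitab), and the function names are hypothetical.

```python
from math import exp, lgamma, log, log1p, pi, sqrt


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value 2*P(T > |t|) for Student's t, via Simpson's rule."""
    def pdf(x):
        log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
        return exp(log_c - (df + 1) / 2 * log1p(x * x / df))

    b = abs(t)
    h = b / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * total * h / 3  # 1 - 2 * P(0 < T < |t|)


def paired_t_test(d_bar, s_d, n):
    """Paired t-test: a one-sample t-test of H0: mu_d = 0 on the differences."""
    se = s_d / sqrt(n)
    t = (d_bar - 0) / se
    df = n - 1
    return t, df, t_two_sided_p(t, df)


t, df, p = paired_t_test(-1.38, 2.67, 8)
print(f"t = {t:.2f}, df = {df}, P-value = {p:.3f}")
```

It should give t ≈ −1.46 with df = 7 and a two-sided P-value close to 0.187.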
4) Decision and conclusion
p-value > α: 0.187 > 0.05, therefore we fail to reject H0.
$H_0: \mu_d = 0$
$H_A: \mu_d \neq 0$
Conclusion: At the 5% significance level there is insufficient evidence to indicate that the two machines differ in their mean tensile strength measurements.
Means (paired samples) – Confidence Interval
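The paired interval is again a one-sample interval on the differences, d̄ ± t* × s_d/√n. A minimal Python sketch, assuming the table value t* = 2.365 for 7 df at 95% confidence (`paired_ci` is a hypothetical name):

```python
from math import sqrt


def paired_ci(d_bar, s_d, n, t_star):
    """Confidence interval for mu_d: d_bar +/- t* * s_d / sqrt(n).

    t_star must be the t critical value with n - 1 df (from a t table).
    """
    margin = t_star * s_d / sqrt(n)
    return d_bar - margin, d_bar + margin


# t*(0.025, 7) = 2.365 from a t table.
low, high = paired_ci(-1.38, 2.67, 8, t_star=2.365)
print(f"95% CI for mu_d: ({low:.2f}, {high:.2f})")
print("0 inside interval?", low <= 0 <= high)
```

It gives about (−3.61, 0.85). Because this interval contains 0, it agrees with the paired test's failure to reject H0.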
A few text questions:
Chapter 19 {Comparing Means}
Chapter 20 {Paired Samples}
Chapter 21 {Two Proportions}