Hypothesis Testing 2 Samples

HYPOTHESIS TESTING
2 Samples
WHY HYPOTHESIS TESTING?
- Confirmatory type of test
- Uses data from 2 samples to make an inference we can apply to the population
- Alternative hypothesis
    - The hypothesis we are testing
- Null hypothesis
    - The "nothing is happening" hypothesis
WHY HYPOTHESIS TESTING?

When our test provides sufficient evidence, we can:
- Reject the null hypothesis
- Conclude that the data support the alternative hypothesis

Otherwise, we do not have sufficient evidence to reject the null hypothesis.
2 SAMPLES: INDEPENDENT POPULATIONS
Negative evaluation scores (self-image) for 2 independent groups of women.
- Normal eating habits
    - n = 14
    - x-bar = 14.14
    - s = 3.29
- Bulimic
    - n = 12
    - x-bar = 18.96
    - s = 1.92
Note: A higher score means a more negative self-evaluation.
ASSUMPTIONS

- Samples come from independent populations
    - One does not affect the other
- Samples come from normal populations
    - Check with q-q plots
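A q-q (normal probability) plot pairs each sorted observation with the normal quantile expected at its rank; roughly straight points suggest normality. The sketch below only computes those pairs. The function name `qq_points`, the plotting position (i - 0.5)/n, and the example scores are all assumptions made for illustration; the scores are made up and are not the study data.

```python
from statistics import NormalDist

def qq_points(sample):
    """Pair each sorted observation with a standard-normal quantile."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (i - 0.5) / n is one common plotting-position convention (an assumption).
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, xs))

# Hypothetical scores, invented purely for illustration (not from the study).
example_scores = [11, 12, 13, 14, 14, 15, 16, 18]
points = qq_points(example_scores)
for z, x in points:
    print(f"{z:6.2f}  {x}")
```

Plotting the second column against the first gives the q-q plot; statistical packages typically add a reference line and a formal normality test.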
CHECK FOR NORMALITY
[Figure: Normal probability plot of C1 (Percent vs. C1). Mean = 0.2569, StDev = 0.9510, N = 30, AD = 0.309, P-Value = 0.537]
CHECK FOR NORMALITY
[Figure: Normal probability plot of C3 (Percent vs. C3). Mean = 0.5058, StDev = 0.2857, N = 30, AD = 0.573, P-Value = 0.125]
CHECK FOR NORMALITY
[Figure: Normal probability plot of C2 (Percent vs. C2). Mean = 1.040, StDev = 0.6551, N = 30, AD = 1.223, P-Value < 0.005]
COMPARING THE 2 MEANS
Comparing means of independent populations
- Begin by creating 2 confidence intervals (.95), one for each group:

$$95\%\ \text{C.I.} = \bar{x} \pm t \frac{s}{\sqrt{n}}$$

- Let's estimate the difference in the means of the 2 groups
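As a rough illustration of the first step, the two separate 95% intervals can be computed from the summary statistics on the earlier slide. The helper name `mean_ci` is hypothetical, and the critical values 2.160 (df = 13) and 2.201 (df = 11) are assumed to be read from a t table; they do not appear in the slides.

```python
import math

def mean_ci(xbar, s, n, t_crit):
    """Return (low, high) for xbar ± t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Critical values t(0.025, n - 1) taken from a t table (assumed inputs).
normal_ci = mean_ci(14.14, 3.29, 14, t_crit=2.160)   # df = 13
bulimic_ci = mean_ci(18.96, 1.92, 12, t_crit=2.201)  # df = 11

print("Normal eating habits: %.2f to %.2f" % normal_ci)
print("Bulimic:              %.2f to %.2f" % bulimic_ci)
```

Under these assumptions the two intervals do not overlap, which hints at a difference, but the formal comparison is the interval on the difference in means.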
C.I. REVIEW

1-α confidence interval on μ

- σ known:
$$\text{C.I.} = \bar{x} \pm z \frac{\sigma}{\sqrt{n}}$$
- σ unknown:
$$\text{C.I.} = \bar{x} \pm t \frac{s}{\sqrt{n}}$$

In both equations, x-bar is the point estimate for μ.

So, let $\bar{x}_1 - \bar{x}_2$ be the point estimate for $\mu_1 - \mu_2$.
ESTIMATING THE CI FOR μ1 – μ2


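A linear combination of independent normal random variables gives another normal random variable, so $\bar{x}_1 - \bar{x}_2$ is normally distributed.
To get our confidence interval:

Point estimate ± Distribution value × Standard error

- σ known:
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
- σ unknown:
$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$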
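DIFFERENCES BETWEEN MEANS
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$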
95% CONFIDENCE INTERVAL:
DIFFERENCE BETWEEN MEANS

Negative evaluation scores (self-image) for 2 independent groups of women.
- Normal eating habits: n = 14, x-bar = 14.14, s = 3.29
- Bulimic: n = 12, x-bar = 18.96, s = 1.92
- Estimated df = 21
- Estimate the difference in means with a 95% confidence level
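The estimated df of 21 can be reproduced from the degrees-of-freedom formula for unpooled variances. This is a minimal sketch; the function name `welch_df` is a label chosen here, and rounding the result down to a whole number is assumed to be the convention behind the slide's value.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate df for a two-sample t procedure with unpooled variances."""
    v1 = s1 ** 2 / n1  # squared standard error of mean 1
    v2 = s2 ** 2 / n2  # squared standard error of mean 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

df = welch_df(3.29, 14, 1.92, 12)
print(round(df, 2))  # a little above 21
print(int(df))       # rounded down, matching the estimated df of 21
```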

95% CONFIDENCE INTERVAL:
DIFFERENCE BETWEEN MEANS
$$(14.14 - 18.96) \pm 2.08 \sqrt{\frac{1.92^2}{12} + \frac{3.29^2}{14}}$$
= -4.82 ± 2.08(1.039)
= -4.82 ± 2.16
Or, -6.98 to -2.66
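The arithmetic above can be checked with a few lines of Python. This is only a sketch using the summary statistics from the slides; the critical value 2.08 is the one used in the calculation above, and the variable names are chosen here for illustration.

```python
import math

n1, xbar1, s1 = 14, 14.14, 3.29  # normal eating habits
n2, xbar2, s2 = 12, 18.96, 1.92  # bulimic
t_crit = 2.08                    # t(0.025, 21), value used on the slide

se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # unpooled standard error
diff = xbar1 - xbar2
low, high = diff - t_crit * se, diff + t_crit * se

print(f"SE = {se:.3f}")
print(f"{diff:.2f} ± {t_crit * se:.2f}, or {low:.2f} to {high:.2f}")
```

Because the whole interval lies below zero, the data suggest the normal-eating group has the lower mean score.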
HYPOTHESIS TEST ON
DIFFERENCE BETWEEN MEANS

Null hypothesis:
$$H_0: \mu_1 = \mu_2$$
or
$$H_0: \mu_1 - \mu_2 = 0$$
HYPOTHESIS TEST ON
DIFFERENCE BETWEEN MEANS

Alternative hypothesis (one-tailed):
$$H_a: \mu_1 < \mu_2$$
or
$$H_a: \mu_1 - \mu_2 < 0$$
with the corresponding null hypothesis
$$H_0: \mu_1 - \mu_2 \ge 0$$
HYPOTHESIS TEST ON
DIFFERENCE BETWEEN MEANS

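Recall:
$$\text{Test statistic} = \frac{\text{Statistic} - \text{Parameter}}{\text{SE Mean}}$$
$$t_{df} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$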
HYPOTHESIS TEST ON
DIFFERENCE BETWEEN MEANS
$$t_{df} = \frac{(18.96 - 14.14) - (0)}{1.039}$$
$$t_{df=21} = 4.64$$
(Here the difference is taken as bulimic minus normal, so the statistic is positive; taken as normal minus bulimic it is -4.64.)
Conclusion:
Because our test statistic is more extreme than our critical value, we have sufficient evidence to reject the null hypothesis and conclude that negative evaluation scores are higher for women with bulimia than for those with normal eating habits.
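The comparison of the test statistic with a critical value can be sketched as follows. The slides do not state the critical value, so 1.721 is an assumption: the one-tailed table value t(0.05, 21). The difference is taken as bulimic minus normal, so the rejection region is the upper tail.

```python
import math

# Unpooled standard error from the summary statistics on the slides.
se = math.sqrt(3.29 ** 2 / 14 + 1.92 ** 2 / 12)

# Bulimic minus normal, with a hypothesized difference of 0.
t_stat = ((18.96 - 14.14) - 0) / se

t_crit = 1.721  # t(0.05, 21) from a t table, one-tailed (assumed, not from the slides)
reject_null = t_stat > t_crit

print(f"t = {t_stat:.2f}, critical value = {t_crit}, reject H0: {reject_null}")
```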
MINITAB OUTPUT FOR SAME TEST
Difference = mu (1) - mu (2)
Estimate for difference: -4.82
95% CI for difference: (-6.98, -2.66)
T-Test of difference = 0 (vs not =): T-Value = -4.64
P-Value = 0.000 DF = 21
TIME TO PRACTICE ONE

Page 379, Problem 11.19

Sample   N    Mean   StDev  SE Mean
1       40   165.0    21.5      3.4
2       32   172.9    31.1      5.5
Difference = mu (1) - mu (2)
Estimate for difference: -7.90
90% upper bound for difference: 0.49
T-Test of difference = 0 (vs <): T-Value = -1.22
P-Value = 0.114 DF = 53
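One way to practice is to rebuild the Minitab numbers from the summary table by hand. The sketch below recomputes the SE Mean column, the T-Value, and the DF using the unpooled formulas; the variable names are chosen here, and truncating the df to a whole number is an assumption about how the reported DF was obtained.

```python
import math

n1, xbar1, s1 = 40, 165.0, 21.5  # sample 1
n2, xbar2, s2 = 32, 172.9, 31.1  # sample 2

v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2        # squared standard errors
se1, se2 = math.sqrt(v1), math.sqrt(v2)    # "SE Mean" column
se_diff = math.sqrt(v1 + v2)               # standard error of the difference
t_stat = (xbar1 - xbar2) / se_diff         # hypothesized difference is 0
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(f"SE means: {se1:.1f}, {se2:.1f}")
print(f"T-Value = {t_stat:.2f}, DF = {int(df)}")
```

Since the output reports a 90% upper bound, the test appears to use α = .10; the P-Value of 0.114 is larger than that, which agrees with the upper bound of 0.49 lying above zero.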
DIFFERENCE BETWEEN 2 MEANS:
POOLED VARIANCES
- If we can assume the variances of the two independent populations are equal, we can pool the variances when calculating the standard error.
- When justified, pooling gives a better confidence interval because it uses all n1 + n2 - 2 degrees of freedom.
- Therefore, we pool variances whenever justified.
POOL VARIANCES IN
EATING DISORDER PROBLEM?

Use a hypothesis test on the ratio of the population variances:
$$H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1$$
$$H_a: \frac{\sigma_1^2}{\sigma_2^2} \ne 1$$
F DISTRIBUTION

- Family of distributions, like normal and t
- Continuous
- Shape is determined by two different degrees of freedom
- Used to compare variation among processes
- Hypothesis test
    - Formulate null and alternative
    - Select significance level
        - α = .05
F DISTRIBUTION

- Calculate the test statistic

$$F = \frac{s_1^2}{s_2^2} \quad \text{or} \quad \frac{s_2^2}{s_1^2}, \text{ whichever is larger}$$

- Identify the critical value

$$F_{(\alpha/2,\ v_1,\ v_2)}$$

α = specified level of significance
v1 = df (n-1) of the sample with the larger variance
v2 = df (n-1) of the sample with the smaller variance
F TEST: MINITAB RESULTS


Test for Equal Variances
95% Bonferroni confidence intervals for standard deviations

Sample   N    Lower   StDev    Upper
1       12  1.29834    1.92  3.54914
2       14  2.28353    3.29  5.71750

F-Test (Normal Distribution)
Test statistic = 0.34, p-value = 0.082

F TEST: MINITAB RESULTS
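As a rough check on the F-test output, the reported statistic of 0.34 appears to be the sample 1 variance over the sample 2 variance; the "whichever is larger" convention gives its reciprocal. The sketch below computes both, along with the matching degrees of freedom. The p-value is copied from the Minitab output rather than computed, and the variable names are chosen here for illustration.

```python
s_bulimic, n_bulimic = 1.92, 12  # sample 1 in the Minitab output
s_normal, n_normal = 3.29, 14    # sample 2 in the Minitab output

f_reported = s_bulimic ** 2 / s_normal ** 2   # sample 1 variance over sample 2 variance
f_larger = max(f_reported, 1 / f_reported)    # "whichever is larger" convention

# Numerator df belongs to the sample with the larger variance (normal eaters).
v1, v2 = n_normal - 1, n_bulimic - 1

p_value = 0.082  # copied from the Minitab output, not computed here
alpha = 0.05
pool_variances = p_value > alpha  # fail to reject equal variances

print(f"F = {f_reported:.2f} (reciprocal {f_larger:.2f}), df = ({v1}, {v2})")
print("Pool variances:", pool_variances)
```

With p = 0.082 above α = .05, the test does not reject equal variances, which is consistent with pooling in the slides that follow.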
DIFFERENCE BETWEEN 2 MEANS:
POOLED VARIANCES

Standard Error:

- Before:
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
- After:
$$s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
DIFFERENCE BETWEEN 2 MEANS:
POOLED VARIANCES

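Confidence interval:
$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Test statistic:
$$t_{df} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Degrees of freedom = n1 + n2 - 2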
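To see what pooling changes, the unpooled and pooled standard errors can be computed side by side for the eating disorder data. This is a sketch from the summary statistics only; the variable names are chosen here for illustration.

```python
import math

n1, s1 = 14, 3.29  # normal eating habits
n2, s2 = 12, 1.92  # bulimic

# Standard error without pooling.
se_unpooled = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Pooled standard deviation and the standard error with pooling.
df_pooled = n1 + n2 - 2
s_p = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df_pooled)
se_pooled = s_p * math.sqrt(1 / n1 + 1 / n2)

print(f"unpooled SE = {se_unpooled:.3f}")
print(f"s_p = {s_p:.4f}, pooled SE = {se_pooled:.3f}, df = {df_pooled}")
```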
95% CONFIDENCE INTERVAL:
POOLED VARIANCES

Negative evaluation scores (self-image) for 2 independent groups of women.
- Normal eating habits: n = 14, x-bar = 14.14, s = 3.29
- Bulimic: n = 12, x-bar = 18.96, s = 1.92
- Estimate the difference in means with a 95% confidence level
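95% CONFIDENCE INTERVAL:
POOLED VARIANCES
$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$$s_p = \sqrt{\frac{(14 - 1)\,3.29^2 + (12 - 1)\,1.92^2}{14 + 12 - 2}} = 2.75$$
$$(18.96 - 14.14) \pm 2.064\,(2.75) \sqrt{\frac{1}{14} + \frac{1}{12}}$$
= 2.59 to 7.05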
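The pooled interval can be reproduced with a short calculation. This is a sketch using the slide's summary statistics and its critical value of 2.064; the difference is taken as bulimic minus normal, as in the calculation above, and the variable names are chosen here.

```python
import math

n_b, xbar_b, s_b = 12, 18.96, 1.92  # bulimic
n_n, xbar_n, s_n = 14, 14.14, 3.29  # normal eating habits
t_crit = 2.064                      # t(0.025, 24), value used on the slide

s_p = math.sqrt(((n_b - 1) * s_b ** 2 + (n_n - 1) * s_n ** 2) / (n_b + n_n - 2))
half_width = t_crit * s_p * math.sqrt(1 / n_b + 1 / n_n)
diff = xbar_b - xbar_n
low, high = diff - half_width, diff + half_width

print(f"s_p = {s_p:.2f}")
print(f"{diff:.2f} ± {half_width:.2f}, or {low:.2f} to {high:.2f}")
```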
HYPOTHESIS TEST:
POOLED VARIANCE

Test the alternative hypothesis that the mean score for bulimics is greater than the mean score for normal eaters (group 1 = normal eaters, group 2 = bulimics).
$$H_a: \mu_1 - \mu_2 < 0$$
$$H_0: \mu_1 - \mu_2 \ge 0$$

Significance level (α) = .05
HYPOTHESIS TEST:
POOLED VARIANCE

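Test statistic
$$t_{df} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
$$t_{df=24} = \frac{(18.96 - 14.14) - (0)}{2.75 \sqrt{\frac{1}{14} + \frac{1}{12}}} = 4.46$$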
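A small helper makes the pooled test statistic reusable for other summary data. The function name `pooled_t` is hypothetical, and the one-tailed critical value 1.711 for t(0.05, 24) is assumed from a t table; it is not given in the slides. The difference is taken as bulimic minus normal, so the rejection region is the upper tail.

```python
import math

def pooled_t(xbar1, s1, n1, xbar2, s2, n2, hypothesized_diff=0.0):
    """Return (t statistic, df) for a two-sample t test with pooled variances."""
    df = n1 + n2 - 2
    s_p = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    se = s_p * math.sqrt(1 / n1 + 1 / n2)
    return (xbar1 - xbar2 - hypothesized_diff) / se, df

# Bulimic group first, so a positive t supports higher scores for bulimics.
t_stat, df = pooled_t(18.96, 1.92, 12, 14.14, 3.29, 14)

t_crit = 1.711  # t(0.05, 24) from a t table, one-tailed (assumed)
print(f"t = {t_stat:.2f}, df = {df}, reject H0: {t_stat > t_crit}")
```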
2-TAIL HYPOTHESIS TEST:
MINITAB RESULTS









Two-Sample T-Test and CI

Sample   N    Mean  StDev  SE Mean
1       12   18.96   1.92     0.55
2       14   14.14   3.29     0.88

Difference = mu (1) - mu (2)
Estimate for difference: 4.82
95% CI for difference: (2.59, 7.05)
T-Test of difference = 0 (vs not =): T-Value = 4.46
P-Value = 0.000 DF = 24
Both use Pooled StDev = 2.7482