252meanx1 10/6/05 (Open this document in 'Outline' view!) Re-edited to replace
with D .
D. COMPARISON OF TWO SAMPLES
Examples for comparison of means.
$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$
or more generally
$H_0: D = D_0$, $H_1: D \ne D_0$,
where $D = \mu_1 - \mu_2$ and $d = \bar x_1 - \bar x_2$.
General formulas: Degrees of freedom $DF = n_1 + n_2 - 2$.
a. Confidence Interval: $D = d \pm t_{\alpha/2}\, s_d$.
b. Test Ratio: $t = \dfrac{d - D_0}{s_d}$.
c. Critical Value: $d_{cv} = D_0 \pm t_{\alpha/2}\, s_d$.
The difference between the cases comes down to the choice of $t$ and the formula for $s_d$. Let us now consider the first four cases.
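The three general formulas can be sketched as one small routine. This is only an illustration, not part of the original notes: the names `TwoSidedResult` and `two_sided_summary` are my own, and the caller is assumed to supply the table value `crit` (either $t_{\alpha/2}$ or $z_{\alpha/2}$) and the standard error `s_d` appropriate to the case at hand.

```python
from dataclasses import dataclass


@dataclass
class TwoSidedResult:
    ci: tuple        # confidence interval for D, centered at d
    ratio: float     # test ratio (d - D0) / s_d
    cv: tuple        # critical values for d, centered at D0
    reject: bool     # True if H0: D = D0 is rejected


def two_sided_summary(d, s_d, crit, D0=0.0):
    """Apply formulas a, b and c for a 2-sided test of H0: D = D0.

    d    : observed difference between the sample means
    s_d  : standard error of d (its formula depends on the case)
    crit : table value t_{alpha/2} or z_{alpha/2}, supplied by the caller
    """
    half_width = crit * s_d
    ci = (d - half_width, d + half_width)      # a. confidence interval
    ratio = (d - D0) / s_d                     # b. test ratio
    cv = (D0 - half_width, D0 + half_width)    # c. critical values
    # The three methods agree: |ratio| > crit exactly when D0 lies
    # outside ci, which is exactly when d lies outside cv.
    reject = abs(ratio) > crit
    return TwoSidedResult(ci, ratio, cv, reject)


# Hypothetical numbers, chosen only to show the call.
result = two_sided_summary(d=1.0, s_d=0.5, crit=2.0)
print(result)
```

Only `crit` and `s_d` change from case to case, which is the point made in the paragraph above.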
1. Two Means, Two Independent Samples, Large Samples.
If the total number of degrees of freedom is large (or the two samples come from normally distributed populations with known variances $\sigma_1^2$ and $\sigma_2^2$), then replace $t$ with $z$ and use
$$s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
First Example: We wish to test the earnings of retail clerks in New York ($\mu_1$) and Philadelphia ($\mu_2$) for equality.
$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$ or $H_0: D = 0$, $H_1: D \ne 0$. $\alpha = .05$.
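Data: $\bar x_1 = 300$, $s_1^2 = 400$, $n_1 = 169$; $\bar x_2 = 330$, $s_2^2 = 360$, $n_2 = 144$.
Since $DF = n_1 + n_2 - 2 = 169 + 144 - 2 = 311$ is well over 100, we are justified in using a large sample method.
$d = \bar x_1 - \bar x_2 = 300 - 330 = -30$
$$s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{400}{169} + \frac{360}{144}} = \sqrt{2.3669 + 2.5000} = \sqrt{4.8669} = 2.2061$$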
Solutions: Use $z_{\alpha/2} = z_{.025} = 1.960$ in place of $t$.
a. Confidence Interval: $D = d \pm t_{\alpha/2}\, s_d = -30 \pm 1.960(2.2061) = -30 \pm 4.32$. Make a diagram showing a Normal curve centered at -30 and a confidence interval bounded by -34.32 and -25.68. Since $D_0 = 0$ is not between them, reject $H_0$.
b. Test Ratio: $t = \dfrac{d - D_0}{s_d} = \dfrac{-30 - 0}{2.2061} = -13.59$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by $-z_{.025} = -1.960$ and $z_{.025} = 1.960$. Since -13.59 is not between them, reject $H_0$. Since this is actually a value of $z$, a p-value would be easy: $pval = 2P(d \le -30) = 2P(z \le -13.59) = 2(.5 - .5) = 0$.
c. Critical Value: $d_{cv} = D_0 \pm t_{\alpha/2}\, s_d = 0 \pm 1.960(2.2061) = \pm 4.32$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by -4.32 and 4.32. Since $d = -30$ is not between them, reject $H_0$.
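As a check on the arithmetic of the first example, here is a minimal sketch using only the Python standard library. The variable names are my own, and `statistics.NormalDist` stands in for the Normal table, so the later decimals can differ slightly from the hand-rounded figures above.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from the retail clerk example.
x1_bar, var1, n1 = 300.0, 400.0, 169
x2_bar, var2, n2 = 330.0, 360.0, 144
alpha, D0 = 0.05, 0.0

d = x1_bar - x2_bar                            # -30
s_d = sqrt(var1 / n1 + var2 / n2)              # large-sample standard error
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # table value z_.025

half_width = z_crit * s_d
ci = (d - half_width, d + half_width)          # a. confidence interval for D
ratio = (d - D0) / s_d                         # b. test ratio
cv = (D0 - half_width, D0 + half_width)        # c. critical values for d
p_value = 2 * NormalDist().cdf(-abs(ratio))    # 2-sided p-value

print(f"s_d = {s_d:.4f}, z = {z_crit:.3f}")
print(f"CI = ({ci[0]:.2f}, {ci[1]:.2f}), ratio = {ratio:.2f}")
print(f"critical values = ({cv[0]:.2f}, {cv[1]:.2f}), p-value = {p_value:.4f}")
```

All three methods lead to the same decision because each compares the same distance, $|d - D_0|$, with the same half-width $z_{\alpha/2}\, s_d$.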
Second Example: (Whitmore, Neter, Wasserman) We wish to learn if battery type B ($\mu_2$) has a longer service life (in months) than battery type A ($\mu_1$). Note that this statement becomes an alternate hypothesis because it does not contain an equality.
$H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$ or $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$ or $H_0: D \ge 0$, $H_1: D < 0$. $\alpha = .05$.
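Data: $\bar x_1 = 18.4$, $s_1 = 3.3$, $n_1 = 121$; $\bar x_2 = 21.3$, $s_2 = 4.2$, $n_2 = 36$.
Since $DF = n_1 + n_2 - 2 = 121 + 36 - 2 = 155$ is well over 100, we are justified in using a large sample method.
$d = \bar x_1 - \bar x_2 = 18.4 - 21.3 = -2.9$
$$s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{(3.3)^2}{121} + \frac{(4.2)^2}{36}} = \sqrt{0.09 + 0.49} = \sqrt{0.58} = 0.7616$$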
Solutions: Use $z_{\alpha} = z_{.05} = 1.645$ in place of $t$. This is a 1-sided test.
a. Confidence Interval: Given $H_1: D < 0$, use $D \le d + t_{\alpha}\, s_d = -2.9 + 1.645(0.7616) = -2.9 + 1.253 = -1.647$. Make a diagram showing a Normal curve centered at -2.9 and a confidence interval, $D \le -1.647$, represented by shading the area below -1.647. The null hypothesis $H_0: D \ge 0$ is represented by shading the area above zero. Since $D_0 = 0$ is not in the confidence interval, reject $H_0$.
b. Test Ratio: $t = \dfrac{d - D_0}{s_d} = \dfrac{-2.9 - 0}{0.7616} = -3.808$. Make a diagram showing a Normal curve centered at zero and an 'accept' region above $-z_{.05} = -1.645$. Since -3.808 is below -1.645, reject $H_0$.
c. Critical Value: Given $H_1: D < 0$, we want a critical value below zero. Use $d_{cv} = D_0 - t_{\alpha}\, s_d = 0 - 1.645(0.7616) = -1.253$. Make a diagram showing a Normal curve centered at zero and a 'reject' region below -1.253. Since $d = -2.9$ is below -1.253, reject $H_0$.
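The 1-sided version differs from the 2-sided one only in using $z_{\alpha}$ instead of $z_{\alpha/2}$ and in looking at one tail. The sketch below reworks the battery example under that reading. The variable names are mine, and `statistics.NormalDist` again replaces the table.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from the battery example (type A is sample 1, type B is sample 2).
x1_bar, s1, n1 = 18.4, 3.3, 121
x2_bar, s2, n2 = 21.3, 4.2, 36
alpha, D0 = 0.05, 0.0

d = x1_bar - x2_bar                          # about -2.9
s_d = sqrt(s1**2 / n1 + s2**2 / n2)          # large-sample standard error
z_crit = NormalDist().inv_cdf(1 - alpha)     # 1-sided table value z_.05

# H1: D < 0, so every comparison is made in the lower tail.
upper_bound = d + z_crit * s_d               # a. 1-sided interval D <= upper_bound
ratio = (d - D0) / s_d                       # b. test ratio
d_cv = D0 - z_crit * s_d                     # c. critical value below D0
p_value = NormalDist().cdf(ratio)            # lower-tail p-value

reject = ratio < -z_crit
print(f"D <= {upper_bound:.3f}, ratio = {ratio:.3f}, d_cv = {d_cv:.3f}")
print(f"p-value = {p_value:.5f}, reject H0: {reject}")
```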
2. Two Means, Two Independent Samples, Populations Normally Distributed, Population Variances Assumed Equal.
$t = t^{(n_1 + n_2 - 2)}$ and $s_d = \sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$, where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$
Example: (Whitmore, Neter, Wasserman) Each of two groups of ten men is assigned a razor blade and asked how many shaves they got from a package. We wish to find out if there is a significant difference in the durability of the two blades. Type A will be $x_1$; type B will be $x_2$.
$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$ or $H_0: D = 0$, $H_1: D \ne 0$. $\alpha = .01$.
Data: $\bar x_1 = 46.9$, $s_1 = 14.0$, $n_1 = 10$; $\bar x_2 = 75.6$, $s_2 = 21.0$, $n_2 = 10$.
Since $DF = n_1 + n_2 - 2 = 10 + 10 - 2 = 18$ is well below 100, we need a small sample method. Because of the similarity of the two samples we assume that $\sigma_1^2 = \sigma_2^2$.
$d = \bar x_1 - \bar x_2 = 46.9 - 75.6 = -28.7$. Because of our assumption about variances, we use a pooled variance:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(10 - 1)(14.0)^2 + (10 - 1)(21.0)^2}{10 + 10 - 2} = 318.50$$
$$s_d = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{318.50\left(\frac{1}{10} + \frac{1}{10}\right)} = \sqrt{63.7} = 7.9812$$
Solutions: Use $t^{(n_1 + n_2 - 2)}_{\alpha/2} = t^{(18)}_{.005} = 2.878$.
a. Confidence Interval: $D = d \pm t_{\alpha/2}\, s_d = -28.7 \pm 2.878(7.9812) = -28.7 \pm 22.96$. Make a diagram showing a Normal curve centered at -28.7 and a confidence interval bounded by -51.66 and -5.74. Since $D_0 = 0$ is not between them, reject $H_0$.
b. Test Ratio: $t = \dfrac{d - D_0}{s_d} = \dfrac{-28.7 - 0}{7.9812} = -3.596$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by $-t_{.005} = -2.878$ and $t_{.005} = 2.878$. Since -3.596 is not between them, reject $H_0$. If we want a p-value, we need $pval = 2P(d \le -28.70) = 2P(t \le -3.596)$. Since 3.596 lies between $t_{.005} = 2.878$ and $t_{.001} = 3.611$, we double the p-value to $.002 < pval < .01$.
c. Critical Value: $d_{cv} = D_0 \pm t_{\alpha/2}\, s_d = 0 \pm 2.878(7.9812) = \pm 22.96$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by -22.96 and 22.96. Since $d = -28.7$ is not between them, reject $H_0$.
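The pooled-variance calculation can be sketched as follows. The variable names are my own. To stay within the standard library, the table value $t^{(18)}_{.005} = 2.878$ is simply typed in from the text rather than computed.

```python
from math import sqrt

# Summary data from the razor blade example.
x1_bar, s1, n1 = 46.9, 14.0, 10
x2_bar, s2, n2 = 75.6, 21.0, 10
D0 = 0.0
t_crit = 2.878          # t(18) at .005, copied from the text (alpha = .01, 2-sided)

d = x1_bar - x2_bar
df = n1 + n2 - 2

# Pooled variance: each sample variance is weighted by its degrees of freedom.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
s_d = sqrt(sp2 * (1 / n1 + 1 / n2))

ratio = (d - D0) / s_d
half_width = t_crit * s_d
ci = (d - half_width, d + half_width)

print(f"df = {df}, pooled variance = {sp2:.2f}, s_d = {s_d:.4f}")
print(f"ratio = {ratio:.3f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print("reject H0" if abs(ratio) > t_crit else "do not reject H0")
```

With equal sample sizes, as here, the pooled variance is just the average of the two sample variances, $(196 + 441)/2 = 318.5$.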
3. Two Means, Two Independent Samples, Populations Normally Distributed, Population Variances Not Assumed Equal.
This time the degrees of freedom for $t$ must be calculated by the Satterthwaite approximation. The formula is
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}},$$
but the formula for the standard deviation is the same as in method 1, $s_d = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$.
Example: We wish to use a 2-sided 95% confidence interval to test for a significant difference between the time it takes an employee to type a page on word processor A ($\mu_1$) and word processor B ($\mu_2$). 16 pages are typed on each processor.
$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$ or $H_0: D = 0$, $H_1: D \ne 0$. $\alpha = .05$.
Data: $\bar x_1 = 8.20$, $s_1^2 = 4.10$, $n_1 = 16$; $\bar x_2 = 7.10$, $s_2^2 = 4.20$, $n_2 = 16$.
Since $DF = n_1 + n_2 - 2 = 16 + 16 - 2 = 30$ is well below 100, we would be on very, very shaky ground if we used a large sample method. Even with a small sample method we probably need an assumption of Normality. If we do not want to assume $\sigma_1^2 = \sigma_2^2$, we need the Satterthwaite method.
$d = \bar x_1 - \bar x_2 = 8.20 - 7.10 = 1.10$. To find the standard error of $d$ and the number of degrees of freedom, do the following calculations:
$\dfrac{s_1^2}{n_1} = \dfrac{4.1}{16} = 0.25625$, $\dfrac{s_2^2}{n_2} = \dfrac{4.2}{16} = 0.26250$, so $\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} = 0.25625 + 0.26250 = 0.51875$,
$$DF = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{(0.51875)^2}{\dfrac{(0.25625)^2}{15} + \dfrac{(0.26250)^2}{15}} = \frac{0.26910}{0.00438 + 0.00459} = 29.9.$$
I take the conservative approach of rounding this down to 29 degrees of freedom. Notice how little difference there is between this and $DF = n_1 + n_2 - 2 = 16 + 16 - 2 = 30$. This is because of the near-equality of the sample variances.
$s_d = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{0.51875} = 0.720$. This is almost the same result as we would have gotten if we had assumed that $\sigma_1^2 = \sigma_2^2$, again because of the near-equality of the sample variances.
Solutions: Use $t^{(29)}_{.025} = 2.045$ for a 2-sided confidence interval. $D = d \pm t_{\alpha/2}\, s_d = 1.10 \pm 2.045(0.720) = 1.10 \pm 1.47$. Make a diagram showing a Normal curve centered at 1.10 and a confidence interval bounded by -0.37 and 2.57. Since $D_0 = 0$ is between them, do not reject $H_0$.
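The Satterthwaite degrees of freedom are tedious by hand, so a short sketch may help. The function name `satterthwaite_df` is my own. The table value 2.045 for 29 degrees of freedom is copied from the text, and rounding down with `floor` mirrors the conservative choice made above.

```python
from math import floor, sqrt


def satterthwaite_df(var1, n1, var2, n2):
    """Approximate degrees of freedom when variances are not assumed equal."""
    a = var1 / n1
    b = var2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


# Summary data from the word processor example (sample variances are given).
x1_bar, var1, n1 = 8.20, 4.10, 16
x2_bar, var2, n2 = 7.10, 4.20, 16
D0 = 0.0

d = x1_bar - x2_bar
df_exact = satterthwaite_df(var1, n1, var2, n2)
df_used = floor(df_exact)                 # conservative: round down
s_d = sqrt(var1 / n1 + var2 / n2)         # same standard error as method 1

t_crit = 2.045                            # t(29) at .025, copied from the text
ci = (d - t_crit * s_d, d + t_crit * s_d)

print(f"df = {df_exact:.2f} (use {df_used}), s_d = {s_d:.3f}")
print(f"CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print("do not reject H0" if ci[0] <= D0 <= ci[1] else "reject H0")
```

The approximation never exceeds $n_1 + n_2 - 2$, which is why the result lands just under 30 when the two sample variances are nearly equal.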
A more complete version of this problem appears in Problem D3.
Look at document 252meanx3 here for a computer example of this method.
4. Two Means, Paired Samples (If samples are small, populations should be normally distributed).
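If $n$ is the number of pairs of data, then $t = t^{(n-1)}$ and
$$s_{\bar d} = \frac{s_d}{\sqrt{n}} = \sqrt{\frac{\sum d^2 - n\bar d^2}{n(n - 1)}}.$$
In this case $d_1 = x_{11} - x_{21}$, $d_2 = x_{12} - x_{22}$, etc. Here $\bar d$, the mean of the differences, and its standard error $s_{\bar d}$ play the roles of $d$ and $s_d$ in the general formulas.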
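To make the paired-data formula concrete, here is a small sketch that starts from raw pairs. The function name `paired_standard_error` and the four data pairs are hypothetical, invented only to show the steps. They do not come from any example in these notes.

```python
from math import sqrt


def paired_standard_error(sample1, sample2):
    """Return (mean difference, std. deviation of differences, standard error)."""
    diffs = [a - b for a, b in zip(sample1, sample2)]   # d_i = x_1i - x_2i
    n = len(diffs)
    d_bar = sum(diffs) / n
    # Computational form: sum of squares minus n times the squared mean.
    sum_sq = sum(v * v for v in diffs) - n * d_bar**2
    s_diff = sqrt(sum_sq / (n - 1))
    return d_bar, s_diff, s_diff / sqrt(n)


# Hypothetical paired measurements (same four subjects measured twice).
first = [10.0, 12.0, 9.0, 11.0]
second = [12.0, 15.0, 10.0, 14.0]

d_bar, s_diff, std_err = paired_standard_error(first, second)
print(f"mean difference = {d_bar:.2f}, s_d = {s_diff:.4f}, standard error = {std_err:.4f}")
```

Note that only the differences are used once they are computed, so the degrees of freedom are $n - 1$, not $2n - 2$.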
Example: We have been told that income in a region has risen by $6.00/week over the last year. We interviewed 100 families last year and found an average weekly income ($\bar x_1$) of $200. We reinterview the same families and find out their present incomes so that we can compute how much they have risen. We find that the new average income ($\bar x_2$) is $204. From the data we compute a standard deviation of the income change of $6.00. We wish to test to see if the $6 rise is believable.
$H_0: \mu_2 = \mu_1 + 6$, $H_1: \mu_2 \ne \mu_1 + 6$ or $H_0: \mu_1 - \mu_2 = -6$, $H_1: \mu_1 - \mu_2 \ne -6$ or $H_0: D = -6$, $H_1: D \ne -6$. $\alpha = .05$.
Data: $\bar x_1 = 200$, $\bar x_2 = 204$, $s_d = 6$, $n = 100$. Though we may have 200 pieces of data, we have 100 pairs, and the actual numbers we use are the 100 differences in income. $DF = n - 1 = 100 - 1 = 99$ and we ought to use $t$.
$\bar d = \bar x_1 - \bar x_2 = 200 - 204 = -4$ is the mean of the differences, and its standard error is $s_{\bar d} = \dfrac{s_d}{\sqrt{n}} = \dfrac{6}{\sqrt{100}} = 0.6$.
Solutions: Use $t^{(99)}_{\alpha/2} = t^{(99)}_{.025} = 1.984$.
a. Confidence Interval: $D = \bar d \pm t_{\alpha/2}\, s_{\bar d} = -4 \pm 1.984(0.6) = -4 \pm 1.19$. Make a diagram showing a Normal curve centered at -4 and a confidence interval bounded by -5.19 and -2.81. Since $D_0 = -6$ is not between them, reject $H_0$.
b. Test Ratio: $t = \dfrac{\bar d - D_0}{s_{\bar d}} = \dfrac{-4 - (-6)}{0.6} = 3.333$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by $-t_{.025} = -1.984$ and $t_{.025} = 1.984$. Since 3.333 is not between them, reject $H_0$. If we want a p-value, we need $pval = 2P(\bar d \ge -4) = 2P(t \ge 3.333)$. Since 3.333 lies above $t_{.001} = 3.175$, we double the implied p-value to $pval < .002$.
c. Critical Value: $d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar d} = -6 \pm 1.984(0.6) = -6 \pm 1.19$. Make a diagram showing a Normal curve centered at -6 and an 'accept' region bounded by -7.19 and -4.81. Since $\bar d = -4$ is not between them, reject $H_0$.
Note: We might have been better off in this problem defining $D$ as $\mu_2 - \mu_1$. Then our hypotheses would read $H_0: D = 6$ and $H_1: D \ne 6$. We would say $\bar d = \bar x_2 - \bar x_1 = 204 - 200 = 4$ and our critical value, for example, would be $d_{cv} = 6 \pm 1.984(0.6) = 6 \pm 1.19$.
The conclusion would not change.
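The income example, including the point of the note that reversing the definition of $D$ only flips signs, can be checked with the sketch below. The variable names are mine, and the table value 1.984 for 99 degrees of freedom is copied from the text.

```python
from math import sqrt

# Summary data from the income example.
x1_bar, x2_bar = 200.0, 204.0
s_diff, n = 6.0, 100          # std. deviation of the 100 income changes
t_crit = 1.984                # t(99) at .025, copied from the text

std_err = s_diff / sqrt(n)    # standard error of the mean difference
half_width = t_crit * std_err

# Original definition: D = mu1 - mu2, so H0 is D = -6.
d_bar, D0 = x1_bar - x2_bar, -6.0
ratio = (d_bar - D0) / std_err
ci = (d_bar - half_width, d_bar + half_width)
cv = (D0 - half_width, D0 + half_width)

# Alternative from the note: D = mu2 - mu1, so H0 is D = 6.
d_alt, D0_alt = x2_bar - x1_bar, 6.0
ratio_alt = (d_alt - D0_alt) / std_err
cv_alt = (D0_alt - half_width, D0_alt + half_width)

print(f"ratio = {ratio:.3f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"critical values = ({cv[0]:.2f}, {cv[1]:.2f})")
print(f"alternative: ratio = {ratio_alt:.3f}, "
      f"critical values = ({cv_alt[0]:.2f}, {cv_alt[1]:.2f})")
print("same decision:", (abs(ratio) > t_crit) == (abs(ratio_alt) > t_crit))
```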
© 2002 Roger Even Bove