Comparing 2 Samples


252meanx1 10/6/05 (Open this document in 'Outline' view!) Re-edited to replace

with D .

D. COMPARISON OF TWO SAMPLES

Examples for comparison of means.

\(H_0: \mu_1 = \mu_2\), \(H_1: \mu_1 \ne \mu_2\)

or more generally

\(H_0: D = D_0\), \(H_1: D \ne D_0\),

where \(D = \mu_1 - \mu_2\) and \(d = \bar{x}_1 - \bar{x}_2\).

General formulas: Degrees of freedom \(DF = n_1 + n_2 - 2\).

a. Confidence Interval: \(D = d \pm t_{\alpha/2}\, s_d\).

b. Test Ratio: \(t = \dfrac{d - D_0}{s_d}\).

c. Critical Value: \(d_{cv} = D_0 \pm t_{\alpha/2}\, s_d\).

The difference between the cases comes down to the choice of \(t\) and the formula for \(s_d\). Let us now consider the first four cases.

1. Two Means, Two Independent Samples, Large Samples.

If the total number of degrees of freedom is large (or the two samples come from normally distributed populations with known variances \(\sigma_1^2\) and \(\sigma_2^2\)), then replace \(t\) with \(z\) and use

\[ s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}. \]

First Example: We wish to test the earnings of retail clerks in New York \((\mu_1)\) and Philadelphia \((\mu_2)\) for equality.

\(H_0: \mu_1 = \mu_2\), \(H_1: \mu_1 \ne \mu_2\)

or

\(H_0: \mu_1 - \mu_2 = 0\), \(H_1: \mu_1 - \mu_2 \ne 0\)

or

\(H_0: D = 0\), \(H_1: D \ne 0\).

\(\alpha = .05\)

Data: \(\bar{x}_1 = 300\), \(s_1^2 = 400\), \(n_1 = 169\); \(\bar{x}_2 = 330\), \(s_2^2 = 360\), \(n_2 = 144\).

Since \(DF = n_1 + n_2 - 2 = 169 + 144 - 2 = 311\) is well over 100, we are justified in using a large sample method.

\(d = \bar{x}_1 - \bar{x}_2 = 300 - 330 = -30\)

\[ s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{400}{169} + \frac{360}{144}} = \sqrt{2.3669 + 2.5000} = \sqrt{4.8669} = 2.2061 \]

Solutions: Use \(z_{\alpha/2} = z_{.025} = 1.960\) in place of \(t\).

a. Confidence Interval: \(D = d \pm t_{\alpha/2}\, s_d = -30 \pm 1.960(2.2061) = -30 \pm 4.32\). Make a diagram showing a Normal curve centered at -30 and a confidence interval bounded by -34.32 and -25.68. Since \(D_0 = 0\) is not between them, reject \(H_0\).

b. Test Ratio: \(t = \dfrac{d - D_0}{s_d} = \dfrac{-30 - 0}{2.2061} = -13.60\). Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by \(-z_{.025} = -1.960\) and \(z_{.025} = 1.960\). Since -13.60 is not between them, reject \(H_0\). Since this is actually a value of \(z\), a p-value would be easy: \(pval = 2P(d \le -30) = 2P(z \le -13.60) = 2(.5 - .5) \approx 0\).

c. Critical Value: \(d_{cv} = D_0 \pm t_{\alpha/2}\, s_d = 0 \pm 1.960(2.2061) = \pm 4.32\). Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by -4.32 and 4.32. Since \(d = -30\) is not between them, reject \(H_0\).
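The three solution methods for the first example can be checked with a short Python sketch. It assumes only the summary statistics quoted in the example (means 300 and 330, variances 400 and 360, sample sizes 169 and 144); the function name `two_sided_z_test` and its dictionary keys are illustrative choices, not part of the original notes, and the critical z comes from the standard library rather than a printed table.

```python
import math
from statistics import NormalDist


def two_sided_z_test(xbar1, xbar2, var1, var2, n1, n2, d0=0.0, alpha=0.05):
    """Large-sample, two-sided comparison of two means from summary statistics."""
    d = xbar1 - xbar2                              # observed difference in means
    s_d = math.sqrt(var1 / n1 + var2 / n2)         # standard error of d
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.960 for alpha = .05
    margin = z_crit * s_d
    z = (d - d0) / s_d                             # test ratio
    return {
        "d": d,
        "s_d": s_d,
        "conf_int": (d - margin, d + margin),          # a. confidence interval for D
        "z": z,                                        # b. test ratio
        "p_value": 2 * NormalDist().cdf(-abs(z)),      # two-sided p-value
        "accept_region": (d0 - margin, d0 + margin),   # c. critical values for d
        "reject": abs(z) > z_crit,
    }


# Summary statistics from the New York / Philadelphia example.
clerks = two_sided_z_test(300, 330, 400, 360, 169, 144)
for name, value in clerks.items():
    print(name, value)
```

All three methods use the same margin, \(z_{\alpha/2}\, s_d\), which is why they always lead to the same decision.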

Second Example: (Whitmore, Netter, Wasserman) We wish to learn if battery type B \((\mu_2)\) has a longer service life (in months) than battery type A \((\mu_1)\). Note that this statement becomes an alternate hypothesis because it does not contain an equality.

\(H_0: \mu_1 \ge \mu_2\), \(H_1: \mu_1 < \mu_2\)

or

\(H_0: \mu_1 - \mu_2 \ge 0\), \(H_1: \mu_1 - \mu_2 < 0\)

or

\(H_0: D \ge 0\), \(H_1: D < 0\).

\(\alpha = .05\)

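Data: \(\bar{x}_1 = 18.4\), \(s_1 = 3.3\), \(n_1 = 121\); \(\bar{x}_2 = 21.3\), \(s_2 = 4.2\), \(n_2 = 36\).

Since \(DF = n_1 + n_2 - 2 = 121 + 36 - 2 = 155\) is well over 100, we are justified in using a large sample method.

\(d = \bar{x}_1 - \bar{x}_2 = 18.4 - 21.3 = -2.9\)

\[ s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{(3.3)^2}{121} + \frac{(4.2)^2}{36}} = \sqrt{0.09 + 0.49} = \sqrt{0.58} = 0.7616 \]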

Solutions: Use \(z_{\alpha} = z_{.05} = 1.645\) in place of \(t\). This is a 1-sided test.

a. Confidence Interval: Given \(H_1: D < 0\), use \(D \le d + t_{\alpha}\, s_d = -2.9 + 1.645(0.7616) = -2.9 + 1.253 = -1.647\). Make a diagram showing a Normal curve centered at -2.9 and a confidence interval, \(D \le -1.647\), represented by shading the area below -1.647. The null hypothesis \(H_0: D \ge 0\) is represented by shading the area above zero. Since \(D_0 = 0\) is not in the confidence interval, reject \(H_0\).

b. Test Ratio: \(t = \dfrac{d - D_0}{s_d} = \dfrac{-2.9 - 0}{0.7616} = -3.808\). Make a diagram showing a Normal curve centered at zero and an 'accept' region above \(-z_{.05} = -1.645\). Since -3.808 is below -1.645, reject \(H_0\).

c. Critical Value: Given \(H_1: D < 0\), we want a critical value below zero. Use \(d_{cv} = D_0 - t_{\alpha}\, s_d = 0 - 1.645(0.7616) = -1.253\). Make a diagram showing a Normal curve centered at zero and a 'reject' region below -1.253. Since \(d = -2.9\) is below -1.253, reject \(H_0\).
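For the one-sided battery example, the same calculations can be sketched in Python as follows. This assumes the quoted summary statistics (means 18.4 and 21.3, standard deviations 3.3 and 4.2, sample sizes 121 and 36) and a lower-tail alternative \(H_1: D < D_0\); the function name `lower_tail_z_test` is a hypothetical helper introduced here for illustration.

```python
import math
from statistics import NormalDist


def lower_tail_z_test(xbar1, xbar2, s1, s2, n1, n2, d0=0.0, alpha=0.05):
    """Large-sample test of H0: D >= d0 against H1: D < d0."""
    d = xbar1 - xbar2
    s_d = math.sqrt(s1**2 / n1 + s2**2 / n2)     # standard error of d
    z_alpha = NormalDist().inv_cdf(1 - alpha)    # one-sided critical z, about 1.645
    z = (d - d0) / s_d
    return {
        "d": d,
        "s_d": s_d,
        "upper_bound": d + z_alpha * s_d,   # a. one-sided interval: D <= upper_bound
        "z": z,                             # b. test ratio
        "d_cv": d0 - z_alpha * s_d,         # c. critical value for d
        "p_value": NormalDist().cdf(z),     # lower-tail p-value
        "reject": z < -z_alpha,
    }


# Summary statistics from the battery example.
batteries = lower_tail_z_test(18.4, 21.3, 3.3, 4.2, 121, 36)
for name, value in batteries.items():
    print(name, value)
```

Because the alternative points in one direction only, the whole of \(\alpha\) sits in the lower tail, so the sketch uses \(z_{\alpha}\) rather than \(z_{\alpha/2}\).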

2. Two Means, Two Independent Samples, Populations Normally Distributed, Population Variances Assumed Equal.

\(t = t^{(n_1 + n_2 - 2)}\) and \(s_d = \sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}\), where

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}. \]

Example: (Whitmore, Netter, Wasserman) Each of two groups of ten men is assigned a razor blade and asked how many shaves they got from a package. We wish to find out if there is a significant difference in the durability of the two blades. Type A will be \(x_1\); type B will be \(x_2\).

\(H_0: \mu_1 = \mu_2\), \(H_1: \mu_1 \ne \mu_2\)

or

\(H_0: \mu_1 - \mu_2 = 0\), \(H_1: \mu_1 - \mu_2 \ne 0\)

or

\(H_0: D = 0\), \(H_1: D \ne 0\).

\(\alpha = .01\)

Data: \(\bar{x}_1 = 46.9\), \(s_1 = 14.0\), \(n_1 = 10\); \(\bar{x}_2 = 75.6\), \(s_2 = 21.0\), \(n_2 = 10\).

Since \(DF = n_1 + n_2 - 2 = 10 + 10 - 2 = 18\) is well below 100, we need a small sample method. Because of the similarity of the two samples we assume that \(\sigma_1^2 = \sigma_2^2\).

\(d = \bar{x}_1 - \bar{x}_2 = 46.9 - 75.6 = -28.7\). Because of our assumption about variances, we use a pooled variance.

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(10 - 1)(14.0)^2 + (10 - 1)(21.0)^2}{10 + 10 - 2} = 318.50 \]

\[ s_d = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{318.50\left(\frac{1}{10} + \frac{1}{10}\right)} = \sqrt{63.7} = 7.9812 \]

Solutions: Use \(t_{\alpha/2}^{(n_1 + n_2 - 2)} = t_{.005}^{(18)} = 2.878\).

a. Confidence Interval: \(D = d \pm t_{\alpha/2}\, s_d = -28.7 \pm 2.878(7.9812) = -28.7 \pm 22.97\). Make a diagram showing a Normal curve centered at -28.7 and a confidence interval bounded by -51.67 and -5.73. Since \(D_0 = 0\) is not between them, reject \(H_0\).

b. Test Ratio: \(t = \dfrac{d - D_0}{s_d} = \dfrac{-28.7 - 0}{7.9812} = -3.596\). Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by \(-t_{.005}^{(18)} = -2.878\) and \(t_{.005}^{(18)} = 2.878\). Since -3.596 is not between them, reject \(H_0\). If we want a p-value, we need \(pval = 2P(d \le -28.70) = 2P(t \le -3.596)\). Since 3.596 lies between \(t_{.005}^{(18)} = 2.878\) and \(t_{.001}^{(18)} = 3.611\), the one-tail probability lies between .001 and .005; we double it to get \(.002 < pval < .01\).

c. Critical Value: \(d_{cv} = D_0 \pm t_{\alpha/2}\, s_d = 0 \pm 2.878(7.9812) = \pm 22.97\). Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by -22.97 and 22.97. Since \(d = -28.7\) is not between them, reject \(H_0\).
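The pooled-variance calculation in the razor blade example can be sketched in Python as below. The critical value 2.878 is passed in as a table value (t with 18 degrees of freedom, .005 in each tail), since the standard library has no t distribution; the function name `pooled_t_test` and its return keys are illustrative assumptions rather than anything defined in the notes.

```python
import math


def pooled_t_test(xbar1, xbar2, s1, s2, n1, n2, t_crit, d0=0.0):
    """Two-sided, equal-variance comparison of two means from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its own degrees of freedom.
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    s_d = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of d
    d = xbar1 - xbar2
    t = (d - d0) / s_d
    margin = t_crit * s_d
    return {
        "df": df,
        "pooled_var": pooled_var,
        "s_d": s_d,
        "t": t,                                        # b. test ratio
        "conf_int": (d - margin, d + margin),          # a. confidence interval for D
        "accept_region": (d0 - margin, d0 + margin),   # c. critical values for d
        "reject": abs(t) > t_crit,
    }


# Summary statistics from the razor blade example; 2.878 is the table value for 18 DF.
blades = pooled_t_test(46.9, 75.6, 14.0, 21.0, 10, 10, t_crit=2.878)
for name, value in blades.items():
    print(name, value)
```

With equal sample sizes the pooled variance is simply the average of the two sample variances, which makes this example easy to verify by hand.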

3. Two Means, Two Independent Samples, Populations Normally Distributed, Population Variances not Assumed Equal.

This time the degrees of freedom for \(t\) must be calculated by the Satterthwaite approximation. The formula is

\[ df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}, \]

but the formula for the standard deviation is the same as in method 1, \(s_d = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}\).

Example: We wish to use a 2-sided 95% confidence interval to test for a significant difference between the time it takes an employee to type a page on word processor A \((\mu_1)\) and word processor B \((\mu_2)\). 16 pages are typed on each processor.

\(H_0: \mu_1 = \mu_2\), \(H_1: \mu_1 \ne \mu_2\)

or

\(H_0: \mu_1 - \mu_2 = 0\), \(H_1: \mu_1 - \mu_2 \ne 0\)

or

\(H_0: D = 0\), \(H_1: D \ne 0\).

\(\alpha = .05\)

Data: \(\bar{x}_1 = 8.20\), \(s_1^2 = 4.10\), \(n_1 = 16\); \(\bar{x}_2 = 7.10\), \(s_2^2 = 4.20\), \(n_2 = 16\).

Since \(DF = n_1 + n_2 - 2 = 16 + 16 - 2 = 30\) is well below 100, we would be on very, very shaky ground if we used a large sample method. Even with a small sample method we probably need an assumption of Normality. If we do not want to assume \(\sigma_1^2 = \sigma_2^2\), we need the Satterthwaite method.

\(d = \bar{x}_1 - \bar{x}_2 = 8.20 - 7.10 = 1.10\). To find the standard error of \(d\) and the number of degrees of freedom, do the following calculations: \(\dfrac{s_1^2}{n_1} = \dfrac{4.1}{16} = 0.25625\), \(\dfrac{s_2^2}{n_2} = \dfrac{4.2}{16} = 0.26250\), so \(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} = 0.25625 + 0.26250 = 0.51875\),

\[ DF = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{(0.51875)^2}{\dfrac{(0.25625)^2}{15} + \dfrac{(0.26250)^2}{15}} = \frac{0.26910}{0.00438 + 0.00459} \approx 29.996 \]

(using unrounded intermediate values). I take the conservative approach of rounding this down to 29 degrees of freedom. Notice how little difference there is between this and \(DF = n_1 + n_2 - 2 = 16 + 16 - 2 = 30\). This is because of the near-equality of the sample variances.

\(s_d = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{0.51875} = 0.720\). This is almost the same result as we would have gotten if we had assumed that \(\sigma_1^2 = \sigma_2^2\), again because of the near-equality of the sample variances.

Solutions: Use \(t_{.025}^{(29)} = 2.045\) for a 2-sided confidence interval. \(D = d \pm t_{\alpha/2}\, s_d = 1.10 \pm 2.045(0.720) = 1.10 \pm 1.47\). Make a diagram showing a Normal curve centered at 1.10 and a confidence interval bounded by -0.37 and 2.57. Since \(D_0 = 0\) is between them, do not reject \(H_0\).
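As a rough check on the Satterthwaite arithmetic in the word processor example, the degrees of freedom and the confidence interval can be computed in Python. The sketch assumes the quoted sample variances (4.10 and 4.20) and sample sizes (16 each); `satterthwaite_df` is a hypothetical helper name, and 2.045 is the table value for 29 degrees of freedom used in the example rather than a value computed here.

```python
import math


def satterthwaite_df(var1, n1, var2, n2):
    """Approximate degrees of freedom when population variances are not assumed equal."""
    a = var1 / n1   # s1^2 / n1
    b = var2 / n2   # s2^2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


# Summary statistics from the word processor example.
df_exact = satterthwaite_df(4.10, 16, 4.20, 16)
df_used = math.floor(df_exact)             # conservative choice: round down
s_d = math.sqrt(4.10 / 16 + 4.20 / 16)     # same standard error as in method 1
d = 8.20 - 7.10
t_crit = 2.045                             # table value: t with 29 DF, .025 in each tail
conf_int = (d - t_crit * s_d, d + t_crit * s_d)
contains_zero = conf_int[0] <= 0.0 <= conf_int[1]

print("Satterthwaite DF:", df_exact, "-> use", df_used)
print("s_d:", s_d)
print("95% confidence interval:", conf_int)
print("D0 = 0 inside interval (do not reject H0):", contains_zero)
```

When the two ratios \(s_1^2/n_1\) and \(s_2^2/n_2\) are nearly equal, as here, the approximation lands very close to \(n_1 + n_2 - 2\); the more they differ, the further the result falls below that value.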

A more complete version of this problem appears in Problem D3.

Look at document 252meanx3 here for a computer example of this method.

4. Two Means, Paired Samples (If samples are small, populations should be normally distributed).

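If \(n\) is the number of pairs of data, then \(t = t^{(n-1)}\) and

\[ s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \sqrt{\frac{\sum d^2 - n\bar{d}^2}{n(n - 1)}}, \]

where \(\bar{d}\) is the mean of the differences. In this case \(d_1 = x_{11} - x_{21}\), \(d_2 = x_{12} - x_{22}\), etc.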

Example: We have been told that income in a region has risen by $6.00/week over the last year. We interviewed 100 families last year and found an average weekly income \((\bar{x}_1)\) of $200. We reinterview the same families and find out their present incomes so that we can compute how much they have risen. We find that the new average income \((\bar{x}_2)\) is $204. From the data we compute a standard deviation of the income change of $6.00. We wish to test to see if the $6 rise is believable.

\(H_0: \mu_2 = \mu_1 + 6\), \(H_1: \mu_2 \ne \mu_1 + 6\)

or

\(H_0: \mu_1 - \mu_2 = -6\), \(H_1: \mu_1 - \mu_2 \ne -6\)

or

\(H_0: D = -6\), \(H_1: D \ne -6\).

\(\alpha = .05\)

Data: \(\bar{x}_1 = 200\), \(\bar{x}_2 = 204\), \(s_d = 6\), \(n = 100\). Though we may have 200 pieces of data, we have 100 pairs, and the actual numbers we use are the 100 differences in income. \(DF = n - 1 = 100 - 1 = 99\) and we ought to use \(t\).

\(\bar{d} = \bar{x}_1 - \bar{x}_2 = 200 - 204 = -4\). \(s_{\bar{d}} = \dfrac{s_d}{\sqrt{n}} = \dfrac{6}{\sqrt{100}} = 0.6\).

Solutions: Use \(t_{\alpha/2}^{(99)} = t_{.025}^{(99)} = 1.984\).

a. Confidence Interval: \(D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}} = -4 \pm 1.984(0.6) = -4 \pm 1.19\). Make a diagram showing a Normal curve centered at -4 and a confidence interval bounded by -5.19 and -2.81. Since \(D_0 = -6\) is not between them, reject \(H_0\).

b. Test Ratio: \(t = \dfrac{\bar{d} - D_0}{s_{\bar{d}}} = \dfrac{-4 - (-6)}{0.6} = 3.333\). Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by \(-t_{.025}^{(99)} = -1.984\) and \(t_{.025}^{(99)} = 1.984\). Since 3.333 is not between them, reject \(H_0\). If we want a p-value, we need \(pval = 2P(\bar{d} \ge -4) = 2P(t \ge 3.333)\). Since 3.333 lies above \(t_{.001}^{(99)} = 3.175\), the one-tail probability is below .001; we double the implied p-value to get \(pval < .002\).

c. Critical Value: \(d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}} = -6 \pm 1.984(0.6) = -6 \pm 1.19\). Make a diagram showing a Normal curve centered at -6 and an 'accept' region bounded by -7.19 and -4.81. Since \(\bar{d} = -4\) is not between them, reject \(H_0\).
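The paired-sample calculations for the income example reduce to a one-sample test on the differences, which can be sketched in Python from the summary figures alone. The sketch assumes a mean difference of 200 - 204 = -4, a standard deviation of the differences of 6, and 100 pairs; the function name `paired_t_from_summary` is hypothetical, and 1.984 is the table value for 99 degrees of freedom quoted in the example.

```python
import math


def paired_t_from_summary(d_bar, s_d, n, d0, t_crit):
    """Two-sided paired-sample test using the mean and std. dev. of the n differences."""
    se = s_d / math.sqrt(n)        # standard error of the mean difference
    t = (d_bar - d0) / se          # test ratio
    margin = t_crit * se
    return {
        "se": se,
        "t": t,
        "conf_int": (d_bar - margin, d_bar + margin),   # a. confidence interval for D
        "accept_region": (d0 - margin, d0 + margin),    # c. critical values for the mean difference
        "reject": abs(t) > t_crit,
    }


# Summary figures from the income example; D is defined as mu_1 - mu_2, so D0 = -6.
income = paired_t_from_summary(d_bar=200 - 204, s_d=6, n=100, d0=-6, t_crit=1.984)
for name, value in income.items():
    print(name, value)
```

Only the 100 differences enter the calculation, which is why the degrees of freedom are \(n - 1 = 99\) rather than \(n_1 + n_2 - 2\).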

Note: We might have been better off in this problem defining \(D\) as \(\mu_2 - \mu_1\). Then our hypotheses would read \(H_0: D = 6\) and \(H_1: D \ne 6\). We would say \(\bar{d} = \bar{x}_2 - \bar{x}_1 = 204 - 200 = 4\) and our critical value, for example, would be \(d_{cv} = 6 \pm 1.984(0.6) = 6 \pm 1.19\).

The conclusion would not change.

© 2002 Roger Even Bove
