12- Chi-Square and Nonparametric Tests

advertisement

Statistics for Managers

Using Microsoft® Excel

5th Edition

Chapter 12

Chi Square Tests and Nonparametric

Tests

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-1

Learning Objectives

After completing this chapter, you should be able to:

Understand the

2 test for the difference between two proportions

Use a

2 test for differences in more than two proportions

Perform a

2 test of independence

Apply and interpret the Wilcoxon rank sum test for the difference between two medians

Perform and interpret the nonparametric Kruskal-Wallis rank test for the difference between three or more medians

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-2

2

Test for the Difference Between

Two Proportions

H

0

: π

1

= π

2

H

1

: π

1

 π

2

The Chi-square test statistic is:

 2   all cells

( f o

 f e

) 2 f e

The

2 test produces the same results as the Z test for the difference in two proportions.

The

2 test can not be used for upper and lower tail tests only two-tail tests.

Therefore use the Z test for the difference in two proportions as it is a more complete test.

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-3

2

Test for The Difference Between

More Than Two Proportions

Compares observed frequencies to expected frequencies if null hypothesis is true

The closer the observed frequencies are to the expected frequencies, the more likely the H

0 is true

Tests for equality (=) of proportions only

One variable with several groups or levels

Uses a contingency table

Assumptions:

 Independent random samples

“Large” sample size

All expected frequencies

5

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-4

2

Test for The Differences Among

More Than Two Proportions

 Extend the

2 test to the case with more than two independent populations:

H

0

: π

1

= π

2

= … = π c

H

1

: Not all of the π j are equal (j = 1, 2, …, c)

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-5

The Chi-Square Test Statistic

The Chi-square test statistic is:

2   all cells

( f o

 f e f e

)

2

 where:

 f o

= observed frequency in a particular cell of the 2 x c table f e

= expected frequency in a particular cell if H

0 is true

2 for the 2 x c case has (2-1)(c-1) = c - 1 degrees of freedom

Assumed: each cell in the contingency table has expected frequency of at least 5

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-6

2 Test with More Than Two

Proportions: Example

 The sharing of patient records is a controversial issue in health care. A survey of 500 respondents asked whether they objected to their records being shared by insurance companies, by pharmacies, and by medical researchers. The results are summarized on the following table:

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-7

Object to

Record

Sharing

Yes

2 Test with More Than Two

Proportions: Example

Insurance

Companies

Organization

Pharmacies Medical

Researchers

410 295 335

1040

No 90 205 165

460

500 500 500

1500

Chap 12-8 Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

More Than Two Proportions:

Example: Expected Frequencies

Organization

Object to

Record

Sharing

Yes

Insurance

Companies

Pharmacies Medical

Researchers

295 335 1040

No

410

346.667

90

153.333

500

205

500

165

500

460

1500

Row Total /Grand Total x Column Total > 1040/1500x500 = 346.667

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-9

2 Test with More Than Two

Proportions: Example

Object to

Record

Sharing

Yes

No

Organization

Insurance

Companies f o

= 410 f e

= 346.667

f o

= 90 f e

= 153.333

Pharmacies f o

= 295 f e

= 346.667

f o

= 205 f e

= 153.333

Medical

Researchers f o

= 335 f e

= 346.667

f o

= 165 f e

= 153.333

All Expected Frequencies are > 5

The sample size is ‘Large’

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-10

2

Test for c Proportions:

Example

(continued)

Compute Test Statistic: f

0 f e

410 346.667

295 346.667

(f

0

- f e

) 2 / f e

11.571

7.700

335 346.667

90 153.333

.393

26.159

205 153.333

17.409

165 153.333

.888

Test Statistic

2 = 64.120

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-11

2 Test with More Than Two

Proportions: Example

H

0

: π

1

= π

2

= π

3

H

1

: Not all of the π j are equal (j = 1, 2, 3)

2 (64.120) = 1.19E-14 is from the chi-square distribution with 2 degrees of freedom.

p -value = 1.19E-14

Conclusion: Since 1.19E-14 < .05, you reject H

0 and you conclude that at least one proportion of respondents who object to their records being shared is different from the other organizations

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-12

The Marascuilo Procedure

The Marascuilo procedure enables you to make comparisons between all pairs of groups (similar to Tukey-Kramer).

First, compute the observed differences p j among all c(c-1)/2 pairs.

- p j’

Second, compute the corresponding critical range for the Marascuilo procedure.

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-13

The Marascuilo Procedure

 Critical Range for the Marascuilo Procedure:

Range j j

( ( 1 j j j j

) ) j j

/ /

( ( 1 j j

/ / j j

/ /

) )

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-14

The Marascuilo Procedure

Example

Organization

Object to

Record

Sharing

Insurance

Companies

Pharmacies Medical

Researchers

Yes 410 295 335

Proportion 410/500

P

1

= 0.82

295/500

P

2

= 0.59

335/500

P

3

= 0.67

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-15

The Marascuilo Procedure

Example

MARASCUILO TABLE

Proportions

| p Insurance – p Pharmacies |

| p Insurance – p Research |

| p Pharmacies – p Research |

Differences

.82 - .59 = .23

.82 - .67 = .15

.59 - .67 = -.08

Absolute

Differences

0.23

0.15

0.08

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-16

The Marascuilo Procedure

Critical Range

 

U

2 p j

( 1

 p j

)

 n j p j

/

( 1

 p j

/

) n j

/

Critical Range

5 .

991

.

82 ( 1

.

82 )

500

.

59 ( 1

.

59 )

500

.

0683

Critical Range

5 .

991

.

82 ( 1

.

82 )

500

.

67 ( 1

.

67 )

500

.

0665

Critical Range

5 .

991

.

59 ( 1

.

59 )

500

.

67 ( 1

.

67 )

500

.

0745

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-17

The Marascuilo Procedure

Example

Proportions

MARASCUILO TABLE

Absolute

Differences Critical Range

| p Insurance – p Pharmacies |

| p Insurance – p Research |

| p Research – p Insurance |

0.23

0.15

0.08

0.0683

0.0665

0.0745

Conclusion: Since each absolute difference is greater than the critical range, you conclude that each proportion is significantly different that the other two.

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-18

2

Test for more than two

Proportions in PhStat

 PhStat | Multiple-Sample Tests | Chi-Square

Test …

 Be sure to check the Marascuilo Procedure box to calculate which proportions are different (similar to the Tukey-Kramer Test)

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-19

2 Test of Independence

 Similar to the

2 test for equality of more than two proportions, but extends the concept to contingency tables with r rows and c columns

H

0

: The two categorical variables are independent

(i.e., there is no relationship between them)

H

1

: The two categorical variables are dependent

(i.e., there is a relationship between them)

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-20

2 Test of Independence

The Chi-square test statistic is:

 2   all cells

( f o

 f e

)

2 f e where: f o

= observed frequency in a particular cell of the r x c table f e

= expected frequency in a particular cell if H

0 is true

2 for the r x c case has (r-1)(c-1) degrees of freedom

Assumed: each cell in the contingency table has expected frequency of at least 1)

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-21

Example: Test of Independence

 The meal plan selected by 200 students is shown below:

Class

Standing

Fresh.

Soph.

Junior

Senior

Total

Number of meals per week

20/week 10/week none

24

22

10

14

70

32

26

14

16

88

14

12

6

10

42

Total

70

60

30

40

200

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-22

Example: Test of Independence

 The hypothesis to be tested is:

H

0

: Meal plan and class standing are independent

(i.e., there is no relationship between them)

H

1

: Meal plan and class standing are dependent

(i.e., there is a relationship between them)

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-23

Example:

Expected Cell Frequencies

(continued)

Class

Standing

Fresh.

Soph.

Junior

Senior

Total

Observed:

Number of meals per week

20/wk 10/wk none

24

22

10

14

70

32

26

14

16

88

14

12

6

10

42

Total

70

60

30

40

200

Class

Standing

Fresh.

Example for one cell:

Soph.

f e

 row total

 column total n

Junior

Senior

30

200

70

10 .

5

Total

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Expected cell frequencies if H

0 is true:

Number of meals per week

20/wk 10/wk none

24.5

30.8

14.7

21.0

26.4

12.6

10.5

13.2

6.3

14.0

17.6

70 88

8.4

42

Chap 12-24

Total

70

60

30

40

200

Example: The Test Statistic

 The test statistic value is:

2 

 all cells

( f o

 f e

( 24

24 .

5 )

2 f e

)

2

24 .

5

( 32

30 .

8 )

2

30 .

8

  

( 10

8 .

4 )

2

8 .

4

0 .

7093 p -value=.9943

(continued)

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-25

Example:

Decision and Interpretation

The test statistic is

2 

0 .

7093 , p value

.9943

(continued)

Decision Rule:

If the p-value is <

, reject H

0

, otherwise, do not reject H

0

Here, p =value not <

 so do not reject H

0

Conclusion: There is not sufficient evidence that meal plan and class standing are related at

= .05

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-26

2

Test of Independence in PHStat

 PHStat | Multiple-Sample Tests | Chi-Square

Test …

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-27

McNemar Test

Used to test for the difference between two proportions of related samples (not independent)

You need to use a test statistic that follows the normal distribution

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-28

Condition 1

McNemar Test

Condition 2

Yes No

Yes

No

Totals

A

C

A+C

Totals

B

D

A+B

C+D

B+D A+B+C+D

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-29

McNemar Test Paired Data

Test Statistic

To test the hypothesis:

H

0

H

1

: π

: π

1

1

= π

2

H

0

: π

1

≠ π

2

H

1

: π

1

 π

2

H

0

: π

1

> π

2

H

1

: π

1

 π

2

< π

2

Use the test statistic:

Z

B

C

B

C

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-30

McNemar Test

Example

 Suppose you survey 300 homeowners and ask them if they are interested in refinancing their home. In an effort to generate business, a mortgage company improved their loan terms and reduced closing costs. The same homeowners were again surveyed. Determine if change in loan terms was effective in generating business for the mortgage company. The data are summarized as follows:

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-31

McNemar Test

Example

Survey response after change

Survey response before change

Yes No Totals

Yes 118 2 120

No 22 158 180

Totals 140 160 300

Test the hypothesis (at the 0.05 level of significance):

H

0

: π

1

≥ π

2

: The change in loan terms was ineffective

H

1

: π

1

< π

2

: The change in loan terms increased business

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-32

McNemar Test

Example

Survey response before change

Yes

No

Totals

Survey response after change

Yes No Totals

118

22

140

2

158

160

120

180

300

The test statistic is:

Z

B

C

B

C

2

22

2

22

 

4 .

08 p -value = 2.23E-05

Since 2.23E-05 < .05, you reject H

0 and conclude that the change in loan terms significantly increase business for the mortgage company.

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-33

Wilcoxon Rank-Sum Test for

Differences in Two

Medians

Test two independent population medians

Populations need not be normally distributed

Distribution free procedure

Used when only rank data are available

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-34

Wilcoxon Rank-Sum Test

Using Normal Approximation

 For large samples, the test statistic T

1 with mean and standard deviation: is approximately normal

μ

T

1

 n

1

( n

1 )

2

σ

T

1

 n

1 n

2

( n

12

1 )

Must use the normal approximation if either n

1

Assign n

1 or n to be the smaller of the two sample sizes

2

> 10

Can also use the normal approximation for small samples n

1 and n

2

<10

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-35

Wilcoxon Rank-Sum Test

Sample data are collected on the capacity rates (% of capacity) for two factories.

Are the median operating rates for two factories the same?

For factory A, the rates are 71, 82, 77, 94, 88

For factory B, the rates are 85, 82, 92, 97

Test for equality of the population medians at the 0.05 significance level

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-36

Wilcoxon Rank-Sum Test

Example with Small Samples

Ranked

Capacity values:

Tie in 3 rd and

4 th places

Capacity Rank

Factory A Factory B Factory A Factory B

71

77

82 82

85

1

2

3.5

3.5

5

88 6

92 7

94 8

97 9

Rank Sums:

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

20.5

24.5

Chap 12-37

Wilcoxon Rank-Sum Test

Example with Small Samples

Factory B has the smaller sample size, so the test statistic is the sum of the Factory B ranks:

T = 24.5

The sample sizes are: n

1

= 4 (factory B) n

2

= 5 (factory A)

The level of significance is α = .05

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-38

Wilcoxon Rank-Sum Test

Using Normal Approximation

 The Z test statistic is

Z

T

1

μ

T

1

σ

T

1

 Where Z approximately follows a standardized normal distribution

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-39

Wilcoxon Rank-Sum Test

Using Normal Approximation

Use the setting of the prior example:

The sample sizes were: n

1

= 4 (factory B) n

2

= 5 (factory A)

The level of significance was α = .05

The test statistic was T

1

= 24.5

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-40

Wilcoxon Rank-Sum Test

Using Normal Approximation

μ

T

1

 n

1

( n

1 )

2

4 ( 9

1 )

2

20

σ

T

1

 n

1 n

2

( n

1 )

12

4 ( 5 ) ( 9

1 )

4 .

082

12

The test statistic is

Z

T

1

 μ

T

1

σ

T

1

24 .

5

20

1 .

10

4.082

 p -value =.2703 is not less than α = .05 so you do not reject H

0

– there is not sufficient evidence that the medians are different

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-41

Wilcoxon Rank-Sum Test for

Differences in 2 Medians Using the Normal Approximation

 PHStat | Two-Sample Tests | Wilcoxon Rank

Sum Test …

 Use for both small samples <10 and large samples >10

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-42

Kruskal-Wallis Rank Test for more than Two Population Medians

Tests the equality of more than 2 population medians

Use when the normality assumption for one-way

ANOVA is violated

Assumptions:

 The samples are random and independent

 variables have a continuous distribution

 the data can be ranked

 populations have the same variability

 populations have the same shape

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-43

Kruskal-Wallis Rank Test

 The Kruskal-Wallis H test statistic:

(with c – 1 degrees of freedom)

H

12 n ( n

1 ) j c 

1

T j

2 n j

3 ( n

1 ) where: n = total number of values over the combined samples c = Number of groups

T j n j

= Sum of ranks in the j

= Size of the j th sample th sample

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-44

Kruskal-Wallis Rank Test

Example

 Do different branch offices have a different number of employees?

Office size

(Chicago, C)

23

41

54

78

66

Office size

(Denver, D)

55

60

72

45

70

Office size

(Houston, H)

30

40

18

34

44

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-45

Kruskal-Wallis Rank Test

Example

 Do different branch offices have a different number of employees?

Office size

(Chicago, C)

Ranking

23

41

54

78

66

2

6

9

15

12

= 44

Office size

(Denver, D)

55

60

72

45

70

Ranking

Office size

(Houston, H)

10

11

14

8

13

= 56

30

40

18

34

44

Ranking

3

5

1

4

7

= 20

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-46

Kruskal-Wallis Rank Test

Example

H

0

: Median

C

Median

D

Median

H

H

A

: Not all population Medians are equal

 The H statistic is

H

12 n ( n

1 ) j c 

1

T j

2 n j

3 ( n

1 )

 15

12

( 15

1 )



44

2

5

56

2

5

20

2

5



3 ( 15

1 )

6 .

72

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-47

Kruskal-Wallis Test

Example Solution

H

0

: Med

1

= Med

2

= Med

3

H

1

: Not all medians equal

= .05

Test Statistic:

H = 6.72

p -value = .0347

Decision:

Reject at

= .05

Conclusion:

There is sufficient evidence that the population median office sizes are not all equal by city.

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-48

Kruskal-Wallis Rank Test in PHStat

 PHStat | Multiple-Sample Tests | Kruskal-

Wallis Rank Test …

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-49

Chapter Summary

In this chapter, we have

Examined the

2 test for the difference between two proportions

Developed and applied the

2 test for differences in more than two proportions

Examined the

2 test for independence

Used the McNemar test for differences in two related proportions

Used the Wilcoxon rank sum test for two population medians

 Small Samples

 Large sample Z approximation

Applied the Kruskal-Wallis H-test for multiple population medians

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 12-50

Download