Categorical Data


 In Chapter 3 we introduced the idea of categorical data.

 In Chapter 15 we explored probability rules and when events are independent.

 In Chapter 26 we put these two ideas together to compare counts.


Categorical Data

 National Opinion Research Center’s General Social Survey

 In 2006 a sample of 1928 adults in the U.S. was asked the question “When is premarital sex wrong?”

 The participants were also asked with what religion they were affiliated.


Who?/What?

 Who?

–A sample of 1928 adults.

 What?

–Attitude towards premarital sex.

–Religious affiliation.


What?

 When is premarital sex wrong?

–Categorical: Always Wrong, Almost Always Wrong, Sometimes Wrong, Not Wrong at All


What?

 What is your religious affiliation?

–Categorical: Catholic, Jewish, Protestant, None, Other


When is Premarital Sex Wrong?

Religion      Always Wrong   Almost Always Wrong   Sometimes Wrong   Not Wrong at All   Total
Catholic                83                    47               105                249     484
Jewish                   4                     2                 9                 20      35
Protestant             364                    97               190                341     992
None                    27                    12                52                219     310
Other                   28                     8                20                 51     107
Total                  506                   166               376                880    1928


When is Premarital Sex Wrong?

Religion      Always   Almost Always   Sometimes   Never   Total
Catholic       17.2%            9.7%       21.7%   51.4%    100%
Jewish         11.4%            5.7%       25.7%   57.2%    100%
Protestant     36.7%            9.8%       19.1%   34.4%    100%
None            8.7%            3.9%       16.8%   70.6%    100%
Other          26.2%            7.5%       18.7%   47.6%    100%
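The row percentages divide each count by its religion's row total. A minimal Python sketch of that calculation, using the observed counts from the survey table; the names `OBSERVED`, `ATTITUDES`, and `row_percentages` are illustrative and not part of the original slides:

```python
# Observed counts by religion, in attitude order (from the survey table).
ATTITUDES = ["Always Wrong", "Almost Always Wrong", "Sometimes Wrong", "Not Wrong at All"]
OBSERVED = {
    "Catholic":   [83, 47, 105, 249],
    "Jewish":     [4, 2, 9, 20],
    "Protestant": [364, 97, 190, 341],
    "None":       [27, 12, 52, 219],
    "Other":      [28, 8, 20, 51],
}


def row_percentages(table):
    """Return each row's counts as percentages of that row's total."""
    result = {}
    for religion, counts in table.items():
        total = sum(counts)
        result[religion] = [100 * count / total for count in counts]
    return result


ROW_PCT = row_percentages(OBSERVED)
for religion, percentages in ROW_PCT.items():
    print(f"{religion:<11}" + "".join(f"{p:8.1f}%" for p in percentages))
```

A few printed values can differ from the slide by 0.1 percentage point, presumably because the slide's figures were adjusted so that every row adds to exactly 100%.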


Mosaic Plot


Comparing Counts

 People who have no religion or are Jewish are more likely to say premarital sex is Not Wrong at All.

 Protestants are much more likely to say premarital sex is Always Wrong.


Comparing Counts

 Are these differences statistically significant?

 Or, are these differences due to chance variation so that religion and attitude towards premarital sex are independent?


Comparing Counts

 If religion and attitude towards premarital sex are independent, then Pr(A and B) = Pr(A)*Pr(B), where A is a religion category and B is an attitude category.
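To make the independence rule concrete, the sketch below compares the observed joint proportion with the product of the marginal proportions for one cell, taking A = Catholic and B = Always Wrong. The counts come from the survey table; the variable names are illustrative only.

```python
N = 1928                  # total sample size
CATHOLIC_TOTAL = 484      # row total for Catholic (category A)
ALWAYS_TOTAL = 506        # column total for Always Wrong (category B)
CATHOLIC_AND_ALWAYS = 83  # observed count in the Catholic / Always Wrong cell

pr_a = CATHOLIC_TOTAL / N
pr_b = ALWAYS_TOTAL / N
pr_product = pr_a * pr_b            # Pr(A)*Pr(B): what independence predicts
pr_joint = CATHOLIC_AND_ALWAYS / N  # observed proportion for A and B

print(f"Pr(A)*Pr(B)  = {pr_product:.4f}")
print(f"Pr(A and B)  = {pr_joint:.4f}")
```

In the sample the observed joint proportion is smaller than the product of the marginals. Whether a gap of that size could be chance variation is the question the test addresses.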


Expected Count

 If religion and attitude toward premarital sex are independent, we would expect to see n*Pr(A)*Pr(B) people in religion category A and attitude category B, where n is the sample size.
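Since Pr(A) and Pr(B) are estimated by (row total)/n and (column total)/n, the expected count n*Pr(A)*Pr(B) reduces to (row total)*(column total)/n. A minimal Python sketch that applies this to every cell of the observed table; the function name `expected_counts` and the data layout are assumptions made for illustration:

```python
# Observed counts by religion; columns are Always Wrong, Almost Always Wrong,
# Sometimes Wrong, Not Wrong at All (from the survey table).
OBSERVED = {
    "Catholic":   [83, 47, 105, 249],
    "Jewish":     [4, 2, 9, 20],
    "Protestant": [364, 97, 190, 341],
    "None":       [27, 12, 52, 219],
    "Other":      [28, 8, 20, 51],
}


def expected_counts(table):
    """Expected count for each cell under independence: row total * column total / n."""
    n = sum(sum(row) for row in table.values())
    col_totals = [sum(col) for col in zip(*table.values())]
    return {
        name: [sum(row) * col_total / n for col_total in col_totals]
        for name, row in table.items()
    }


EXPECTED = expected_counts(OBSERVED)
for religion, row in EXPECTED.items():
    print(f"{religion:<11}" + "".join(f"{e:8.1f}" for e in row))
```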


Religion      Always   Almost Always   Sometimes   Never   Total
Catholic                                                     484
Jewish                                                        35
Protestant                                                   992
None                                                         310
Other                                                        107
Total            506             166         376     880    1928


Expected Count

 Catholic and Always Wrong

$$E = 1928 \times \frac{484}{1928} \times \frac{506}{1928} = \frac{484 \times 506}{1928} = 127.0$$


Expected Counts

Religion      Always   Almost Always   Sometimes   Never   Total
Catholic       127.0            41.7        94.4   220.9     484
Jewish           9.2             3.0         6.8    16.0      35
Protestant     260.3            85.4       193.5   452.8     992
None            81.4            26.7        60.4   141.5     310
Other           28.1             9.2        20.9    48.8     107
Total            506             166         376     880    1928


Observed = Expected?

 Take the difference between the observed and expected counts in a cell.

 Square the difference.

 Divide by the expected count.

 Sum up over all the cells.
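The four steps above can be sketched in Python as a single loop over the cells. This is an illustrative implementation, not the author's; expected counts are computed as (row total)*(column total)/n, and the names are hypothetical.

```python
# Observed counts by religion; columns are Always Wrong, Almost Always Wrong,
# Sometimes Wrong, Not Wrong at All (from the survey table).
OBSERVED = {
    "Catholic":   [83, 47, 105, 249],
    "Jewish":     [4, 2, 9, 20],
    "Protestant": [364, 97, 190, 341],
    "None":       [27, 12, 52, 219],
    "Other":      [28, 8, 20, 51],
}


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    n = sum(sum(row) for row in table.values())
    col_totals = [sum(col) for col in zip(*table.values())]
    total = 0.0
    for row in table.values():
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            difference = observed - expected   # step 1: observed minus expected
            squared = difference ** 2          # step 2: square the difference
            total += squared / expected        # steps 3 and 4: divide, then accumulate
    return total


CHI2 = chi_square_statistic(OBSERVED)
print(f"chi-square = {CHI2:.2f}")
```

Because the sketch uses unrounded expected counts, individual cell terms can differ slightly from hand calculations based on the rounded table.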


Chi-square Test Statistic

$$\chi^2_{df} = \sum \frac{(O - E)^2}{E}, \qquad df = (r - 1)(c - 1)$$


Cell contributions to χ²

 Catholic and Always

$$\frac{(83 - 127.0)^2}{127.0} = \frac{(-44.0)^2}{127.0} = 15.24$$
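The same arithmetic for the Catholic and Always cell, as a small Python sketch. The helper name `cell_contribution` is hypothetical, and the expected count is the rounded value 127.0 used on the slide.

```python
def cell_contribution(observed, expected):
    """One cell's contribution to the chi-square statistic."""
    return (observed - expected) ** 2 / expected


CATHOLIC_ALWAYS = cell_contribution(83, 127.0)
print(round(CATHOLIC_ALWAYS, 2))  # 15.24
```

Using the unrounded expected count 484*506/1928 instead of 127.0 changes the result slightly.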


Test of Independence

 H0: Religion and attitude towards premarital sex are independent.

 HA: Religion and attitude towards premarital sex are not independent.

 χ² = 184.51, df = (5-1)*(4-1) = 12

 P-value < 0.0001
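For readers who want to check the degrees of freedom and the size of the P-value without statistical software, here is a hedged Python sketch. It relies on a closed form for the chi-square upper tail that holds only when df is even (df = 12 qualifies): P(X > x) = exp(-x/2) * sum over k from 0 to df/2 - 1 of (x/2)^k / k!. The function names are illustrative; a statistics package would normally be used instead.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df for a test of independence in an r-by-c table."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_upper_tail(x, df):
    """P(X > x) for a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2
    term = 1.0   # k = 0 term of the sum
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


DF = degrees_of_freedom(5, 4)               # 5 religions, 4 attitudes
P_VALUE = chi_square_upper_tail(184.51, DF)
print(DF, P_VALUE < 0.0001)
```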


Test of Independence

 Because the P-value is so small we reject the null hypothesis.

 Religion and Attitude towards premarital sex are not independent.


Comment

 Look at the cells with the largest contributions to the test statistic.

 None and Always Wrong has far fewer people than expected, and None and Not Wrong at All has many more people than expected.


Comment

 Look at the cells with the largest contributions to the test statistic.

 Protestant and Always Wrong has many more people than expected, and Protestant and Not Wrong at All has far fewer people than expected.
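One way to find the cells with the largest contributions is to compute every cell's term and sort. The sketch below also records whether the observed count is above or below the expected count. It is a minimal illustration with hypothetical names, using the observed counts from the survey table.

```python
ATTITUDES = ["Always Wrong", "Almost Always Wrong", "Sometimes Wrong", "Not Wrong at All"]
OBSERVED = {
    "Catholic":   [83, 47, 105, 249],
    "Jewish":     [4, 2, 9, 20],
    "Protestant": [364, 97, 190, 341],
    "None":       [27, 12, 52, 219],
    "Other":      [28, 8, 20, 51],
}


def ranked_contributions(table, column_names):
    """Return (contribution, row, column, direction) tuples, largest contribution first."""
    n = sum(sum(row) for row in table.values())
    col_totals = [sum(col) for col in zip(*table.values())]
    cells = []
    for row_name, row in table.items():
        row_total = sum(row)
        for col_name, observed, col_total in zip(column_names, row, col_totals):
            expected = row_total * col_total / n
            contribution = (observed - expected) ** 2 / expected
            direction = "more" if observed > expected else "fewer"
            cells.append((contribution, row_name, col_name, direction))
    return sorted(cells, reverse=True)


RANKED = ranked_contributions(OBSERVED, ATTITUDES)
for contribution, religion, attitude, direction in RANKED[:4]:
    print(f"{religion} / {attitude}: {contribution:.1f} ({direction} than expected)")
```

The four largest terms all come from the None and Protestant rows, in the Always Wrong and Not Wrong at All columns.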


Summary

 Protestants are much more likely to say Always Wrong and much less likely to say Not Wrong at All.

 People with no religion are much more likely to say Not Wrong at All and much less likely to say Always Wrong.


JMP

Religion       Attitude              Count
1 Catholic     1 Always Wrong           83
2 Jewish       1 Always Wrong            4
3 Protestant   1 Always Wrong          364
4 None         1 Always Wrong           27
5 Other        1 Always Wrong           28
...
5 Other        4 Not Wrong at All       51


JMP

 Fit Y by X

 Y, Response: Attitude

 X, Factor: Religion

 Freq: Count
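The data layout behind these roles is one row per (Religion, Attitude) combination, with Count acting as a frequency weight. The Python sketch below mimics that idea outside JMP: it builds the long-format rows and then tabulates them back into a two-way table. Because the slide shows only part of the JMP data table, the rows here are generated from the two-way table of observed counts. All names are illustrative, and nothing here is JMP's own code.

```python
from collections import defaultdict

ATTITUDES = ["Always Wrong", "Almost Always Wrong", "Sometimes Wrong", "Not Wrong at All"]
OBSERVED = {
    "Catholic":   [83, 47, 105, 249],
    "Jewish":     [4, 2, 9, 20],
    "Protestant": [364, 97, 190, 341],
    "None":       [27, 12, 52, 219],
    "Other":      [28, 8, 20, 51],
}

# Long format: one (religion, attitude, count) row per cell, attitude by attitude.
LONG_ROWS = [
    (religion, attitude, OBSERVED[religion][j])
    for j, attitude in enumerate(ATTITUDES)
    for religion in OBSERVED
]


def tabulate(rows):
    """Accumulate frequency-weighted rows into a nested {religion: {attitude: count}} table."""
    table = defaultdict(lambda: defaultdict(int))
    for religion, attitude, count in rows:
        table[religion][attitude] += count
    return {religion: dict(cells) for religion, cells in table.items()}


TABLE = tabulate(LONG_ROWS)
print(LONG_ROWS[0], LONG_ROWS[-1])
print(TABLE["Protestant"])
```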


Test               ChiSquare   Prob>ChiSq
Likelihood Ratio     193.959      <.0001*
Pearson              184.510      <.0001*
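The Pearson row is the chi-square statistic computed from observed and expected counts. The Likelihood Ratio row is, on the assumption that JMP reports the usual G² statistic, 2 * sum of O * ln(O/E) over all cells. A minimal Python sketch of that formula on the observed counts, with hypothetical names:

```python
import math

# Observed counts by religion; columns are Always Wrong, Almost Always Wrong,
# Sometimes Wrong, Not Wrong at All (from the survey table).
OBSERVED = {
    "Catholic":   [83, 47, 105, 249],
    "Jewish":     [4, 2, 9, 20],
    "Protestant": [364, 97, 190, 341],
    "None":       [27, 12, 52, 219],
    "Other":      [28, 8, 20, 51],
}


def likelihood_ratio_statistic(table):
    """G^2 = 2 * sum(O * ln(O / E)); cells with O = 0 contribute nothing."""
    n = sum(sum(row) for row in table.values())
    col_totals = [sum(col) for col in zip(*table.values())]
    total = 0.0
    for row in table.values():
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            if observed > 0:
                expected = row_total * col_total / n
                total += observed * math.log(observed / expected)
    return 2 * total


G2 = likelihood_ratio_statistic(OBSERVED)
print(f"G^2 = {G2:.3f}")
```

Both statistics are referred to the same chi-square distribution with 12 degrees of freedom, so here they lead to the same conclusion.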


Comment

 Remember that JMP only does the calculations for you (Step 3). You have to provide all the other steps in the test of hypothesis.

