Stat 101L – Lecture 39 Categorical Data

Categorical Data
• In Chapter 3 we introduced the idea of categorical data.
• In Chapter 15 we explored probability rules and when events are independent.
• In Chapter 26 we put these two ideas together to compare counts.
Categorical Data
• National Opinion Research Center’s General Social Survey
• In 2006 a sample of 1928 adults in the U.S. was asked the question “When is premarital sex wrong?” The participants were also asked about their religious affiliation.
Who?/What?
• Who?
  – A sample of 1928 adults.
• What?
  – Attitude towards premarital sex.
  – Religious affiliation.
What?
• When is premarital sex wrong?
  – Categorical: Always Wrong, Almost Always Wrong, Sometimes Wrong, Not Wrong at All
What?
• What is your religious affiliation?
  – Categorical: Catholic, Jewish, Protestant, None, Other
When is Premarital Sex Wrong?

Religion    Always Wrong  Almost Always Wrong  Sometimes Wrong  Not Wrong at All  Total
Catholic              83                   47              105               249    484
Jewish                 4                    2                9                20     35
Protestant           364                   97              190               341    992
None                  27                   12               52               219    310
Other                 28                    8               20                51    107
Total                506                  166              376               880   1928
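A minimal Python sketch of storing this table and recomputing its margins, as a check that the reconstructed counts add up to the Total row and column. The names (`OBSERVED`, `margins`, and so on) are illustrative choices, not part of the lecture or of JMP.

```python
# Observed counts: rows are religions, columns are attitudes toward premarital sex.
RELIGIONS = ["Catholic", "Jewish", "Protestant", "None", "Other"]
ATTITUDES = ["Always Wrong", "Almost Always Wrong", "Sometimes Wrong", "Not Wrong at All"]
OBSERVED = [
    [83, 47, 105, 249],   # Catholic
    [4, 2, 9, 20],        # Jewish
    [364, 97, 190, 341],  # Protestant
    [27, 12, 52, 219],    # None
    [28, 8, 20, 51],      # Other
]


def margins(table):
    """Return the row totals, column totals, and grand total of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(column) for column in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


row_totals, col_totals, n = margins(OBSERVED)
for religion, total in zip(RELIGIONS, row_totals):
    print(f"{religion:<11}{total:>5}")
for attitude, total in zip(ATTITUDES, col_totals):
    print(f"{attitude:<20}{total:>5}")
print("Grand total:", n)
```

The row totals, column totals, and grand total printed here should match the Total column, the Total row, and 1928.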
When is Premarital Sex Wrong? (row percentages)

Religion    Always Wrong  Almost Always Wrong  Sometimes Wrong  Not Wrong at All  Total
Catholic           17.2%                 9.7%            21.7%             51.4%   100%
Jewish             11.4%                 5.7%            25.7%             57.2%   100%
Protestant         36.7%                 9.8%            19.1%             34.4%   100%
None                8.7%                 3.9%            16.8%             70.6%   100%
Other              26.2%                 7.5%            18.7%             47.6%   100%
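Each entry is a count divided by its row total, that is, the distribution of attitude within one religion. A minimal Python sketch (names are illustrative). Printed to one decimal, a few entries differ by 0.1 from the table above. For example, Catholic and Always Wrong is 83/484 ≈ 17.1%. The table appears to have been rounded so that each row totals exactly 100%.

```python
# Observed counts per religion, in the column order:
# Always Wrong, Almost Always Wrong, Sometimes Wrong, Not Wrong at All.
OBSERVED = {
    "Catholic": [83, 47, 105, 249],
    "Jewish": [4, 2, 9, 20],
    "Protestant": [364, 97, 190, 341],
    "None": [27, 12, 52, 219],
    "Other": [28, 8, 20, 51],
}


def row_percentages(counts):
    """Convert one row of counts into percentages of that row's total."""
    total = sum(counts)
    return [100 * count / total for count in counts]


PERCENTAGES = {religion: row_percentages(counts) for religion, counts in OBSERVED.items()}
for religion, pcts in PERCENTAGES.items():
    print(f"{religion:<11}" + "".join(f"{p:7.1f}%" for p in pcts))
```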
Mosaic Plot
[Figure: mosaic plot of the religion-by-attitude table]
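Comparing Counts
• People who have no religion (70.6%) or are Jewish (57.2%) are more likely than the sample as a whole (880/1928 ≈ 45.6%) to say premarital sex is Not Wrong at All.
• Protestants (36.7%) are much more likely than the sample as a whole (506/1928 ≈ 26.2%) to say premarital sex is Always Wrong.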
Comparing Counts
• Are these differences statistically significant?
• Or are they due to chance variation, so that religion and attitude towards premarital sex are independent?
Comparing Counts
• If religion and attitude towards premarital sex are independent, then
  $\Pr(A \text{ and } B) = \Pr(A) \times \Pr(B)$
  where A is a religion category and B is an attitude category.
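The following worked example applies the rule to one cell. It assumes, for this example only, that each probability is estimated by the corresponding sample proportion from the table's totals. Take A = Catholic and B = Always Wrong:

```latex
\begin{aligned}
\Pr(\text{Catholic}) &\approx \tfrac{484}{1928} \approx 0.251 \\
\Pr(\text{Always Wrong}) &\approx \tfrac{506}{1928} \approx 0.262 \\
\Pr(\text{Catholic and Always Wrong}) &\approx 0.251 \times 0.262 \approx 0.066 \quad \text{(if independent)} \\
\text{observed proportion} &= \tfrac{83}{1928} \approx 0.043
\end{aligned}
```

The observed proportion for this cell is well below what independence predicts. The chi-square test checks whether gaps like this one are larger than chance variation alone would produce.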
Expected Count
• If religion and attitude toward premarital sex are independent, we would expect to see
  $n \times \Pr(A) \times \Pr(B)$
  people in religion category A and attitude category B.
Row and Column Totals

Religion    Always Wrong  Almost Always Wrong  Sometimes Wrong  Not Wrong at All  Total
Catholic                                                                            484
Jewish                                                                               35
Protestant                                                                          992
None                                                                                310
Other                                                                               107
Total                506                  166              376               880   1928
Expected Count
• Catholic and Always Wrong:
  $E = 1928 \times \frac{484}{1928} \times \frac{506}{1928} = \frac{484 \times 506}{1928} \approx 127.0$
Expected Counts

Religion    Always Wrong  Almost Always Wrong  Sometimes Wrong  Not Wrong at All  Total
Catholic           127.0                 41.7             94.4             220.9    484
Jewish               9.2                  3.0              6.8              16.0     35
Protestant         260.3                 85.4            193.5             452.8    992
None                81.4                 26.7             60.4             141.5    310
Other               28.1                  9.2             20.9              48.8    107
Total                506                  166              376               880   1928
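Every entry follows the same pattern as the Catholic and Always Wrong example: row total × column total / n. A minimal Python sketch that fills in the whole table (the function name is illustrative). Rounded to one decimal, the output matches the table except for None and Sometimes Wrong. That cell computes to 310 × 376 / 1928 ≈ 60.456, which rounds to 60.5. The table's 60.4 appears to be rounded so that the column still sums to 376.

```python
# Observed counts: rows Catholic, Jewish, Protestant, None, Other;
# columns Always Wrong, Almost Always Wrong, Sometimes Wrong, Not Wrong at All.
OBSERVED = [
    [83, 47, 105, 249],
    [4, 2, 9, 20],
    [364, 97, 190, 341],
    [27, 12, 52, 219],
    [28, 8, 20, 51],
]


def expected_counts(table):
    """Expected count for every cell under independence: n * Pr(A) * Pr(B).

    With Pr(A) = row total / n and Pr(B) = column total / n, this simplifies
    to row total * column total / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


EXPECTED = expected_counts(OBSERVED)
for row in EXPECTED:
    print("  ".join(f"{e:7.1f}" for e in row))
```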
Observed = Expected?
• Take the difference between the observed and expected counts in a cell.
• Square the difference.
• Divide by the expected count.
• Sum up over all the cells.
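The four steps for comparing observed and expected counts translate directly into a loop over the cells. The following minimal Python sketch uses an illustrative function name. It computes the expected counts from the margins without rounding them. Its total can therefore differ slightly from a hand calculation that uses rounded expected counts. It should agree with the Pearson ChiSquare value of 184.510 that JMP reports.

```python
# Observed counts: rows Catholic, Jewish, Protestant, None, Other;
# columns Always Wrong, Almost Always Wrong, Sometimes Wrong, Not Wrong at All.
OBSERVED = [
    [83, 47, 105, 249],
    [4, 2, 9, 20],
    [364, 97, 190, 341],
    [27, 12, 52, 219],
    [28, 8, 20, 51],
]


def chi_square_statistic(table):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            difference = observed - expected              # step 1: observed minus expected
            statistic += difference ** 2 / expected       # steps 2-4: square, divide by E, add up
    return statistic


chi2 = chi_square_statistic(OBSERVED)
print(f"chi-square = {chi2:.2f}")
```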
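Chi-square Test Statistic
$\chi^2 = \sum \frac{(O - E)^2}{E}$
$df = (r - 1) \times (c - 1)$
where O and E are the observed and expected counts in a cell, the sum runs over all cells, and r and c are the numbers of rows and columns in the table.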
Cell contributions to χ²
• Catholic and Always Wrong:
  $\frac{(83 - 127.0)^2}{127.0} = \frac{(-44.0)^2}{127.0} \approx 15.24$
Test of Independence
• H0: Religion and attitude towards premarital sex are independent.
• HA: Religion and attitude towards premarital sex are not independent.
• $\chi^2 = 184.51$, $df = (5 - 1) \times (4 - 1) = 12$
• P-value < 0.0001
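The lecture reads the P-value from software. As an illustration, not a description of how JMP computes it, the chi-square upper-tail probability has a closed form when the number of degrees of freedom is even. That lets a short standard-library sketch confirm that the P-value is below 0.0001 for χ² = 184.51 with df = 12. The function name `chi_square_upper_tail` is hypothetical. As a sanity check, 21.026 is the usual 5% critical value for df = 12, so the function should return about 0.05 there.

```python
import math


def chi_square_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), valid for even df only.

    For even df the upper tail equals exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive, even degrees of freedom")
    half = x / 2
    term = 1.0  # (x/2)^0 / 0!
    total = 0.0
    for k in range(df // 2):
        total += term
        term *= half / (k + 1)  # next term: (x/2)^(k+1) / (k+1)!
    return math.exp(-half) * total


rows, columns = 5, 4
df = (rows - 1) * (columns - 1)
p_value = chi_square_upper_tail(184.51, df)
print(f"df = {df}, P-value = {p_value:.2e}")
```

For odd degrees of freedom this shortcut does not apply. If SciPy is available, `scipy.stats.chi2.sf(184.51, 12)` computes the same upper-tail probability for any df.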
Test of Independence
• Because the P-value is so small (less than 0.0001), we reject the null hypothesis.
• The data provide strong evidence that religion and attitude towards premarital sex are not independent.
Comment
• Look at the cells with the largest contributions to the test statistic.
• None and Always Wrong has far fewer people than expected (27 observed vs. 81.4 expected), and None and Not Wrong at All has far more people than expected (219 observed vs. 141.5 expected).
Comment
• Look at the cells with the largest contributions to the test statistic.
• Protestant and Always Wrong has far more people than expected (364 observed vs. 260.3 expected), and Protestant and Not Wrong at All has far fewer people than expected (341 observed vs. 452.8 expected).
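To find the cells with the largest contributions, compute (O − E)²/E for every cell and sort. A minimal Python sketch with illustrative names. The four largest contributions come from exactly the four cells discussed above: None and Not Wrong at All, Protestant and Always Wrong, None and Always Wrong, and Protestant and Not Wrong at All. Together, all twenty contributions add up to χ² ≈ 184.51.

```python
RELIGIONS = ["Catholic", "Jewish", "Protestant", "None", "Other"]
ATTITUDES = ["Always Wrong", "Almost Always Wrong", "Sometimes Wrong", "Not Wrong at All"]
OBSERVED = [
    [83, 47, 105, 249],
    [4, 2, 9, 20],
    [364, 97, 190, 341],
    [27, 12, 52, 219],
    [28, 8, 20, 51],
]


def cell_contributions(table):
    """Map (religion, attitude) to that cell's contribution (O - E)^2 / E."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    contributions = {}
    for i, religion in enumerate(RELIGIONS):
        for j, attitude in enumerate(ATTITUDES):
            expected = row_totals[i] * col_totals[j] / n
            contributions[(religion, attitude)] = (table[i][j] - expected) ** 2 / expected
    return contributions


CONTRIBUTIONS = cell_contributions(OBSERVED)
ranked = sorted(CONTRIBUTIONS.items(), key=lambda item: item[1], reverse=True)
for (religion, attitude), value in ranked[:4]:
    print(f"{religion:<11}{attitude:<20}{value:6.2f}")
```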
Summary
• Protestants are much more likely to say Always Wrong and much less likely to say Not Wrong at All.
• People with no religion are much more likely to say Not Wrong at All and much less likely to say Always Wrong.
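JMP
The data table has one row for each combination of religion and attitude (5 × 4 = 20 rows), with the number of people in a Count column. Only some of the rows are shown.

Religion    Attitude          Count
Catholic    Always Wrong         83
Jewish      Always Wrong          4
Protestant  Always Wrong        364
None        Always Wrong         27
Other       Always Wrong         28
⋮
Other       Not Wrong at All     51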
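This stacked layout stores one row per cell with a count, instead of one row per respondent. A row with Count = 83 stands for 83 people, which is the role the Freq: Count setting plays in the JMP dialog. A minimal Python sketch that builds the 20 stacked rows from the two-way table and rebuilds the table from them. The names are hypothetical, and the row order (all religions for Always Wrong first) is assumed from the rows shown.

```python
RELIGIONS = ["Catholic", "Jewish", "Protestant", "None", "Other"]
ATTITUDES = ["Always Wrong", "Almost Always Wrong", "Sometimes Wrong", "Not Wrong at All"]
OBSERVED = [
    [83, 47, 105, 249],
    [4, 2, 9, 20],
    [364, 97, 190, 341],
    [27, 12, 52, 219],
    [28, 8, 20, 51],
]

# One (religion, attitude, count) record per cell: every religion for the first
# attitude, then every religion for the next attitude, and so on.
LONG_FORMAT = [
    (religion, attitude, OBSERVED[i][j])
    for j, attitude in enumerate(ATTITUDES)
    for i, religion in enumerate(RELIGIONS)
]


def to_two_way(records):
    """Rebuild the religion-by-attitude table by adding each record's count to its cell."""
    table = [[0] * len(ATTITUDES) for _ in RELIGIONS]
    for religion, attitude, count in records:
        table[RELIGIONS.index(religion)][ATTITUDES.index(attitude)] += count
    return table


for record in LONG_FORMAT:
    print(*record, sep="\t")
```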
JMP
• Fit Y by X
• Y, Response: Attitude
• X, Factor: Religion
• Freq: Count
Test              ChiSquare  Prob>ChiSq
Likelihood Ratio    193.959     <.0001*
Pearson             184.510     <.0001*
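The Pearson row is the χ² = Σ(O − E)²/E used in this lecture. The Likelihood Ratio row is assumed here to be the G² statistic, G² = 2 Σ O ln(O/E). That is an alternative large-sample test of independence, compared against the same chi-square distribution with df = 12. A minimal standard-library sketch computing both from the observed table; the function name is hypothetical. If SciPy is available, `scipy.stats.chi2_contingency(table)` returns the Pearson statistic, and passing `lambda_="log-likelihood"` returns G² instead.

```python
import math

OBSERVED = [
    [83, 47, 105, 249],
    [4, 2, 9, 20],
    [364, 97, 190, 341],
    [27, 12, 52, 219],
    [28, 8, 20, 51],
]


def pearson_and_likelihood_ratio(table):
    """Return (Pearson X^2, likelihood-ratio G^2) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    g_squared = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell adds nothing to G^2 (0 * log 0 is taken as 0)
                g_squared += 2 * observed * math.log(observed / expected)
    return pearson, g_squared


pearson, g_squared = pearson_and_likelihood_ratio(OBSERVED)
print(f"Likelihood Ratio {g_squared:.3f}")
print(f"Pearson          {pearson:.3f}")
```

With this many observations the two statistics are close, and here both give a P-value far below 0.0001, so they lead to the same conclusion.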
Comment
• Remember that JMP only does the calculations for you (Step 3). You have to provide all the other steps of the hypothesis test yourself.