Nonparametric Tests, Part I

Datasets with counts
150 people were surveyed about their soft drink preferences. They were asked,
“Do you prefer Coke, Pepsi, or Sprite?” 50 people said “Coke”, 70 people said
“Pepsi”, and 30 people said “Sprite”. Test the hypothesis that people differ in their
soda preferences.
Pepsi    Coke    Sprite
70       50      30
Solution:
H0: p_pepsi = p_coke = p_sprite = 1/3
HA: not all p’s are equal
Use a one-way chi-square test.
One-way Chi-Square Test (χ²)
• Used when your dependent variable is counts within categories
(# pepsi lovers, # coke lovers, # sprite lovers)
• Used when your DV has two or more mutually exclusive
categories
• Compares the counts you got in your sample to those you would
expect under the null hypothesis
• Also called the Chi-Square “Goodness of Fit” test.
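The comparison of observed and expected counts described in the bullets above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `chi_square_statistic` is made up here, and the data are the soft drink counts from the opening slide, with expected counts taken from H0 (each proportion equal to 1/3).

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories (hypothetical helper)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Soft drink survey: Pepsi, Coke, Sprite.
observed = [70, 50, 30]
n = sum(observed)                                # 150 respondents
k = len(observed)                                # 3 categories
expected = [n / k] * k                           # H0: each p = 1/3, so fe = 50

chi2 = chi_square_statistic(observed, expected)
df = k - 1
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The statistic on its own is not a decision; it still has to be compared with the critical value for df = k - 1.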
One-way χ² example
Which power would you rather have: flight, invisibility, or x-ray
vision?
Flight       Invisibility    X-ray vision
18 people    14 people       10 people
Is this difference significant, or is it just due to chance?
One-way χ² example
Step 1: Write hypotheses
H0: p_fly = p_invis = p_xray = 1/3
HA: not all p’s are equal
Step 2: Write the observed frequencies, and also the frequencies that
would be expected under the null hypothesis
        Flight    Invisibility    X-ray vision
fo      18        14              10

N = 42
        Flight    Invisibility    X-ray vision
fo      18        14              10
fe      14        14              14

N = 42
One-way χ² example
Step 3: Compute the relative squared discrepancies, $\frac{(f_o - f_e)^2}{f_e}$, and sum them up
                     Flight    Invisibility    X-ray vision
fo                   18        14              10
fe                   14        14              14
(fo - fe)^2 / fe     1.143     0               1.143

N = 42
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 1.143 + 0 + 1.143 = 2.286$
$df = k - 1$
One-way χ² example
Step 4: Compare to critical value of χ²
                     Flight    Invisibility    X-ray vision
fo                   18        14              10
fe                   14        14              14
(fo - fe)^2 / fe     1.143     0               1.143

N = 42
$\chi^2 = 2.286$, $df = 2$, $\chi^2_{crit} = 5.99$
Retain null!
Calculating one-way χ²
Steps:
1) State hypotheses
2) Write observed and expected frequencies
3) Get χ² by summing up relative squared deviations
4) Use Table I to get critical χ²
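The four steps above can be sketched as one small function. This is a minimal illustration under stated assumptions, not the slides' own method: the name `one_way_chi_square` and the `CRITICAL_05` lookup are hypothetical, and the lookup holds only the two critical values quoted in these slides (df = 1 and df = 2 at the .05 level), standing in for Table I.

```python
# Critical values at alpha = .05 as quoted on the slides (df = 1 and df = 2 only).
# Any other df raises KeyError; a full table would be needed in practice.
CRITICAL_05 = {1: 3.84, 2: 5.99}


def one_way_chi_square(observed, null_proportions=None):
    """Return (chi2, df, critical value, reject H0?) for a one-way table."""
    k = len(observed)
    n = sum(observed)
    # Steps 1-2: H0 fixes the proportions; by default all are equal to 1/k.
    if null_proportions is None:
        null_proportions = [1 / k] * k
    expected = [p * n for p in null_proportions]
    # Step 3: sum the relative squared deviations.
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    # Step 4: compare with the critical value for df = k - 1.
    df = k - 1
    critical = CRITICAL_05[df]
    return chi2, df, critical, chi2 > critical


# Superpower example: flight, invisibility, x-ray vision.
chi2, df, critical, reject = one_way_chi_square([18, 14, 10])
print(f"chi2 = {chi2:.3f}, df = {df}, critical = {critical}, reject H0: {reject}")
```

Returning the decision alongside the statistic keeps the comparison in Step 4 explicit instead of leaving it to the reader.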
Practice
Suppose we ask 200 randomly selected people if they think that voting
should be made compulsory. The data come out like this:
        No     Yes
fo      84     116

Is there evidence for a clear preference?
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
Practice
                     No       Yes
fo                   84       116
fe                   100      100
(fo - fe)^2 / fe     2.56     2.56

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.56 + 2.56 = 5.12$
$df = 1$, $\chi^2_{crit} = 3.84$
Reject null!
Other null hypotheses
• So far we have tested the H0 that all cell frequencies are equal
• But we can test against any set of expected frequencies
• Example: political affiliation among psych grad students:
        Democrat    Republican    Independent
fo      9           5             18
• Political affiliation in the U.S. (Gallup):
        Democrat    Republican    Independent
        46%         43%           11%
Other null hypotheses
Is the distribution for psych grad students different from the distribution for the U.S.?
If not, then 46% of the 32 students would be Democrats, 43% would be Republicans,
and 11% would be Independents.
                     Democrat    Republican    Independent
fo                   9           5             18
fe                   14.7        13.8          3.5
(fo - fe)^2 / fe     2.21        5.61          60.07

N = 32
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.21 + 5.61 + 60.07 = 67.89$
$df = k - 1 = 2$, $\chi^2_{crit} = 5.99$
Reject null!
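When the null hypothesis specifies unequal proportions, the only change is how the expected frequencies are obtained: each is the hypothesized proportion times N. The sketch below illustrates this with the grad student counts and the Gallup percentages from the slides; the variable names are made up for illustration. Because the expected frequencies are not rounded here, the total comes out slightly different from a hand calculation that rounds each fe to one decimal place.

```python
# Counts and percentages are taken from the slides; names are hypothetical.
observed = {"Democrat": 9, "Republican": 5, "Independent": 18}
population = {"Democrat": 0.46, "Republican": 0.43, "Independent": 0.11}

n = sum(observed.values())                       # 32 students

# Expected frequency under H0: population proportion times sample size.
expected = {group: population[group] * n for group in observed}

# Relative squared deviation for each group, then their sum.
terms = {
    group: (observed[group] - expected[group]) ** 2 / expected[group]
    for group in observed
}
chi2 = sum(terms.values())
df = len(observed) - 1

for group in observed:
    print(f"{group:12s} fo = {observed[group]:2d}  fe = {expected[group]:5.2f}  "
          f"term = {terms[group]:5.2f}")
print(f"chi-square = {chi2:.2f}, df = {df}")
```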
Points of interest about χ²
1. χ² cannot be negative
2. χ² will be zero only if each observed frequency exactly equals the
expected frequency
3. The larger the discrepancies, the larger the χ²
4. The greater the number of groups, the larger the χ² tends to be, because more
terms are summed. That’s why the χ² distribution is a family of curves indexed by df = k-1.
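Points 1 to 3 can be checked numerically. The sketch below is illustrative only: `chi_square_statistic` is a hypothetical helper, and the first two sets of observed counts are invented to show a perfect fit and a mild discrepancy next to the superpower data.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe; every term is a square over a positive count."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


expected = [14, 14, 14]

perfect = chi_square_statistic([14, 14, 14], expected)   # fo equals fe everywhere
mild = chi_square_statistic([15, 14, 13], expected)      # invented small discrepancy
strong = chi_square_statistic([18, 14, 10], expected)    # superpower data

print(f"perfect fit: {perfect:.3f}")
print(f"mild discrepancy: {mild:.3f}")
print(f"larger discrepancy: {strong:.3f}")
```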
Two Factor Chi-Square
A 1999 New Jersey poll sampled people’s opinions concerning the use of the
death penalty for murder when given the option of life in prison instead. 800
people were polled, and the number of men and women supporting each
penalty were tabulated.
                       Preferred Penalty
          Death Penalty    Life in Prison    No Opinion
Female    151              179               80
Male      201              117               72
Contingency table: shows contingency between two variables
Are these two variables (gender, penalty preference) independent?
Two-Factor Chi-Square Test
• Used to test whether two nominal variables are independent or
related
• E.g. Is gender related to socio-economic class?
• Compares the observed frequencies to the frequencies expected if
the variables were independent
• Called a chi-squared test of independence
• Fundamentally testing: “do these variables interact?”
Example
H0: distribution of female preferences matches distribution of male preferences
HA: female proportions do not match male proportions
Example
                       Preferred Penalty
          Death Penalty    Life in Prison    No Opinion
Female    fo = 151         fo = 179          fo = 80
          fe = ___         fe = ___          fe = ___
Male      fo = 201         fo = 117          fo = 72
          fe = ___         fe = ___          fe = ___
Example
                       Preferred Penalty
          Death Penalty    Life in Prison    No Opinion
Female    fo = 151         fo = 179          fo = 80
          fe = 133.3?      fe = 133.3?       fe = 133.3?
Male      fo = 201         fo = 117          fo = 72
          fe = 133.3?      fe = 133.3?       fe = 133.3?
WRONG -- this is saying there is an equal # of men and women, and an equal
preference for each penalty (i.e. no main effects).
We are willing to let there be main effects. We just want to test whether the
distribution of preferences for men and women is the same (i.e. no interaction effects).
We need to look at the marginal totals to get our expected frequencies
Example
                       Preferred Penalty
          Death Penalty    Life in Prison    No Opinion      frow
Female    fo = 151         fo = 179          fo = 80         410
          fe = ___         fe = ___          fe = ___
Male      fo = 201         fo = 117          fo = 72         390
          fe = ___         fe = ___          fe = ___
fcol      352              296               152             n = 800
          p_death = .44    p_life = .37      p_none = .19
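The marginal totals and column proportions can be computed directly from the observed counts. The following is a small sketch, with the nested-dictionary layout and variable names chosen here for illustration only.

```python
# Observed counts from the poll; the dictionary layout is an assumption.
table = {
    "Female": {"Death Penalty": 151, "Life in Prison": 179, "No Opinion": 80},
    "Male": {"Death Penalty": 201, "Life in Prison": 117, "No Opinion": 72},
}

# Row totals (frow): how many women and how many men were polled.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}

# Column totals (fcol): how many people chose each penalty.
columns = list(next(iter(table.values())))
col_totals = {col: sum(table[row][col] for row in table) for col in columns}

n = sum(row_totals.values())

# Overall proportion choosing each penalty, ignoring gender.
col_proportions = {col: col_totals[col] / n for col in columns}

print("frow:", row_totals)
print("fcol:", col_totals)
print("n:", n)
print("proportions:", {col: round(p, 2) for col, p in col_proportions.items()})
```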
Example
                       Preferred Penalty
          Death Penalty     Life in Prison    No Opinion        frow
Female    fo = 151          fo = 179          fo = 80           410
          fe = .44(410)     fe = .37(410)     fe = .19(410)
Male      fo = 201          fo = 117          fo = 72           390
          fe = .44(390)     fe = .37(390)     fe = .19(390)
fcol      352               296               152               n = 800
          p_death = .44     p_life = .37      p_none = .19
Example
                       Preferred Penalty
          Death Penalty    Life in Prison    No Opinion      frow
Female    fo = 151         fo = 179          fo = 80         410
          fe = 180.4       fe = 151.7        fe = 77.9
Male      fo = 201         fo = 117          fo = 72         390
          fe = 171.6       fe = 144.3        fe = 74.1
fcol      352              296               152             n = 800
          p_death = .44    p_life = .37      p_none = .19

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 4.79 + 4.91 + 0.06 + 5.04 + 5.16 + 0.06 = 20.02$
$df = (k_{gender} - 1)(k_{preference} - 1) = 1 \cdot 2 = 2$
$\chi^2_{crit} = 5.99$
Reject null!
Calculating two-way χ²
Steps:
1) State hypotheses
2) Get expected frequencies: $f_e = \frac{f_{col}}{n} (f_{row})$
3) Get χ² by summing up relative squared deviations
4) Use table to get critical χ²
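The steps above can be sketched as a short function. It is a minimal illustration, assuming the table is given as a list of rows of observed counts; the name `two_way_chi_square` is hypothetical. The critical value is not looked up here and still has to come from the table for the computed df.

```python
def two_way_chi_square(table):
    """Return (chi2, df, expected) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Step 2: fe = (fcol / n) * frow for every cell.
    expected = [[c / n * r for c in col_totals] for r in row_totals]
    # Step 3: sum the relative squared deviations over all cells.
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    # df = (number of rows - 1) * (number of columns - 1)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Death penalty poll: rows are Female, Male; columns are the three preferences.
penalty = [[151, 179, 80], [201, 117, 72]]
chi2, df, expected = two_way_chi_square(penalty)
print(f"chi-square = {chi2:.2f}, df = {df}")
for row in expected:
    print([round(fe, 1) for fe in row])
```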
Practice
Suppose we want to determine if there is any relationship between level of
education and medium through which one follows current events. We ask a
random sample of high school graduates and a random sample of college
graduates whether they keep up with the news mostly by reading the paper
or by listening to the radio or by watching television.
           radio    paper    TV
HS         10       29       61
college    24       44       32

$f_e = \frac{f_{col}}{n} (f_{row})$
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
Practice
           radio             paper              TV               frow
HS         fo = 10           fo = 29            fo = 61          100
           fe = 17           fe = 36.5          fe = 46.5
college    fo = 24           fo = 44            fo = 32          100
           fe = 17           fe = 36.5          fe = 46.5
fcol       34                73                 93               N = 200
           p_radio = .17     p_paper = .365     p_TV = .465

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 17.89$
$df = (2)(1) = 2$, $\chi^2_{crit} = 5.99$
Assumptions of Chi-Square Test
1. Categories are mutually exclusive
– A subject cannot be counted in more than one cell
2. Expected frequency in each cell must be
– at least 10 when kA and kB are less than or equal to 2
– at least 5 when kA or kB is greater than 2 (e.g., a 2x3 design)
– N must be sufficiently large to ensure that this is true
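The expected-frequency rule can be checked before running a two-way test. The sketch below applies the thresholds exactly as stated above (10 for a 2x2 table, 5 when either variable has more than two categories). The function names are hypothetical, and the small 2x2 table is invented only to show a failing case.

```python
def expected_frequencies(table):
    """Expected counts under independence: fe = (fcol / n) * frow."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[c / n * r for c in col_totals] for r in row_totals]


def meets_expected_frequency_rule(table):
    """Return (ok, smallest fe, required minimum) for a two-way table."""
    k_a, k_b = len(table), len(table[0])
    # Thresholds as stated on the slide.
    minimum = 10 if (k_a <= 2 and k_b <= 2) else 5
    smallest = min(min(row) for row in expected_frequencies(table))
    return smallest >= minimum, smallest, minimum


# Death penalty poll (2x3): every expected count is far above 5.
penalty = [[151, 179, 80], [201, 117, 72]]
print(meets_expected_frequency_rule(penalty))

# Invented 2x2 table with only 20 observations: expected counts fall below 10.
small = [[3, 7], [6, 4]]
print(meets_expected_frequency_rule(small))
```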