R for Applied Statistical Methods

Larry Winner
Department of Statistics
University of Florida
2-Sample t-test (Independent Samples) – Case 1
Model:
Group 1 (Sample Size = $n_1$): $Y_{11}, \ldots, Y_{1n_1} \sim N(\mu_1, \sigma_1^2)$
Group 2 (Sample Size = $n_2$): $Y_{21}, \ldots, Y_{2n_2} \sim N(\mu_2, \sigma_2^2)$
Note: Some introductory books use $N(\text{mean} = \mu, \text{sd} = \sigma)$; we use var $= \sigma^2$.
Data:
$\bar{Y}_1 = \frac{\sum_{j=1}^{n_1} Y_{1j}}{n_1}$, $s_1^2 = \frac{\sum_{j=1}^{n_1} (Y_{1j} - \bar{Y}_1)^2}{n_1 - 1}$
$\bar{Y}_2 = \frac{\sum_{j=1}^{n_2} Y_{2j}}{n_2}$, $s_2^2 = \frac{\sum_{j=1}^{n_2} (Y_{2j} - \bar{Y}_2)^2}{n_2 - 1}$
Case 1: Equal Population Variances: $\sigma_1^2 = \sigma_2^2$
Hypothesis: $H_0: \mu_1 - \mu_2 = 0$ vs $H_A: \mu_1 - \mu_2 \neq 0$
Test Statistic: $t_{obs} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$, where $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$, with $df = n_1 + n_2 - 2$
P-value: $P = 2 P\left( t(n_1 + n_2 - 2) \geq |t_{obs}| \right)$
$(1 - \alpha)100\%$ CI for $\mu_1 - \mu_2$: $(\bar{Y}_1 - \bar{Y}_2) \pm t(\alpha/2, n_1 + n_2 - 2) \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$
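The Case 1 formulas translate almost line for line into R. The following is a minimal sketch, not part of the original slides: `pooled_t` is a hypothetical helper name and the two samples are simulated purely for illustration. It computes $s_p^2$, $t_{obs}$, the two-sided P-value and the CI by hand so that the result can be compared with `t.test(..., var.equal = TRUE)`.

```r
# Pooled (equal-variance) two-sample t-test computed from the formulas above.
# pooled_t() is an illustrative helper, not a base R function.
pooled_t <- function(y1, y2, conf = 0.95) {
  n1 <- length(y1); n2 <- length(y2)
  df <- n1 + n2 - 2
  sp2 <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / df  # pooled variance
  se <- sqrt(sp2 * (1 / n1 + 1 / n2))
  t_obs <- (mean(y1) - mean(y2)) / se
  p <- 2 * pt(abs(t_obs), df, lower.tail = FALSE)        # 2 P(t(df) >= |t_obs|)
  t_crit <- qt(1 - (1 - conf) / 2, df)                   # t(alpha/2, df)
  ci <- (mean(y1) - mean(y2)) + c(-1, 1) * t_crit * se
  list(sp2 = sp2, t = t_obs, df = df, p.value = p, ci = ci)
}

set.seed(1)                          # simulated data, for illustration only
y1 <- rnorm(12, mean = 10, sd = 2)
y2 <- rnorm(15, mean = 8, sd = 2)
by_hand <- pooled_t(y1, y2)
built_in <- t.test(y1, y2, var.equal = TRUE)
```

Comparing `by_hand$t`, `by_hand$p.value` and `by_hand$ci` with the corresponding components of `built_in` is a quick way to confirm that the hand calculation follows the same definitions.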
2-Sample t-test– Case 2 and Test of Equal Variances
Case 2: Unequal Population Variances: $\sigma_1^2 \neq \sigma_2^2$
Test Statistic: $t_{obs} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ with $df_S = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}$
Known as Welch's method (and Satterthwaite approx for df)
P-value: $P = 2 P\left( t(df_S) \geq |t_{obs}| \right)$
$(1 - \alpha)100\%$ CI for $\mu_1 - \mu_2$: $(\bar{Y}_1 - \bar{Y}_2) \pm t(\alpha/2, df_S) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Testing for Equal Variances: F-test:
$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_A: \sigma_1^2 \neq \sigma_2^2$
Note: many packages use Levene's Test
Test Statistic: $F_{obs} = \frac{s_1^2}{s_2^2}$ with $df_1 = n_1 - 1$, $df_2 = n_2 - 1$
P-value: $P = 2 \min\left( P\left( F(n_1 - 1, n_2 - 1) \geq F_{obs} \right), P\left( F(n_2 - 1, n_1 - 1) \geq \frac{1}{F_{obs}} \right) \right)$
$(1 - \alpha)100\%$ CI for $\frac{\sigma_1^2}{\sigma_2^2}$: $\left[ \frac{s_1^2 / s_2^2}{F(\alpha/2; n_1 - 1, n_2 - 1)}, \frac{s_1^2 / s_2^2}{F(1 - \alpha/2; n_1 - 1, n_2 - 1)} \right]$
where: $F(1 - \alpha/2; n_1 - 1, n_2 - 1) = \frac{1}{F(\alpha/2; n_2 - 1, n_1 - 1)}$
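The F-test P-value and interval can be sketched in R as follows. This is an illustration under stated assumptions rather than material from the slides: `f_test_by_hand` is a hypothetical helper name, the data are simulated, and $F(\alpha/2; df_1, df_2)$ is read as an upper-tail critical value, which corresponds to `qf(1 - alpha/2, df1, df2)`.

```r
# Two-sided F-test and CI for sigma1^2 / sigma2^2, following the formulas above.
# f_test_by_hand() is an illustrative helper, not a base R function.
f_test_by_hand <- function(y1, y2, alpha = 0.05) {
  df1 <- length(y1) - 1
  df2 <- length(y2) - 1
  f_obs <- var(y1) / var(y2)
  p <- 2 * min(pf(f_obs, df1, df2, lower.tail = FALSE),
               pf(1 / f_obs, df2, df1, lower.tail = FALSE))
  f_upper <- qf(1 - alpha / 2, df1, df2)      # F(alpha/2; df1, df2)
  f_lower <- 1 / qf(1 - alpha / 2, df2, df1)  # F(1 - alpha/2; df1, df2)
  list(F = f_obs, p.value = p, ci = c(f_obs / f_upper, f_obs / f_lower))
}

set.seed(2)                        # simulated data, for illustration only
y1 <- rnorm(20, sd = 1.8)
y2 <- rnorm(25, sd = 2.0)
by_hand <- f_test_by_hand(y1, y2)
built_in <- var.test(y1, y2)
```

The reciprocal identity for the lower critical value is used directly in `f_lower`, so only upper-tail quantiles are needed.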
Example – NBA and WNBA Players’ BMI
• Groups: Male: NBA(i=1) and Female: WNBA(i=2)
• Samples: Random Samples of n1 = n2 = 20 from
2013 seasons (2013/2014 for NBA)
$BMI = 703 \times \frac{\text{lbs}}{\text{inches}^2} = \frac{\text{kg}}{\text{metres}^2}$
Males: $n_1 = 20$, $\bar{Y}_1 = 24.9466$, $s_1^2 = 3.0919$
Player                  id  Gender  Height  Weight  BMI
Giannis Antetokounmpo    1       1      81     205  21.97
Joel Anthony             2       1      81     245  26.25
Alex Len                 3       1      85     255  24.81
Erik Murphy              4       1      82     230  24.05
Ersan Ilyasova           5       1      82     235  24.57
Kevin Garnett            6       1      83     253  25.82
Chauncey Billups         7       1      75     202  25.25
Juwan Howard             8       1      81     250  26.79
Vladimir Radmanovic      9       1      82     235  24.57
Tiago Splitter          10       1      83     240  24.49
Jarvis Varnado          11       1      81     230  24.64
Alexey Shved            12       1      78     190  21.95
Jermaine O`Neal         13       1      83     255  26.02
Michael Kidd-Gilchrist  14       1      79     232  26.13
Metta World Peace       15       1      79     260  29.29
Tim Hardaway Jr.        16       1      78     205  23.69
Greivis Vasquez         17       1      78     211  24.38
Daniel Gibson           18       1      74     200  25.68
Terrence Ross           19       1      79     197  22.19
Chris Kaman             20       1      84     265  26.40

Females: $n_2 = 20$, $\bar{Y}_2 = 23.3510$, $s_2^2 = 4.2694$
Player                  id  Gender  Height  Weight  BMI
Tamika Catchings        21       2      73     167  22.03059
Courtney Clements       22       2      72     155  21.01948
Allie Quigley           23       2      70     140  20.08571
Quanitra Hollingsworth  24       2      77     203  24.06966
Katie Smith             25       2      71     175  24.40488
Tayler Hill             26       2      70     145  20.80306
Allison Hightower       27       2      70     139  19.94224
Kara Braxton            28       2      78     225  25.99852
Eshaya Murphy           29       2      71     164  22.87086
Michelle Campbell       30       2      74     183  23.49324
Briann January          31       2      68     144  21.89273
Jasmine James           32       2      69     175  25.84016
Kelsey Bone             33       2      76     200  24.34211
Jia Perkins             34       2      68     155  23.5651
Ebony Hoffman           35       2      74     215  27.60135
Shavonte Zellous        36       2      70     155  22.23776
Matee Ajavon            37       2      68     160  24.32526
Karima Christmas        38       2      72     180  24.40972
Erika de Souza          39       2      77     190  22.52825
Jayne Appel             40       2      76     210  25.55921

Note: Actual data file has males “stacked” over females. See next slide.
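The BMI column can be recomputed from the Height (inches) and Weight (lbs) columns with the formula for this example. A minimal sketch, where `bmi` is a hypothetical helper name and the three input rows (Antetokounmpo, Anthony, Catchings) are taken from the data for this example:

```r
# BMI from weight (lbs) and height (inches): 703 * lbs / inches^2
# bmi() is an illustrative helper, not a base R function.
bmi <- function(weight_lbs, height_in) 703 * weight_lbs / height_in^2

# Two NBA rows and one WNBA row from the example data
bmi(c(205, 245, 167), c(81, 81, 73))
```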
Data File (.csv)
Player,Gender,Height,Weight,BMI
Giannis Antetokounmpo,1,81,205,21.9654
Joel Anthony,1,81,245,26.25133
Alex Len,1,85,255,24.81176
Erik Murphy,1,82,230,24.0467
Ersan Ilyasova,1,82,235,24.56945
Kevin Garnett,1,83,253,25.81783
Chauncey Billups,1,75,202,25.24551
Juwan Howard,1,81,250,26.78708
Vladimir Radmanovic,1,82,235,24.56945
Tiago Splitter,1,83,240,24.49122
Jarvis Varnado,1,81,230,24.64411
Alexey Shved,1,78,190,21.95431
Jermaine O`Neal,1,83,255,26.02192
Michael Kidd-Gilchrist,1,79,232,26.13299
Metta World Peace,1,79,260,29.28697
Tim Hardaway Jr.,1,78,205,23.68754
Greivis Vasquez,1,78,211,24.38083
Daniel Gibson,1,74,200,25.67568
Terrence Ross,1,79,197,22.19051
Chris Kaman,1,84,265,26.40235
Tamika Catchings,2,73,167,22.03059
Courtney Clements,2,72,155,21.01948
Allie Quigley,2,70,140,20.08571
Quanitra Hollingsworth,2,77,203,24.06966
Katie Smith,2,71,175,24.40488
Tayler Hill,2,70,145,20.80306
Allison Hightower,2,70,139,19.94224
Kara Braxton,2,78,225,25.99852
Eshaya Murphy,2,71,164,22.87086
Michelle Campbell,2,74,183,23.49324
Briann January,2,68,144,21.89273
Jasmine James,2,69,175,25.84016
Kelsey Bone,2,76,200,24.34211
Jia Perkins,2,68,155,23.5651
Ebony Hoffman,2,74,215,27.60135
Shavonte Zellous,2,70,155,22.23776
Matee Ajavon,2,68,160,24.32526
Karima Christmas,2,72,180,24.40972
Erika de Souza,2,77,190,22.52825
Jayne Appel,2,76,210,25.55921
t-test for NBA vs WNBA BMI – Equal Variances
$H_0: \mu_1 - \mu_2 = 0$ vs $H_A: \mu_1 - \mu_2 \neq 0$
TS: $t_{obs} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \overset{H_0}{\sim} t(n_1 + n_2 - 2)$
Data (From EXCEL Spreadsheet): $n_1 = n_2 = 20$, $\bar{Y}_1 = 24.9466$, $\bar{Y}_2 = 23.3510$, $s_1^2 = 3.0919$, $s_2^2 = 4.2694$
$s_p^2 = \frac{(20 - 1) 3.0919 + (20 - 1) 4.2694}{20 + 20 - 2} = 3.6806$
$t_{obs} = \frac{(24.9466 - 23.3510) - 0}{\sqrt{3.6806 \left( \frac{1}{20} + \frac{1}{20} \right)}} = 2.6301$
P-value: $2 P\left( t(38) \geq 2.6301 \right) = 2(.0061) = .0122$
t-test for NBA vs WNBA BMI – Unequal Variances
$H_0: \mu_1 - \mu_2 = 0$ vs $H_A: \mu_1 - \mu_2 \neq 0$
Test Statistic: $t_{obs} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ with $df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}$
$t_{obs} = \frac{24.9466 - 23.3510}{\sqrt{\frac{3.0919}{20} + \frac{4.2694}{20}}} = 2.6301$
$df = \frac{\left( \frac{3.0919}{20} + \frac{4.2694}{20} \right)^2}{\frac{(3.0919 / 20)^2}{20 - 1} + \frac{(4.2694 / 20)^2}{20 - 1}} = \frac{0.135472}{0.003656} = 37.05$
P-value: $2 P\left( t(37) \geq 2.6301 \right) = 2(.0062) = .0124$
Note: the test statistics are the same (n1 = n2) and the degrees of freedom very close (s1≈ s2)
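The Welch statistic and Satterthwaite degrees of freedom can be recomputed from the rounded summary statistics quoted for this example. A minimal sketch, where `welch_from_summary` is a hypothetical helper name; because the inputs are rounded to four decimals, the results agree with the hand calculation only to about three decimals.

```r
# Welch t statistic and Satterthwaite df from summary statistics.
# welch_from_summary() is an illustrative helper, not a base R function.
welch_from_summary <- function(ybar1, ybar2, s2_1, s2_2, n1, n2) {
  v1 <- s2_1 / n1                  # s1^2 / n1
  v2 <- s2_2 / n2                  # s2^2 / n2
  t_obs <- (ybar1 - ybar2) / sqrt(v1 + v2)
  df_s <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(abs(t_obs), df_s, lower.tail = FALSE)
  c(t = t_obs, df = df_s, p.value = p)
}

# Summary statistics from the NBA/WNBA BMI example
res <- welch_from_summary(24.9466, 23.3510, 3.0919, 4.2694, 20, 20)
round(res, 4)
```

With $n_1 = n_2$, `v1 + v2` equals $s_p^2 (1/n_1 + 1/n_2)$, which is why the two versions of the test statistic coincide here.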
Test for Equal Variances for WNBA vs NBA BMI
Data: $n_1 = 20$, $s_1^2 = 3.0919$, $n_2 = 20$, $s_2^2 = 4.2694$
$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_A: \sigma_1^2 \neq \sigma_2^2$
Test Statistic: $F_{obs} = \frac{s_1^2}{s_2^2} = \frac{3.0919}{4.2694} = 0.7242$
P-value: $P = 2 \min\left( P\left( F(19,19) \geq 0.7242 \right), P\left( F(19,19) \geq \frac{1}{0.7242} \right) \right) = 2 \min(0.7557, 0.2443) = 2(0.2443) = 0.4886$
Critical F-values ($\alpha = 0.05$, $\alpha/2 = 0.025$, $1 - \alpha/2 = 0.975$, $df_1 = df_2 = 20 - 1 = 19$):
$F(0.025; 19, 19) = 2.5265$, $F(0.975; 19, 19) = \frac{1}{2.5265} = 0.3958$
$(1 - \alpha)100\%$ CI for $\frac{\sigma_1^2}{\sigma_2^2}$: $\left[ \frac{0.7242}{2.5265}, \frac{0.7242}{0.3958} \right] = (0.2866, 1.8297)$
Small Sample Test to Compare Two Medians –
Non-Normal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
  - Null hypothesis: Population Medians are equal, H0: M1 = M2
  - Rank measurements across samples from smallest (1) to largest (n1+n2). Ties take average ranks.
  - Obtain the rank sum for the group with the smallest sample size (T)
  - 1-sided tests: Conclude HA: M1 > M2 if T > TU; conclude HA: M1 < M2 if T < TL
  - 2-sided tests: Conclude HA: M1 ≠ M2 if T > TU or T < TL
  - Values of TL and TU are given in tables for various sample sizes and significance levels (some tables use T = rank sum for the larger group).
  - This test gives conclusions equivalent to those of the Mann-Whitney U-test
Rank-Sum Test: Normal Approximation
• Under the null hypothesis of no difference in the two groups (let T be the rank sum for group 1):
$\mu_T = \frac{n_1 (N + 1)}{2}$, $\sigma_T = \sqrt{\frac{n_1 n_2 (N + 1)}{12}}$, $N = n_1 + n_2$
• A z-statistic can be computed and an (approximate) P-value can be obtained from the Z-distribution:
$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{T - n_1 (N + 1) / 2}{\sqrt{n_1 n_2 (N + 1) / 12}}$
Note: When there are ties in ranks, a more complex formula for $\sigma_T$ is often used, with little effect unless there are many ties.
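The normal approximation can be sketched in R as below. This is an illustration rather than code from the slides: `rank_sum_z` is a hypothetical helper name and the data are simulated from continuous distributions so that no ties occur and the simple formula for $\sigma_T$ applies.

```r
# Rank-sum z-statistic without a tie correction, following the formulas above.
# rank_sum_z() is an illustrative helper, not a base R function.
rank_sum_z <- function(y1, y2) {
  n1 <- length(y1); n2 <- length(y2); N <- n1 + n2
  r <- rank(c(y1, y2))                 # ties would receive average ranks
  rank_sum <- sum(r[seq_len(n1)])      # T: rank sum for group 1
  mu_T <- n1 * (N + 1) / 2
  sigma_T <- sqrt(n1 * n2 * (N + 1) / 12)
  z <- (rank_sum - mu_T) / sigma_T
  list(rank_sum = rank_sum, z = z,
       p.value = 2 * pnorm(abs(z), lower.tail = FALSE))
}

set.seed(3)                            # simulated data, for illustration only
y1 <- rexp(15, rate = 1)
y2 <- rexp(18, rate = 0.5)
by_hand <- rank_sum_z(y1, y2)
built_in <- wilcox.test(y1, y2, exact = FALSE, correct = FALSE)
```

Setting `exact = FALSE` and `correct = FALSE` asks `wilcox.test` for the normal approximation without a continuity correction, which is the version these formulas describe.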
WNBA/NBA BMI Data – Wilcoxon Rank-Sum Test
Player                  id  Gender  Height  Weight  BMI       Rank
Giannis Antetokounmpo    1       1      81     205  21.9654   7
Joel Anthony             2       1      81     245  26.25133  36
Alex Len                 3       1      85     255  24.81176  27
Erik Murphy              4       1      82     230  24.0467   16
Ersan Ilyasova           5       1      82     235  24.56945  24.5
Kevin Garnett            6       1      83     253  25.81783  31
Chauncey Billups         7       1      75     202  25.24551  28
Juwan Howard             8       1      81     250  26.78708  38
Vladimir Radmanovic      9       1      82     235  24.56945  24.5
Tiago Splitter          10       1      83     240  24.49122  23
Jarvis Varnado          11       1      81     230  24.64411  26
Alexey Shved            12       1      78     190  21.95431  6
Jermaine O`Neal         13       1      83     255  26.02192  34
Michael Kidd-Gilchrist  14       1      79     232  26.13299  35
Metta World Peace       15       1      79     260  29.28697  40
Tim Hardaway Jr.        16       1      78     205  23.68754  15
Greivis Vasquez         17       1      78     211  24.38083  20
Daniel Gibson           18       1      74     200  25.67568  30
Terrence Ross           19       1      79     197  22.19051  9
Chris Kaman             20       1      84     265  26.40235  37
Tamika Catchings        21       2      73     167  22.03059  8
Courtney Clements       22       2      72     155  21.01948  4
Allie Quigley           23       2      70     140  20.08571  2
Quanitra Hollingsworth  24       2      77     203  24.06966  17
Katie Smith             25       2      71     175  24.40488  21
Tayler Hill             26       2      70     145  20.80306  3
Allison Hightower       27       2      70     139  19.94224  1
Kara Braxton            28       2      78     225  25.99852  33
Eshaya Murphy           29       2      71     164  22.87086  12
Michelle Campbell       30       2      74     183  23.49324  13
Briann January          31       2      68     144  21.89273  5
Jasmine James           32       2      69     175  25.84016  32
Kelsey Bone             33       2      76     200  24.34211  19
Jia Perkins             34       2      68     155  23.5651   14
Ebony Hoffman           35       2      74     215  27.60135  39
Shavonte Zellous        36       2      70     155  22.23776  10
Matee Ajavon            37       2      68     160  24.32526  18
Karima Christmas        38       2      72     180  24.40972  22
Erika de Souza          39       2      77     190  22.52825  11
Jayne Appel             40       2      76     210  25.55921  29

$n_1 = n_2 = 20$, $N = 40$
$T = 7 + 36 + \cdots + 9 + 37 = 507$
$\mu_T = \frac{20(40 + 1)}{2} = 410$
$\sigma_T = \sqrt{\frac{(20)(20)(41)}{12}} = \sqrt{1366.667} = 36.9685$
$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{507 - 410}{36.9685} = \frac{97}{36.9685} = 2.6239$
P-value $= 2 P(Z \geq 2.6239) = .0087$
R uses a different algorithm for a slightly different P-value.
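The ranking and the normal approximation for the BMI data can be reproduced directly from the BMI values in the data file. This is a sketch of the arithmetic only; the variable names (`nba`, `wnba`, `T_nba`, and so on) are illustrative choices, and no tie or continuity correction is applied.

```r
# BMI values from the data file (NBA = group 1, WNBA = group 2)
nba <- c(21.9654, 26.25133, 24.81176, 24.0467, 24.56945, 25.81783, 25.24551,
         26.78708, 24.56945, 24.49122, 24.64411, 21.95431, 26.02192, 26.13299,
         29.28697, 23.68754, 24.38083, 25.67568, 22.19051, 26.40235)
wnba <- c(22.03059, 21.01948, 20.08571, 24.06966, 24.40488, 20.80306, 19.94224,
          25.99852, 22.87086, 23.49324, 21.89273, 25.84016, 24.34211, 23.5651,
          27.60135, 22.23776, 24.32526, 24.40972, 22.52825, 25.55921)

n1 <- length(nba); n2 <- length(wnba); N <- n1 + n2
r <- rank(c(nba, wnba))              # the two tied BMIs each get rank 24.5
T_nba <- sum(r[1:n1])                # rank sum for group 1
mu_T <- n1 * (N + 1) / 2
sigma_T <- sqrt(n1 * n2 * (N + 1) / 12)
z_obs <- (T_nba - mu_T) / sigma_T
p_val <- 2 * pnorm(abs(z_obs), lower.tail = FALSE)
c(T = T_nba, mu_T = mu_T, sigma_T = sigma_T, z = z_obs, p = p_val)
```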
Note: The statistic R computes is $W = T - \frac{n_1 (n_1 + 1)}{2}$. This is the difference between T and the minimum it could be.
$W = T - \frac{n_1 (n_1 + 1)}{2} = 507 - \frac{20(21)}{2} = 507 - 210 = 297$
R Program and Output
bmi1 <- read.csv("http://www.stat.ufl.edu/~winner/data/wnba_nba_bmi.csv",header=T)
attach(bmi1); names(bmi1)
tapply(BMI,Gender,mean)          # Obtain mean BMI by Gender
tapply(BMI,Gender,var)           # Obtain variance of BMI by Gender
tapply(BMI,Gender,length)        # Obtain sample size of BMI by Gender
t.test(BMI~Gender,var.equal=T)   # t-test with Equal Variances
t.test(BMI~Gender)               # t-test with Unequal Variances
var.test(BMI~Gender)             # F-test for Equal Variances
wilcox.test(BMI~Gender)          # Wilcoxon Rank-Sum Test
#################################
> tapply(BMI,Gender,mean)        # Obtain mean BMI by Gender
       1        2
24.94665 23.35099
> tapply(BMI,Gender,var)         # Obtain variance of BMI by Gender
       1        2
3.091871 4.269420
> tapply(BMI,Gender,length)      # Obtain sample size of BMI by Gender
 1  2
20 20
R Output (Continued)
> t.test(BMI~Gender,var.equal=T)
# t-test with Equal Variances
Two Sample t-test
data: BMI by Gender
t = 2.6301, df = 38, p-value = 0.01226
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.3674868 2.8238189
sample estimates:
mean in group 1 mean in group 2
       24.94665        23.35099
> t.test(BMI~Gender)
# t-test with Unequal Variances
Welch Two Sample t-test
data: BMI by Gender
t = 2.6301, df = 37.052, p-value = 0.01236
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.3664539 2.8248518
sample estimates:
mean in group 1 mean in group 2
       24.94665        23.35099
R Output (Continued)
> var.test(BMI~Gender)
# F-test for Equal Variances
F test to compare two variances
data: BMI by Gender
F = 0.7242, num df = 19, denom df = 19, p-value = 0.4885
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.2866432 1.8296302
sample estimates:
ratio of variances
0.7241899
> wilcox.test(BMI~Gender)
Wilcoxon rank sum test with continuity correction
data: BMI by Gender
W = 297, p-value = 0.009042
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(x = c(21.96540162, 26.25133364, 24.81176471, :
  cannot compute exact p-value with ties
Paired t-test
Setting: n matched pairs, each under 1 of 2 competing conditions
Note: In many experiments, it is the same Subject under each condition
Data: $d_j = Y_{1j} - Y_{2j}$, $j = 1, \ldots, n$ (Difference between measurement under Conditions 1 and 2)
$\bar{d} = \frac{\sum_{j=1}^{n} d_j}{n}$, $s_d^2 = \frac{\sum_{j=1}^{n} (d_j - \bar{d})^2}{n - 1}$
$H_0: \mu_d = \mu_1 - \mu_2 = 0$ vs $H_A: \mu_d = \mu_1 - \mu_2 \neq 0$
TS: $t_{obs} = \frac{\bar{d}}{\sqrt{s_d^2 / n}}$ with $df = n - 1$
P-value: $P = 2 P\left( t(n - 1) \geq |t_{obs}| \right)$
$(1 - \alpha)100\%$ CI for $\mu_d = \mu_1 - \mu_2$: $\bar{d} \pm t(\alpha/2; n - 1) \sqrt{\frac{s_d^2}{n}}$
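The paired formulas can be checked against `t.test(..., paired = TRUE)` with a few lines of R. This is a minimal sketch: `paired_t` is a hypothetical helper name and the paired counts are simulated for illustration only.

```r
# Paired t-test computed from the differences, following the formulas above.
# paired_t() is an illustrative helper, not a base R function.
paired_t <- function(y1, y2, conf = 0.95) {
  d <- y1 - y2                       # d_j = Y_1j - Y_2j
  n <- length(d)
  se <- sqrt(var(d) / n)             # sqrt(s_d^2 / n)
  t_obs <- mean(d) / se
  p <- 2 * pt(abs(t_obs), n - 1, lower.tail = FALSE)
  ci <- mean(d) + c(-1, 1) * qt(1 - (1 - conf) / 2, n - 1) * se
  list(dbar = mean(d), t = t_obs, df = n - 1, p.value = p, ci = ci)
}

set.seed(4)                          # simulated paired data, for illustration only
y1 <- rpois(30, lambda = 3)
y2 <- rpois(30, lambda = 2.5)
by_hand <- paired_t(y1, y2)
built_in <- t.test(y1, y2, paired = TRUE)
```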
Example: English Premier League Football - 2012
• Interested in determining if there is a home field effect
  - League has 20 teams, all play all 19 opponents Home and Away (190 “pairs” of teams, each playing once on each team’s home field). No overtime.
  - We are treating each “pair of teams” as a unit
  - Y1 is the Total Score for the Home Teams, Y2 is for Away
• Note: d represents Combined Home Goals – Combined Away Goals for the Pair of teams (“units”)
• No home effect should mean $\mu_d = 0$
• Programming Note: In the Independent Sample t-test, we had a Variable for Treatment/Group and another variable for Response (Y). Here we have Y1 and Y2 as separate variables, with each row as a unit
Portion of Data File (.csv). Note n =190
Team1        Team2                 Home  Away
Arsenal      Aston Villa              2     1
Arsenal      Chelsea                  3     3
Arsenal      Everton                  1     1
Arsenal      Fulham                   3     4
Arsenal      Liverpool                2     4
Arsenal      Manchester City          1     3
Arsenal      Manchester United        3     2
Arsenal      Newcastle United         7     4
Arsenal      Norwich City             4     1
Arsenal      Queens Park Rangers      1     1
Arsenal      Reading                  6     6
Arsenal      Southampton              7     2
Arsenal      Stoke City               1     0
Arsenal      Sunderland               0     1
Arsenal      Swansea City             0     4
Arsenal      Tottenham Hotspur        7     3
Arsenal      West Bromwich Albion     3     2
Arsenal      West Ham United          6     4
Arsenal      Wigan Athletic           4     2
Aston Villa  Chelsea                  9     2
Paired t-test for EPL 2012 Home vs Away Goals
$H_0: \mu_d = \mu_1 - \mu_2 = 0$ vs $H_A: \mu_d = \mu_1 - \mu_2 \neq 0$
TS: $t_{obs} = \frac{\bar{d}}{\sqrt{s_d^2 / n}}$ with $df = n - 1$
Data (From EXCEL Spreadsheet): $n = 190$, $\bar{d} = 0.6368$, $s_d^2 = 4.3912$
$t_{obs} = \frac{0.6368}{\sqrt{4.3912 / 190}} = 4.1888$
P-value: $2 P\left( t(189) \geq 4.1888 \right) = 2(.00002) = .00004$
95% CI for $\mu_d$: $0.6368 \pm 1.9726 \sqrt{\frac{4.3912}{190}} = 0.6368 \pm 0.2999 = (0.3369, 0.9367)$
R Program / Output
epl.2012 <- read.csv("http://www.stat.ufl.edu/~winner/data/epl_2012_home.csv",
header=T)
attach(epl.2012); names(epl.2012)
t.test(Home,Away,paired=T)
wilcox.test(Home,Away,paired=T)
#######################
> t.test(Home,Away,paired=T)
Paired t-test
data: Home and Away
t = 4.1891, df = 189, p-value = 4.294e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.3369575 0.9367267
sample estimates:
mean of the differences
0.6368421
Small-Sample Test For Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test)
  - Compute Differences di (as in the paired t-test) and obtain their absolute values (ignoring 0s). n = number of non-zero differences
  - Rank the observations by |di| (smallest = 1), averaging ranks for ties
  - Compute T+ and T-, the rank sums for the positive and negative differences, respectively
  - 1-sided tests: Conclude HA: M1 > M2 if T = T- ≤ T0
  - 2-sided tests: Conclude HA: M1 ≠ M2 if T = min(T+, T-) ≤ T0
  - Values of T0 are given in various tables for various sample sizes and significance levels. Some tables give the upper tail cut-off T0 values
  - P-values are printed by statistical software packages.
Signed-Rank Test: Normal Approximation
Under the null hypothesis of no difference in the 2 groups: Let T = T+
$\mu_T = \frac{n(n + 1)}{4}$, $\sigma_T = \sqrt{\frac{n(n + 1)(2n + 1)}{24}}$
Z-Statistic computed and approximate P-value can be obtained from:
$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{T - n(n + 1)/4}{\sqrt{n(n + 1)(2n + 1)/24}}$
When there are ties (many common d values), as in the soccer data, $\sigma_T$ is reduced and is of the form:
$\sigma_T = \sqrt{\frac{1}{24} \left[ n(n + 1)(2n + 1) - \frac{1}{2} \sum_{j=1}^{g} t_j (t_j - 1)(t_j + 1) \right]}$
where: g = # of distinct levels of |d| and $t_j$ is the # of ties at level j
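The tie-corrected normal approximation can be sketched in R as follows. This is an illustration under stated assumptions: `signed_rank_z` is a hypothetical helper name, the paired counts are simulated so that ties and zero differences occur, and no continuity correction is applied.

```r
# Signed-rank z-statistic with the tie correction shown above.
# signed_rank_z() is an illustrative helper, not a base R function.
signed_rank_z <- function(y1, y2) {
  d <- y1 - y2
  d <- d[d != 0]                         # zero differences are dropped
  n <- length(d)
  r <- rank(abs(d))                      # average ranks for tied |d|
  T_plus <- sum(r[d > 0])
  mu_T <- n * (n + 1) / 4
  t_j <- as.numeric(table(abs(d)))       # number of ties at each level of |d|
  sigma2_T <- (n * (n + 1) * (2 * n + 1) -
                 sum(t_j * (t_j - 1) * (t_j + 1)) / 2) / 24
  z <- (T_plus - mu_T) / sqrt(sigma2_T)
  list(T_plus = T_plus, z = z,
       p.value = 2 * pnorm(abs(z), lower.tail = FALSE))
}

set.seed(5)                              # simulated paired counts, for illustration only
y1 <- rpois(40, lambda = 3)
y2 <- rpois(40, lambda = 2)
by_hand <- signed_rank_z(y1, y2)
built_in <- wilcox.test(y1, y2, paired = TRUE, exact = FALSE, correct = FALSE)
```

Levels of |d| that occur only once have $t_j = 1$ and contribute nothing to the correction, so the formula reduces to the untied version when all |d| are distinct.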
EPL Home Field Advantage
Diff (d)  Count (t)  T+
-6         1          0
-4         3          0
-3         7          0
-2        17          0
-1        27          0
 1        30         29
 2        33         82.5
 3        16        119
 4        10        137
 5         6        146.5
 7         1        151
sum      151

T+         7896.5
E(T)       5738
sigma^2_T  283006
sigma_T    531.98
Z          4.0575
p-value    0.0000496

|Diff|  Count(t)  LowRank  HighRank  MeanRank  t(t-1)*(t+1)
1       57          1       57        29        185136
2       50         58      107        82.5      124950
3       23        108      130       119         12144
4       13        131      143       137          2184
5        6        144      149       146.5         210
6        1        150      150       150             0
7        1        151      151       151             0
sum                                             324624
• Zero differences have been
removed
• The Differences and their Counts
are at top left
• Absolute differences and their
counts and average ranks are at
bottom
• T+ is the sum of the products of
the counts and the T+ columns
(e.g. There are 30 cases with
d=+1, each getting rank=29)
• The Z is large and P-value is small
• R Labels T+ as V
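The spreadsheet quantities can be reproduced in R by expanding the table of differences and counts. This is a sketch of the arithmetic only; the variable names are illustrative, and the calculation omits the continuity correction that `wilcox.test` applies by default, so its P-value is slightly smaller than the one R prints.

```r
# Non-zero differences (Home - Away) and their counts, from the table above
d <- rep(c(-6, -4, -3, -2, -1, 1, 2, 3, 4, 5, 7),
         times = c(1, 3, 7, 17, 27, 30, 33, 16, 10, 6, 1))
n <- length(d)                           # number of non-zero differences
r <- rank(abs(d))                        # mean ranks: 29, 82.5, 119, ...
T_plus <- sum(r[d > 0])                  # rank sum of the positive differences
mu_T <- n * (n + 1) / 4
t_j <- as.numeric(table(abs(d)))         # ties at each level of |d|
sigma2_T <- (n * (n + 1) * (2 * n + 1) -
               sum(t_j * (t_j - 1) * (t_j + 1)) / 2) / 24
z_obs <- (T_plus - mu_T) / sqrt(sigma2_T)
p_val <- 2 * pnorm(abs(z_obs), lower.tail = FALSE)
c(n = n, T_plus = T_plus, mu_T = mu_T, sigma2_T = sigma2_T, z = z_obs, p = p_val)
```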
R Output
> wilcox.test(Home,Away,paired=T)
Wilcoxon signed rank test with continuity correction
data: Home and Away
V = 7896.5, p-value = 4.981e-05
alternative hypothesis: true location shift is not equal to 0
Test for Association for Categorical Variables
Counts  Col 1  Col 2  …  Col c  Total
Row 1   n11    n12    …  n1c    n1•
Row 2   n21    n22    …  n2c    n2•
…       …      …      …  …      …
Row r   nr1    nr2    …  nrc    nr•
Total   n•1    n•2    …  n•c    n••

Expected Cell Counts: $\hat{n}_{ij} = \frac{n_{i\bullet} n_{\bullet j}}{n_{\bullet\bullet}}$, $i = 1, \ldots, r$; $j = 1, \ldots, c$
Pearson Chi-Square Statistic: $X_P^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - \hat{n}_{ij})^2}{\hat{n}_{ij}}$, $df = (r - 1)(c - 1)$
Likelihood-Ratio Chi-Square Statistic: $X_{LR}^2 = 2 \sum_{i=1}^{r} \sum_{j=1}^{c} n_{ij} \ln\left( \frac{n_{ij}}{\hat{n}_{ij}} \right)$, $df = (r - 1)(c - 1)$
Reject the null hypothesis of no association between the row and column variables if: $X^2 \geq \chi^2(\alpha; (r - 1)(c - 1))$
P-value: $P = P\left( \chi^2((r - 1)(c - 1)) \geq X^2 \right)$
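Both statistics are easy to compute from a matrix of counts. The sketch below is an illustration, not code from the slides: `chisq_by_hand` is a hypothetical helper name, the 2 x 3 table is made up, and cells with $n_{ij} = 0$ are assumed to contribute 0 to the likelihood-ratio sum.

```r
# Pearson and likelihood-ratio chi-square statistics for an r x c table.
# chisq_by_hand() is an illustrative helper, not a base R function.
chisq_by_hand <- function(counts, alpha = 0.05) {
  n <- sum(counts)
  expected <- outer(rowSums(counts), colSums(counts)) / n   # n_i. * n_.j / n..
  df <- (nrow(counts) - 1) * (ncol(counts) - 1)
  x2_p <- sum((counts - expected)^2 / expected)
  x2_lr <- 2 * sum(ifelse(counts > 0, counts * log(counts / expected), 0))
  list(expected = expected, df = df, pearson = x2_p, lr = x2_lr,
       p.pearson = pchisq(x2_p, df, lower.tail = FALSE),
       p.lr = pchisq(x2_lr, df, lower.tail = FALSE),
       critical = qchisq(1 - alpha, df))
}

# Made-up 2 x 3 table of counts, for illustration only
tab <- matrix(c(20, 30, 25,
                15, 10, 40), nrow = 2, byrow = TRUE)
by_hand <- chisq_by_hand(tab)
built_in <- chisq.test(tab, correct = FALSE)
```

`chisq.test` reports only the Pearson statistic; the likelihood-ratio version has to be computed by hand as here or obtained from an add-on package.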
Example: Crop Circles by Country and Field Type
Observed
Country         other  wheat  Total
England           108    323    431
Germany            47     90    137
Italy              56     46    102
USA                27     17     44
Canada             32     11     43
Holland            10     24     34
Switzerland         6     23     29
Belgium             4     18     22
Czech Republic      7     14     21
Total             297    566    863
Percent      34.41483 65.58517  100

Expected Cell Counts: $\hat{n}_{ij} = \frac{n_{i\bullet} n_{\bullet j}}{n_{\bullet\bullet}}$, $i = 1, \ldots, 9$; $j = 1, 2$
For England/other (i=1, j=1): $\hat{n}_{11} = \frac{n_{1\bullet} n_{\bullet 1}}{n_{\bullet\bullet}} = \frac{431(297)}{863} = 148.33$

Expected
Country         wheat0 (other)  wheat1 (wheat)  Total
England         148.3279        282.6721          431
Germany         47.14832        89.85168          137
Italy           35.10313        66.89687          102
USA             15.14253        28.85747           44
Canada          14.79838        28.20162           43
Holland         11.70104        22.29896           34
Switzerland     9.980301        19.0197            29
Belgium         7.571263        14.42874           22
Czech Republic  7.227115        13.77289           21
Total           297             566               863

Pearson Chi-Square Statistic: $X_P^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - \hat{n}_{ij})^2}{\hat{n}_{ij}}$
Contribution from cell with England/other: $\frac{(n_{11} - \hat{n}_{11})^2}{\hat{n}_{11}} = \frac{(108 - 148.33)^2}{148.33} = 10.97$

Pearson Chi-square
Country         wheat0    wheat1    Total
England         10.9645   5.753457  16.71796
Germany         0.000467  0.000245  0.000711
Italy           12.43989  6.527648  18.96754
USA             9.285088  4.872211  14.1573
Canada          19.99515  10.49216  30.48731
Holland         0.24729   0.129762  0.377051
Switzerland     1.587407  0.832968  2.420375
Belgium         1.684517  0.883925  2.568442
Czech Republic  0.007137  0.003745  0.010882
Total           56.21145  29.49612  85.70757
X^2(obs) = 85.70757
X^2(.05,8) = 15.50731

Likelihood-Ratio Chi-Square Statistic: $X_{LR}^2 = 2 \sum_{i=1}^{r} \sum_{j=1}^{c} n_{ij} \ln\left( \frac{n_{ij}}{\hat{n}_{ij}} \right)$
Contribution from cell with England/other ($n_{11} = 108$, $\hat{n}_{11} = 148.33$): $2 n_{11} \ln\left( \frac{n_{11}}{\hat{n}_{11}} \right) = 2(108) \ln\left( \frac{108}{148.33} \right) = -68.54$

Likelihood-Ratio Chi-Square
Country         wheat0    wheat1    Total
England         -68.5356  86.15369  17.61812
Germany         -0.29617  0.296884  0.000712
Italy           52.31088  -34.455   17.85589
USA             31.22981  -17.9913  13.23852
Canada          49.35797  -20.7127  28.64532
Holland         -3.14186  3.528668  0.386811
Switzerland     -6.10625  8.740874  2.634628
Belgium         -5.10452  7.961397  2.856873
Czech Republic  -0.44702  0.457954  0.010938
Total           49.26727  33.98053  83.2478

Both tests are highly significant.
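The spreadsheet totals can be reproduced in R from the observed counts alone. This is a sketch of the arithmetic; the object names (`crop`, `expected`, `x2_p`, `x2_lr`) are illustrative, and the counts are entered by hand from the Observed table rather than read from the data file.

```r
# Observed crop circle counts by country and field type (other, wheat)
crop <- matrix(c(108, 323,  47, 90,  56, 46,  27, 17,  32, 11,
                  10,  24,   6, 23,   4, 18,   7, 14),
               ncol = 2, byrow = TRUE,
               dimnames = list(Country = c("England", "Germany", "Italy", "USA",
                                           "Canada", "Holland", "Switzerland",
                                           "Belgium", "Czech Republic"),
                               Field = c("other", "wheat")))

expected <- outer(rowSums(crop), colSums(crop)) / sum(crop)
df <- (nrow(crop) - 1) * (ncol(crop) - 1)
x2_p <- sum((crop - expected)^2 / expected)        # Pearson
x2_lr <- 2 * sum(crop * log(crop / expected))      # likelihood ratio
crit <- qchisq(0.95, df)                           # X^2(.05, 8)
c(pearson = x2_p, lr = x2_lr, df = df, critical = crit)
```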
R Program – Uses the vcd Package
cc <- read.csv("http://www.stat.ufl.edu/~winner/data/crop_circle",header=T)
attach(cc); names(cc)
(wheat.country <- table(Country,wheat))
chisq.test(wheat.country)
install.packages("vcd")
library(vcd)
assocstats(wheat.country)
barplot(wheat.country,
col=c("blue","green","pink","purple","red",
"yellow","orange","cornflowerblue","beige"),
main="Wheat by Country",xlab="Wheat",ylab="Count")
labs <- rownames(wheat.country)
legend(locator(1),labs,fill=c("blue","green","pink","purple","red",
"yellow","orange","cornflowerblue","beige"))
barplot(wheat.country,beside=T,
col=c("blue","green","pink","purple","red",
"yellow","orange","cornflowerblue","beige"),
main="Wheat by Country",xlab="Wheat",ylab="Count")
labs <- rownames(wheat.country)
legend(locator(1),labs,fill=c("blue","green","pink","purple","red",
"yellow","orange","cornflowerblue","beige"))
R Output
> (wheat.country <- table(Country,wheat))
         wheat
Country     0   1
  Belgium   4  18
  Canada   32  11
  Czech     7  14
  England 108 323
  Germany  47  90
  Holland  10  24
  Italy    56  46
  Swiss     6  23
  USA      27  17
##################################################
> assocstats(wheat.country)
                    X^2 df   P(> X^2)
Likelihood Ratio 83.248  8 1.0880e-14
Pearson          85.708  8 3.4417e-15

Phi-Coefficient   : 0.315
Contingency Coeff.: 0.301
Cramer's V        : 0.315