R for Applied Statistical Methods
Larry Winner
Department of Statistics
University of Florida

2-Sample t-test (Independent Samples) – Case 1

Model: $N(\mu, \sigma)$ denotes a normal distribution with mean $\mu$ and sd $\sigma$.
Note: Some introductory books use $N(\mu, \sigma^2)$; we use var $= \sigma^2$.

Group 1 (Sample Size $= n_1$): $Y_{11}, \ldots, Y_{1n_1} \sim N(\mu_1, \sigma_1)$
Group 2 (Sample Size $= n_2$): $Y_{21}, \ldots, Y_{2n_2} \sim N(\mu_2, \sigma_2)$

Data:
$$\bar{Y}_1 = \frac{\sum_{j=1}^{n_1} Y_{1j}}{n_1} \qquad s_1^2 = \frac{\sum_{j=1}^{n_1} (Y_{1j} - \bar{Y}_1)^2}{n_1 - 1} \qquad \bar{Y}_2 = \frac{\sum_{j=1}^{n_2} Y_{2j}}{n_2} \qquad s_2^2 = \frac{\sum_{j=1}^{n_2} (Y_{2j} - \bar{Y}_2)^2}{n_2 - 1}$$

Case 1: Equal Population Variances ($\sigma_1^2 = \sigma_2^2$):
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Hypothesis: $H_0: \mu_1 - \mu_2 = 0 \qquad H_A: \mu_1 - \mu_2 \ne 0$

Test Statistic:
$$t_{obs} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \qquad \text{with } df = n_1 + n_2 - 2$$

P-value: $P = 2P\left( t_{n_1+n_2-2} \ge |t_{obs}| \right)$

$(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$:
$$\left( \bar{Y}_1 - \bar{Y}_2 \right) \pm t_{\alpha/2,\, n_1+n_2-2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$

2-Sample t-test – Case 2 and Test of Equal Variances

Case 2: Unequal Population Variances ($\sigma_1^2 \ne \sigma_2^2$). Known as Welch's method (with the Satterthwaite approximation for df).

Test Statistic:
$$t_{obs} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \qquad \text{with } df_S = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$

P-value: $P = 2P\left( t_{df_S} \ge |t_{obs}| \right)$

$(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$:
$$\left( \bar{Y}_1 - \bar{Y}_2 \right) \pm t_{\alpha/2,\, df_S} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Testing for Equal Variances: F test (Note: many packages use Levene's Test)

$$H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \qquad H_A: \frac{\sigma_1^2}{\sigma_2^2} \ne 1$$

Test Statistic: $F_{obs} = \dfrac{s_1^2}{s_2^2}$ with $df_1 = n_1 - 1$, $df_2 = n_2 - 1$

P-value: $P = 2 \min\left[ P\left( F_{n_1-1,\, n_2-1} \ge F_{obs} \right),\ P\left( F_{n_1-1,\, n_2-1} \le F_{obs} \right) \right]$

$(1-\alpha)100\%$ CI for $\sigma_1^2 / \sigma_2^2$:
$$\left( \frac{s_1^2 / s_2^2}{F_{\alpha/2;\, n_1-1,\, n_2-1}},\ \frac{s_1^2 / s_2^2}{F_{1-\alpha/2;\, n_1-1,\, n_2-1}} \right) \qquad \text{where: } F_{1-\alpha/2;\, n_1-1,\, n_2-1} = \frac{1}{F_{\alpha/2;\, n_2-1,\, n_1-1}}$$

Example – NBA and WNBA Players' BMI

• Groups: Male: NBA (i=1) and Female: WNBA (i=2)
• Samples: Random Samples of n1 = n2 = 20 from 2013 seasons (2013/2014 for NBA)

$$BMI = \frac{kg}{metres^2} = 703 \cdot \frac{lbs}{inches^2}$$

Males: n1 = 20
id  Player                  Gender  Height  Weight  BMI
 1  Giannis Antetokounmpo   1       81      205     21.97
 2  Joel Anthony            1       81      245     26.25
 3  Alex Len                1       85      255     24.81
 4  Erik Murphy             1       82      230     24.05
 5  Ersan Ilyasova          1       82      235     24.57
 6  Kevin Garnett           1       83      253     25.82
 7  Chauncey Billups        1       75      202     25.25
 8  Juwan Howard            1       81      250     26.79
 9  Vladimir Radmanovic     1       82      235     24.57
10  Tiago Splitter          1       83      240     24.49
11  Jarvis Varnado          1       81      230     24.64
12  Alexey Shved            1       78      190     21.95
13  Jermaine O`Neal         1       83      255     26.02
14  Michael Kidd-Gilchrist  1       79      232     26.13
15  Metta World Peace       1       79      260     29.29
16  Tim Hardaway Jr.        1       78      205     23.69
17  Greivis Vasquez         1       78      211     24.38
18  Daniel Gibson           1       74      200     25.68
19  Terrence Ross           1       79      197     22.19
20  Chris Kaman             1       84      265     26.40

Females: n2 = 20
id  Player                  Gender  Height  Weight  BMI
21  Tamika Catchings        2       73      167     22.03059
22  Courtney Clements       2       72      155     21.01948
23  Allie Quigley           2       70      140     20.08571
24  Quanitra Hollingsworth  2       77      203     24.06966
25  Katie Smith             2       71      175     24.40488
26  Tayler Hill             2       70      145     20.80306
27  Allison Hightower       2       70      139     19.94224
28  Kara Braxton            2       78      225     25.99852
29  Eshaya Murphy           2       71      164     22.87086
30  Michelle Campbell       2       74      183     23.49324
31  Briann January          2       68      144     21.89273
32  Jasmine James           2       69      175     25.84016
33  Kelsey Bone             2       76      200     24.34211
34  Jia Perkins             2       68      155     23.5651
35  Ebony Hoffman           2       74      215     27.60135
36  Shavonte Zellous        2       70      155     22.23776
37  Matee Ajavon            2       68      160     24.32526
38  Karima Christmas        2       72      180     24.40972
39  Erika de Souza          2       77      190     22.52825
40  Jayne Appel             2       76      210     25.55921

Males: $\bar{Y}_1 = 24.9466 \quad s_1^2 = 3.0919$
Females: $\bar{Y}_2 = 23.3510 \quad s_2^2 = 4.2694$

Note: Actual data file has males "stacked" over females. See next slide.

Data File (.csv)

Player                  Gender  Height  Weight  BMI
Giannis Antetokounmpo   1       81      205     21.9654
Joel Anthony            1       81      245     26.25133
Alex Len                1       85      255     24.81176
Erik Murphy             1       82      230     24.0467
Ersan Ilyasova          1       82      235     24.56945
Kevin Garnett           1       83      253     25.81783
Chauncey Billups        1       75      202     25.24551
Juwan Howard            1       81      250     26.78708
Vladimir Radmanovic     1       82      235     24.56945
Tiago Splitter          1       83      240     24.49122
Jarvis Varnado          1       81      230     24.64411
Alexey Shved            1       78      190     21.95431
Jermaine O`Neal         1       83      255     26.02192
Michael Kidd-Gilchrist  1       79      232     26.13299
Metta World Peace       1       79      260     29.28697
Tim Hardaway Jr.        1       78      205     23.68754
Greivis Vasquez         1       78      211     24.38083
Daniel Gibson           1       74      200     25.67568
Terrence Ross           1       79      197     22.19051
Chris Kaman             1       84      265     26.40235
Tamika Catchings        2       73      167     22.03059
Courtney Clements       2       72      155     21.01948
Allie Quigley           2       70      140     20.08571
Quanitra Hollingsworth  2       77      203     24.06966
Katie Smith             2       71      175     24.40488
Tayler Hill             2       70      145     20.80306
Allison Hightower       2       70      139     19.94224
Kara Braxton            2       78      225     25.99852
Eshaya Murphy           2       71      164     22.87086
Michelle Campbell       2       74      183     23.49324
Briann January          2       68      144     21.89273
Jasmine James           2       69      175     25.84016
Kelsey Bone             2       76      200     24.34211
Jia Perkins             2       68      155     23.5651
Ebony Hoffman           2       74      215     27.60135
Shavonte Zellous        2       70      155     22.23776
Matee Ajavon            2       68      160     24.32526
Karima Christmas        2       72      180     24.40972
Erika de Souza          2       77      190     22.52825
Jayne Appel             2       76      210     25.55921

t-test for NBA vs WNBA BMI – Equal Variances

$H_0: \mu_1 - \mu_2 = 0 \qquad H_A: \mu_1 - \mu_2 \ne 0$

$$TS: \ t_{obs} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \ \overset{H_0}{\sim} \ t_{n_1+n_2-2}$$

Data (From EXCEL Spreadsheet): $n_1 = n_2 = 20 \quad \bar{Y}_1 = 24.9466 \quad \bar{Y}_2 = 23.3510 \quad s_1^2 = 3.0919 \quad s_2^2 = 4.2694$

$$s_p^2 = \frac{(20-1)3.0919 + (20-1)4.2694}{20 + 20 - 2} = 3.6806$$

$$t_{obs} = \frac{(24.9466 - 23.3510) - 0}{\sqrt{3.6806 \left( \frac{1}{20} + \frac{1}{20} \right)}} = 2.6301$$

P-value: $2P(t_{38} \ge 2.6301) = 2(.0061) = .0122$

t-test for NBA vs WNBA BMI – Unequal Variances

$H_0: \mu_1 - \mu_2 = 0 \qquad H_A: \mu_1 - \mu_2 \ne 0$

$$\text{Test Statistic: } t_{obs} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \qquad \text{with } df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$

$$t_{obs} = \frac{24.9466 - 23.3510}{\sqrt{\frac{3.0919}{20} + \frac{4.2694}{20}}} = 2.6301$$

$$df = \frac{\left( \frac{3.0919}{20} + \frac{4.2694}{20} \right)^2}{\frac{(3.0919/20)^2}{20 - 1} + \frac{(4.2694/20)^2}{20 - 1}} = \frac{0.135472}{0.003656} = 37.05$$

P-value: $2P(t_{37} \ge 2.6301) = 2(.0062) = .0124$

Note: the test statistics are the same ($n_1 = n_2$) and the degrees of freedom very close ($s_1 \approx s_2$)

Test for Equal Variances for WNBA vs NBA BMI

Data: $n_1 = 20 \quad s_1^2 = 3.0919 \quad n_2 = 20 \quad s_2^2 = 4.2694 \quad df_1 = df_2 = 20 - 1 = 19$

Critical F-values ($\alpha = 0.05$, $\alpha/2 = 0.025$): $F_{0.025;19,19} = 2.5265 \qquad F_{0.975;19,19} = \dfrac{1}{2.5265} = 0.3958$

$$H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \qquad H_A: \frac{\sigma_1^2}{\sigma_2^2} \ne 1$$

Test Statistic: $F_{obs} = \dfrac{s_1^2}{s_2^2} = \dfrac{3.0919}{4.2694} = 0.7242$

P-value: $P = 2\min\left[ P(F_{19,19} \ge 0.7242),\ P(F_{19,19} \le 0.7242) \right] = 2\min(0.7557,\ 0.2443) = 2(0.2443) = 0.4886$

$(1-\alpha)100\%$ CI for $\sigma_1^2 / \sigma_2^2$: $\left( \dfrac{0.7242}{2.5265},\ \dfrac{0.7242}{0.3958} \right) \equiv (0.2866,\ 1.8297)$

Small Sample Test to Compare Two Medians – Non-Normal Populations

• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
  - Null hypothesis: Population Medians are equal, H0: M1 = M2
  - Rank measurements across samples from smallest (1) to largest (n1+n2). Ties take average ranks.
  - Obtain the rank sum for the group with the smallest sample size (T)
  - 1-sided tests: Conclude HA: M1 > M2 if T > TU. Conclude HA: M1 < M2 if T < TL
  - 2-sided tests: Conclude HA: M1 ≠ M2 if T > TU or T < TL
  - Values of TL and TU are given in tables for various sample sizes and significance levels (some tables use T = rank sum for the larger group).
  - This test gives conclusions equivalent to those of the Mann-Whitney U-test

Rank-Sum Test: Normal Approximation

• Under the null hypothesis of no difference in the two groups (let T be the rank sum for group 1):
$$\mu_T = \frac{n_1 (N + 1)}{2} \qquad \sigma_T = \sqrt{\frac{n_1 n_2 (N + 1)}{12}} \qquad N = n_1 + n_2$$
• A z-statistic can be computed and an (approximate) P-value can be obtained from the Z-distribution:
$$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{T - n_1 (N + 1)/2}{\sqrt{n_1 n_2 (N + 1)/12}}$$

Note: When there are ties in the ranks, a more complex formula for $\sigma_T$ is often used; it has little effect unless there are many ties.

WNBA/NBA BMI Data – Wilcoxon Rank-Sum Test

id  Player                  Gender  Height  Weight  BMI       Rank
 1  Giannis Antetokounmpo   1       81      205     21.9654    7
 2  Joel Anthony            1       81      245     26.25133  36
 3  Alex Len                1       85      255     24.81176  27
 4  Erik Murphy             1       82      230     24.0467   16
 5  Ersan Ilyasova          1       82      235     24.56945  24.5
 6  Kevin Garnett           1       83      253     25.81783  31
 7  Chauncey Billups        1       75      202     25.24551  28
 8  Juwan Howard            1       81      250     26.78708  38
 9  Vladimir Radmanovic     1       82      235     24.56945  24.5
10  Tiago Splitter          1       83      240     24.49122  23
11  Jarvis Varnado          1       81      230     24.64411  26
12  Alexey Shved            1       78      190     21.95431   6
13  Jermaine O`Neal         1       83      255     26.02192  34
14  Michael Kidd-Gilchrist  1       79      232     26.13299  35
15  Metta World Peace       1       79      260     29.28697  40
16  Tim Hardaway Jr.        1       78      205     23.68754  15
17  Greivis Vasquez         1       78      211     24.38083  20
18  Daniel Gibson           1       74      200     25.67568  30
19  Terrence Ross           1       79      197     22.19051   9
20  Chris Kaman             1       84      265     26.40235  37
21  Tamika Catchings        2       73      167     22.03059   8
22  Courtney Clements       2       72      155     21.01948   4
23  Allie Quigley           2       70      140     20.08571   2
24  Quanitra Hollingsworth  2       77      203     24.06966  17
25  Katie Smith             2       71      175     24.40488  21
26  Tayler Hill             2       70      145     20.80306   3
27  Allison Hightower       2       70      139     19.94224   1
28  Kara Braxton            2       78      225     25.99852  33
29  Eshaya Murphy           2       71      164     22.87086  12
30  Michelle Campbell       2       74      183     23.49324  13
31  Briann January          2       68      144     21.89273   5
32  Jasmine James           2       69      175     25.84016  32
33  Kelsey Bone             2       76      200     24.34211  19
34  Jia Perkins             2       68      155     23.5651   14
35  Ebony Hoffman           2       74      215     27.60135  39
36  Shavonte Zellous        2       70      155     22.23776  10
37  Matee Ajavon            2       68      160     24.32526  18
38  Karima Christmas        2       72      180     24.40972  22
39  Erika de Souza          2       77      190     22.52825  11
40  Jayne Appel             2       76      210     25.55921  29

$$T = 7 + 36 + \cdots + 9 + 37 = 507 \qquad n_1 = n_2 = 20 \qquad N = 40$$

$$\mu_T = \frac{20(40 + 1)}{2} = 410 \qquad \sigma_T = \sqrt{\frac{(20)(20)(41)}{12}} = \sqrt{1366.667} = 36.9685$$

$$z_{obs} = \frac{507 - 410}{36.9685} = \frac{97}{36.9685} = 2.6239$$

P-value: $2P(Z \ge 2.6239) = .0087$. R uses a different algorithm, giving a slightly different P-value.

Note: The statistic R computes is $W = T - \dfrac{n_1 (n_1 + 1)}{2}$. This is the difference between T and the minimum it could be.
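The rank-sum arithmetic above can be reproduced with a short R sketch. The BMI values are typed in from the slide table, and the object names (`bmi.m`, `bmi.f`, `T.obs`, and so on) are illustrative rather than part of the original program. It applies the plain normal approximation with no tie or continuity correction, so its P-value should agree with the hand calculation rather than with `wilcox.test`.

```r
# BMI values copied from the slide table (males first, then females)
bmi.m <- c(21.9654, 26.25133, 24.81176, 24.0467, 24.56945, 25.81783, 25.24551,
           26.78708, 24.56945, 24.49122, 24.64411, 21.95431, 26.02192, 26.13299,
           29.28697, 23.68754, 24.38083, 25.67568, 22.19051, 26.40235)
bmi.f <- c(22.03059, 21.01948, 20.08571, 24.06966, 24.40488, 20.80306, 19.94224,
           25.99852, 22.87086, 23.49324, 21.89273, 25.84016, 24.34211, 23.5651,
           27.60135, 22.23776, 24.32526, 24.40972, 22.52825, 25.55921)

n1 <- length(bmi.m); n2 <- length(bmi.f); N <- n1 + n2

# Rank all N observations together; rank() gives tied values their average rank
r <- rank(c(bmi.m, bmi.f))

T.obs   <- sum(r[1:n1])                    # rank sum for group 1 (males)
mu.T    <- n1 * (N + 1) / 2                # mean of T under H0
sigma.T <- sqrt(n1 * n2 * (N + 1) / 12)    # sd of T under H0 (no tie correction)
z.obs   <- (T.obs - mu.T) / sigma.T
p.val   <- 2 * pnorm(-abs(z.obs))          # 2-sided normal-approximation P-value

# Statistic reported by wilcox.test: T minus its minimum possible value
W <- T.obs - n1 * (n1 + 1) / 2

print(c(T = T.obs, z = z.obs, P = p.val, W = W))
```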
$$W = T - \frac{n_1 (n_1 + 1)}{2} = 507 - \frac{20(21)}{2} = 507 - 210 = 297$$

R Program and Output

bmi1 <- read.csv("http://www.stat.ufl.edu/~winner/data/wnba_nba_bmi.csv",header=T)
attach(bmi1); names(bmi1)
tapply(BMI,Gender,mean)          # Obtain mean BMI by Gender
tapply(BMI,Gender,var)           # Obtain variance of BMI by Gender
tapply(BMI,Gender,length)        # Obtain sample size of BMI by Gender
t.test(BMI~Gender,var.equal=T)   # t-test with Equal Variances
t.test(BMI~Gender)               # t-test with Unequal Variances
var.test(BMI~Gender)             # F-test for Equal Variances
wilcox.test(BMI~Gender)          # Wilcoxon Rank-Sum Test

#################################

> tapply(BMI,Gender,mean)      # Obtain mean BMI by Gender
       1        2
24.94665 23.35099
> tapply(BMI,Gender,var)       # Obtain variance of BMI by Gender
       1        2
3.091871 4.269420
> tapply(BMI,Gender,length)    # Obtain sample size of BMI by Gender
 1  2
20 20

R Output (Continued)

> t.test(BMI~Gender,var.equal=T) # t-test with Equal Variances

        Two Sample t-test

data:  BMI by Gender
t = 2.6301, df = 38, p-value = 0.01226
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.3674868 2.8238189
sample estimates:
mean in group 1 mean in group 2
       24.94665        23.35099

> t.test(BMI~Gender) # t-test with Unequal Variances

        Welch Two Sample t-test

data:  BMI by Gender
t = 2.6301, df = 37.052, p-value = 0.01236
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.3664539 2.8248518
sample estimates:
mean in group 1 mean in group 2
       24.94665        23.35099

R Output (Continued)

> var.test(BMI~Gender) # F-test for Equal Variances

        F test to compare two variances

data:  BMI by Gender
F = 0.7242, num df = 19, denom df = 19, p-value = 0.4885
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.2866432 1.8296302
sample estimates:
ratio of variances
         0.7241899

> wilcox.test(BMI~Gender)

        Wilcoxon rank sum test with continuity correction

data:  BMI by Gender
W = 297, p-value = 0.009042
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(x = c(21.96540162, 26.25133364, 24.81176471,  :
  cannot compute exact p-value with ties

Paired t-test

Setting: n matched pairs, each under 1 of 2 competing conditions
Note: In many experiments, it is the same subject under each condition

Data: $d_j = Y_{1j} - Y_{2j}, \quad j = 1, \ldots, n$ (difference between the measurements under Conditions 1 and 2)
$$\bar{d} = \frac{\sum_{j=1}^{n} d_j}{n} \qquad s_d^2 = \frac{\sum_{j=1}^{n} (d_j - \bar{d})^2}{n - 1}$$

$H_0: \mu_d = \mu_1 - \mu_2 = 0 \qquad H_A: \mu_d = \mu_1 - \mu_2 \ne 0$

$$TS: \ t_{obs} = \frac{\bar{d}}{\sqrt{s_d^2 / n}} \qquad \text{with } df = n - 1$$

P-value: $P = 2P\left( t_{n-1} \ge |t_{obs}| \right)$

$(1-\alpha)100\%$ CI for $\mu_d = \mu_1 - \mu_2$:
$$\bar{d} \pm t_{\alpha/2;\, n-1} \sqrt{\frac{s_d^2}{n}}$$

Example: English Premier League Football - 2012

• Interested in determining if there is a home field effect
  - League has 20 teams; all play all 19 opponents Home and Away (190 "pairs" of teams, each playing once on each team's home field). No overtime.
  - We are treating each "pair of teams" as a unit
  - Y1 is the Total Score for the Home Teams, Y2 is for Away
• Note: d represents Combined Home Goals – Combined Away Goals for the pair of teams ("units")
• No home effect should mean $\mu_d = 0$
• Programming Note: In the Independent Sample t-test, we had a variable for Treatment/Group and another variable for Response (Y). Here we have Y1 and Y2 as separate variables, with each row as a unit

Portion of Data File (.csv).
Note: n = 190

Team1        Team2                 Home  Away
Arsenal      Aston Villa           2     1
Arsenal      Chelsea               3     3
Arsenal      Everton               1     1
Arsenal      Fulham                3     4
Arsenal      Liverpool             2     4
Arsenal      Manchester City       1     3
Arsenal      Manchester United     3     2
Arsenal      Newcastle United      7     4
Arsenal      Norwich City          4     1
Arsenal      Queens Park Rangers   1     1
Arsenal      Reading               6     6
Arsenal      Southampton           7     2
Arsenal      Stoke City            1     0
Arsenal      Sunderland            0     1
Arsenal      Swansea City          0     4
Arsenal      Tottenham Hotspur     7     3
Arsenal      West Bromwich Albion  3     2
Arsenal      West Ham United       6     4
Arsenal      Wigan Athletic        4     2
Aston Villa  Chelsea               9     2

Paired t-test for EPL 2012 Home vs Away Goals

$H_0: \mu_d = \mu_1 - \mu_2 = 0 \qquad H_A: \mu_d = \mu_1 - \mu_2 \ne 0$

$$TS: \ t_{obs} = \frac{\bar{d}}{\sqrt{s_d^2 / n}} \qquad \text{with } df = n - 1$$

Data (From EXCEL Spreadsheet): $n = 190 \quad \bar{d} = 0.6368 \quad s_d^2 = 4.3912$

$$t_{obs} = \frac{0.6368}{\sqrt{4.3912 / 190}} = 4.1888$$

P-value: $2P(t_{189} \ge 4.1888) = 2(.00002) = .00004$

95% CI for $\mu_d$: $0.6368 \pm 1.9726 \sqrt{\dfrac{4.3912}{190}} \equiv 0.6368 \pm 0.2999 \equiv (0.3369,\ 0.9367)$

R Program / Output

epl.2012 <- read.csv("http://www.stat.ufl.edu/~winner/data/epl_2012_home.csv", header=T)
attach(epl.2012); names(epl.2012)
t.test(Home,Away,paired=T)        # Paired t-test
wilcox.test(Home,Away,paired=T)   # Wilcoxon Signed-Rank Test

#######################

> t.test(Home,Away,paired=T)

        Paired t-test

data:  Home and Away
t = 4.1891, df = 189, p-value = 4.294e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.3369575 0.9367267
sample estimates:
mean of the differences
              0.6368421

Small-Sample Test For Nonnormal Data

• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test)
  - Compute differences $d_i$ (as in the paired t-test) and obtain their absolute values (ignoring 0s). n = number of non-zero differences
  - Rank the observations by $|d_i|$ (smallest = 1), averaging ranks for ties
  - Compute $T^+$ and $T^-$, the rank sums for the positive and negative differences, respectively
  - 1-sided tests: Conclude HA: M1 > M2 if $T = T^- \le T_0$
  - 2-sided tests: Conclude HA: M1 ≠ M2 if $T = \min(T^+, T^-) \le T_0$
  - Values of $T_0$ are given in tables for various sample sizes and significance levels.
  - Some tables give the upper tail cut-off $T_0$ values.
  - P-values are printed by statistical software packages.

Signed-Rank Test: Normal Approximation

Under the null hypothesis of no difference in the 2 groups, let $T = T^+$:
$$\mu_T = \frac{n(n+1)}{4} \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$

A Z-statistic is computed, and an approximate P-value can be obtained from:
$$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$

When there are ties (many common d's), as in the soccer data, $\sigma_T$ is reduced and is of the form:
$$\sigma_T = \sqrt{\frac{1}{24} \left[ n(n+1)(2n+1) - \frac{1}{2} \sum_{j=1}^{g} t_j (t_j - 1)(t_j + 1) \right]}$$
where: g = # of distinct levels of |d| and $t_j$ is the # of ties at level j

EPL Home Field Advantage

Diff (d)  Count (t)  T+
-6         1           0
-4         3           0
-3         7           0
-2        17           0
-1        27           0
 1        30          29
 2        33          82.5
 3        16         119
 4        10         137
 5         6         146.5
 7         1         151
sum      151

T+         7896.5
E(T)       5738
sigma^2_T  283006
sigma_T    531.98
Z          4.0575
p-value    0.0000496

|Diff|  Count(t)  LowRank  HighRank  MeanRank  t(t-1)*(t+1)
1       57          1       57        29       185136
2       50         58      107        82.5     124950
3       23        108      130       119        12144
4       13        131      143       137         2184
5        6        144      149       146.5        210
6        1        150      150       150            0
7        1        151      151       151            0
sum                                            324624

• Zero differences have been removed
• The Differences and their Counts are at top left
• Absolute differences and their counts and average ranks are at bottom
• T+ is the sum of the products of the Count and T+ columns (e.g. there are 30 cases with d=+1, each getting rank=29)
• The Z is large and the P-value is small
• R labels T+ as V

R Output

> wilcox.test(Home,Away,paired=T)

        Wilcoxon signed rank test with continuity correction

data:  Home and Away
V = 7896.5, p-value = 4.981e-05
alternative hypothesis: true location shift is not equal to 0

Test for Association for Categorical Variables

Counts  Col 1  Col 2  …  Col c  Total
Row 1   n11    n12    …  n1c    n1•
Row 2   n21    n22    …  n2c    n2•
…       …      …      …  …      …
Row r   nr1    nr2    …  nrc    nr•
Total   n•1    n•2    …  n•c    n••

Expected Cell Counts:
$$\hat{n}_{ij} = \frac{n_{i\bullet}\, n_{\bullet j}}{n_{\bullet\bullet}} \qquad i = 1, \ldots, r; \quad j = 1, \ldots, c$$

Pearson Chi-Square Statistic:
$$X_P^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left( n_{ij} - \hat{n}_{ij} \right)^2}{\hat{n}_{ij}} \qquad df = (r-1)(c-1)$$

Likelihood-Ratio Chi-Square Statistic:
$$X_{LR}^2 = 2 \sum_{i=1}^{r} \sum_{j=1}^{c} n_{ij} \ln\left( \frac{n_{ij}}{\hat{n}_{ij}} \right) \qquad df = (r-1)(c-1)$$

Reject the null hypothesis of no association between the row and column variables if:
$$X^2 \ge \chi^2_{\alpha;\, (r-1)(c-1)} \qquad \text{P-value: } P = P\left( \chi^2_{(r-1)(c-1)} \ge X^2 \right)$$

Example: Crop Circles by Country and Field Type

Observed
Country         other     wheat     Total
England         108       323       431
Germany          47        90       137
Italy            56        46       102
USA              27        17        44
Canada           32        11        43
Holland          10        24        34
Switzerland       6        23        29
Belgium           4        18        22
Czech Republic    7        14        21
Total           297       566       863
Percent         34.41483  65.58517  100

Expected Cell Counts:
$$\hat{n}_{ij} = \frac{n_{i\bullet}\, n_{\bullet j}}{n_{\bullet\bullet}} \qquad i = 1, \ldots, 9; \quad j = 1, 2$$

Both tests are highly significant.
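The expected counts and both chi-square statistics for the crop circle data can be computed directly from the observed counts. The following is a minimal base-R sketch of that hand calculation, assuming the counts are typed in from the Observed table; the object names (`obs`, `expected`, `X2.P`, `X2.LR`) are illustrative and not part of the original program.

```r
# Observed counts typed in from the slide (columns: other, wheat)
obs <- matrix(c(108, 323,  47, 90,  56, 46,  27, 17,  32, 11,
                 10,  24,   6, 23,   4, 18,   7, 14),
              ncol = 2, byrow = TRUE,
              dimnames = list(
                Country = c("England", "Germany", "Italy", "USA", "Canada",
                            "Holland", "Switzerland", "Belgium", "Czech Republic"),
                Field   = c("other", "wheat")))

n <- sum(obs)

# Expected count for cell (i,j) = (row i total) * (column j total) / n
expected <- outer(rowSums(obs), colSums(obs)) / n

# Pearson and likelihood-ratio chi-square statistics
X2.P  <- sum((obs - expected)^2 / expected)
X2.LR <- 2 * sum(obs * log(obs / expected))

# Degrees of freedom and upper-tail chi-square P-values
df   <- (nrow(obs) - 1) * (ncol(obs) - 1)
p.P  <- pchisq(X2.P,  df, lower.tail = FALSE)
p.LR <- pchisq(X2.LR, df, lower.tail = FALSE)

print(round(expected, 4))
print(c(Pearson = X2.P, LR = X2.LR, df = df))
```

The log in the likelihood-ratio term requires every observed count to be positive, which holds for this table.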
Expected
Country         wheat0    wheat1    Total
England         148.3279  282.6721  431
Germany         47.14832  89.85168  137
Italy           35.10313  66.89687  102
USA             15.14253  28.85747   44
Canada          14.79838  28.20162   43
Holland         11.70104  22.29896   34
Switzerland     9.980301  19.0197    29
Belgium         7.571263  14.42874   22
Czech Republic  7.227115  13.77289   21
Total           297       566       863

For England/other (i=1, j=1):
$$\hat{n}_{11} = \frac{n_{1\bullet}\, n_{\bullet 1}}{n_{\bullet\bullet}} = \frac{431(297)}{863} = 148.33$$

Pearson Chi-Square Statistic:
$$X_P^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left( n_{ij} - \hat{n}_{ij} \right)^2}{\hat{n}_{ij}}$$
Contribution from cell with England/other:
$$\frac{\left( n_{11} - \hat{n}_{11} \right)^2}{\hat{n}_{11}} = \frac{(108 - 148.33)^2}{148.33} = 10.97$$

Likelihood-Ratio Chi-Square Statistic:
$$X_{LR}^2 = 2 \sum_{i=1}^{r} \sum_{j=1}^{c} n_{ij} \ln\left( \frac{n_{ij}}{\hat{n}_{ij}} \right)$$
Contribution from cell with England/other:
$$2 n_{11} \ln\left( \frac{n_{11}}{\hat{n}_{11}} \right) = 2(108) \ln\left( \frac{108}{148.33} \right) = -68.54$$

Pearson Chi-square
Country         wheat0    wheat1    Total
England         10.9645   5.753457  16.71796
Germany         0.000467  0.000245  0.000711
Italy           12.43989  6.527648  18.96754
USA             9.285088  4.872211  14.1573
Canada          19.99515  10.49216  30.48731
Holland         0.24729   0.129762  0.377051
Switzerland     1.587407  0.832968  2.420375
Belgium         1.684517  0.883925  2.568442
Czech Republic  0.007137  0.003745  0.010882
Total           56.21145  29.49612  85.70757

X^2(obs)    85.70757
X^2(.05,8)  15.50731

Likelihood-Ratio Chi-Square
Country         wheat0    wheat1    Total
England         -68.5356  86.15369  17.61812
Germany         -0.29617  0.296884  0.000712
Italy           52.31088  -34.455   17.85589
USA             31.22981  -17.9913  13.23852
Canada          49.35797  -20.7127  28.64532
Holland         -3.14186  3.528668  0.386811
Switzerland     -6.10625  8.740874  2.634628
Belgium         -5.10452  7.961397  2.856873
Czech Republic  -0.44702  0.457954  0.010938
Total           49.26727  33.98053  83.2478

R Program – Uses the vcd Package

cc <- read.csv("http://www.stat.ufl.edu/~winner/data/crop_circle",header=T)
attach(cc); names(cc)
(wheat.country <- table(Country,wheat))   # Cross-tabulate and print the counts
chisq.test(wheat.country)                 # Pearson chi-square test
install.packages("vcd")                   # Only needed once
library(vcd)
assocstats(wheat.country)                 # LR and Pearson tests, association measures

# Stacked bar chart; click on the plot to place the legend
barplot(wheat.country,
   col=c("blue","green","pink","purple","red",
         "yellow","orange","cornflowerblue","beige"),
   main="Wheat by Country",xlab="Wheat",ylab="Count")
labs <- rownames(wheat.country)
legend(locator(1),labs,fill=c("blue","green","pink","purple","red",
         "yellow","orange","cornflowerblue","beige"))

# Side-by-side bar chart
barplot(wheat.country,beside=T,
   col=c("blue","green","pink","purple","red",
         "yellow","orange","cornflowerblue","beige"),
   main="Wheat by Country",xlab="Wheat",ylab="Count")
labs <- rownames(wheat.country)
legend(locator(1),labs,fill=c("blue","green","pink","purple","red",
         "yellow","orange","cornflowerblue","beige"))

R Output

> (wheat.country <- table(Country,wheat))
         wheat
Country     0   1
  Belgium   4  18
  Canada   32  11
  Czech     7  14
  England 108 323
  Germany  47  90
  Holland  10  24
  Italy    56  46
  Swiss     6  23
  USA      27  17

##################################################

> assocstats(wheat.country)
                    X^2 df   P(> X^2)
Likelihood Ratio 83.248  8 1.0880e-14
Pearson          85.708  8 3.4417e-15

Phi-Coefficient   : 0.315
Contingency Coeff.: 0.301
Cramer's V        : 0.315
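The association measures at the bottom of the `assocstats` output follow from the Pearson statistic. The sketch below assumes the standard textbook definitions (phi $= \sqrt{X^2/n}$, contingency coefficient $= \sqrt{X^2/(X^2+n)}$, Cramér's V $= \sqrt{X^2/(n \min(r-1, c-1))}$) and retypes the counts in the row order of the R output; the object names are illustrative, and no claim is made that `vcd` computes them with exactly these steps.

```r
# Counts from the R output above (rows alphabetical, columns wheat = 0, 1)
obs <- matrix(c(4, 18,  32, 11,  7, 14,  108, 323,  47, 90,
                10, 24,  56, 46,  6, 23,  27, 17),
              ncol = 2, byrow = TRUE,
              dimnames = list(
                Country = c("Belgium", "Canada", "Czech", "England", "Germany",
                            "Holland", "Italy", "Swiss", "USA"),
                wheat   = c("0", "1")))

n  <- sum(obs)
# Pearson chi-square (no continuity correction is applied for tables larger than 2x2)
X2 <- unname(chisq.test(obs)$statistic)

k <- min(nrow(obs) - 1, ncol(obs) - 1)   # equals 1 here since there are only 2 columns

phi      <- sqrt(X2 / n)
cont.c   <- sqrt(X2 / (X2 + n))
cramer.v <- sqrt(X2 / (n * k))

print(round(c(Phi = phi, Contingency = cont.c, CramersV = cramer.v), 3))
```

Because the table has only two columns, $\min(r-1, c-1) = 1$, so Cramér's V and the phi coefficient coincide, as in the output above.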