Statistics 479    Assignment #4 Answer Key    Fall 2013

1.

    libname mylib "U:\Documents\Stat479\";

    /* Group observations by Percent: <=54, (54, 58], >58 */
    data fuelnew;
      set mylib.fueldat;
      if Percent <= 54 then LicGrp = 1;
      else if 54 < Percent <= 58 then LicGrp = 2;
      else LicGrp = 3;
      label LicGrp = "No. of Drivers";
    run;

    proc format;
      value ig 1 = 'Low Income'
               2 = 'Middle Income'
               3 = 'High Income';
      value lg 1 = 'below 54%'
               2 = '54 to 58%'
               3 = 'above 58%';
    run;

    ods rtf file="U:\Documents\Classwork\stat479\F13\as4n1_out.rtf";

    proc tabulate data=fuelnew;
      var Roads;
      class IncomGrp LicGrp;
      table IncomGrp*LicGrp,
            Roads*(n="Sample Size"*f=3.
                   mean="Sample Mean"*f=8.4
                   std="Sample Standard Deviation"*f=8.2
                   stderr="Standard Error of the Mean"*f=8.2
                   max="Largest"*f=5.);
      title "Analysis of Miles of Primary Highways by Per Capita Income Level and Number of Licensed Driver";
      format IncomGrp ig. LicGrp lg.;
    run;

    ods rtf close;

    Miles of Primary Highways (in thousands)

    Per capita       No. of      Sample   Sample    Sample Standard   Standard Error
    Income           Drivers       Size     Mean          Deviation      of the Mean   Largest
    ------------------------------------------------------------------------------------------
    Low Income       below 54%        7   4.4223               1.88             0.71         7
                     54 to 58%        5   4.9550               1.04             0.47         7
                     above 58%        1   3.2740                  .                .         3
    Middle Income    below 54%        2   5.3125               0.89             0.63         6
                     54 to 58%        7   7.0839               5.78             2.18        18
                     above 58%        9   5.7708               2.73             0.91        10
    High Income      below 54%        5   7.8862               5.39             2.41        14
                     54 to 58%        6   4.2115               2.87             1.17         9
                     above 58%        6   5.1810               3.42             1.40        10

2.

    data jurors;
      input Ethnicity Count @@;
      datalines;
    1 2030 2 313 3 10 4 57 5 90
    ;

    proc format;
      value ef 1 = 'White'
               2 = 'Black'
               3 = 'Native American'
               4 = 'Asian'
               5 = 'Other';
    run;

    ods rtf file="U:\Documents\Classwork\stat479\F13\as4n2_out.rtf";

    /* Count holds cell frequencies; testp= gives the county percentages */
    proc freq data=jurors;
      weight Count;
      tables Ethnicity / nocum testp=(80.29 12.06 0.79 2.92 3.94);
      format Ethnicity ef.;
      title 'Ethnicity Distribution of Jurors';
    run;

    ods rtf close;

    Ethnicity          Frequency   Percent   Test Percent
    -----------------------------------------------------
    White                   2030     81.20          80.29
    Black                    313     12.52          12.06
    Native American           10      0.40           0.79
    Asian                     57      2.28           2.92
    Other                     90      3.60           3.94

    Chi-Square Test for Specified Proportions
    -----------------------------------------
    Chi-Square        9.7501
    DF                     4
    Pr > ChiSq        0.0449

The hypotheses tested are

    H0: p1 = .8029, p2 = .1206, p3 = .0079, p4 = .0292, p5 = .0394
    vs.
    Ha: at least one of these equalities does not hold,

where p1, p2, p3, p4, p5 are the multinomial probabilities that a randomly selected juror belongs to each of the five ethnic classes. We reject the null hypothesis at α = .05, since the p-value (.0449) is less than .05. Thus the data provide evidence that the ethnic distribution of the jurors differs from the ethnic distribution of people in the whole county.

3.

The problem set-up is the same as in the class example. We want to test the hypothesis that the observed frequencies of quadrats, classified by number of plants, agree with those obtained from a Poisson distribution with mean parameter λ = 1.9, which was estimated from the data.
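The Poisson cell probabilities for λ = 1.9 and the resulting goodness-of-fit statistic can be checked by hand. The following is only an illustrative sketch in Python, not part of the SAS solution. The function names are hypothetical, the last cell pools counts of 6 or more, the probabilities are rounded to three decimals as in the answer, and the degrees of freedom are taken as the number of cells minus one to match what PROC FREQ reports.

```python
import math

def poisson_cell_probs(lam, last_k):
    """Return P(X=0), ..., P(X=last_k-1) and the pooled tail P(X>=last_k)."""
    probs = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(last_k)]
    probs.append(1.0 - sum(probs))  # everything at or above last_k
    return probs

def chisq_gof(observed, probs):
    """Pearson goodness-of-fit statistic for counts against specified proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def chisq_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

# Cells 0, 1, 2, 3, 4, 5, >=6 with lambda = 1.9, rounded as in the answer key
cell_probs = [round(p, 3) for p in poisson_cell_probs(1.9, 6)]

# Quadrat counts for the same seven cells (from the Problem 3 data step)
quadrats = [18, 38, 49, 21, 12, 5, 1]

chi2 = chisq_gof(quadrats, cell_probs)
# df = cells - 1 = 6, the value PROC FREQ uses for specified proportions
p_value = chisq_upper_tail_even_df(chi2, df=len(quadrats) - 1)

print(cell_probs)
print(round(chi2, 4), round(p_value, 4))
```

The same `chisq_gof` helper applied to the juror counts and county proportions of Problem 2 gives a statistic of about 9.75, in line with the PROC FREQ output shown there.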
The probabilities calculated are:

    0.15  0.284  0.27  0.171  0.081  0.031  0.013

    /* Plants is read as character so that the pooled class ">=6" can be entered */
    data cactus;
      input Plants $ Quadrats @@;
      datalines;
    0 18 1 38 2 49 3 21 4 12 5 5 >=6 1
    ;

    ods rtf file="U:\Documents\Classwork\stat479\F13\as4n3_out.rtf";

    proc freq data=cactus;
      weight Quadrats;
      tables Plants / nocum expected testp=(15.0 28.4 27.0 17.1 8.1 3.1 1.3);
      title 'Barrel Cacti Distribution in Quadrats';
    run;

    ods rtf close;

    Plants   Frequency   Percent   Test Percent
    -------------------------------------------
    0               18     12.50          15.00
    1               38     26.39          28.40
    2               49     34.03          27.00
    3               21     14.58          17.10
    4               12      8.33           8.10
    5                5      3.47           3.10
    >=6              1      0.69           1.30

    Chi-Square Test for Specified Proportions
    -----------------------------------------
    Chi-Square        4.4528
    DF                     6
    Pr > ChiSq        0.6156

    WARNING: 29% of the cells have expected counts less than 5.
             Chi-Square may not be a valid test.

Since the p-value of .6156 is not smaller than .05, we fail to reject the null hypothesis that the observed frequencies were sampled from a Poisson distribution.

4.

    data company;
      input Grade Status Count @@;
      label Grade="Job Grade" Status="Marital Status";
      datalines;
    1 1 58   1 2 874   1 3 15   1 4 9
    2 1 222  2 2 3927  2 3 70   2 4 20
    3 1 50   3 2 2396  3 3 34   3 4 10
    4 1 7    4 2 533   4 3 7    4 4 4
    ;

    proc format;
      value ms 1 = 'Single'
               2 = 'Married'
               3 = 'Divorced'
               4 = 'Widowed';
    run;

    ods rtf file="U:\Documents\Classwork\stat479\F13\as4n4_out.rtf";

    proc freq data=company;
      weight Count;
      tables Grade*Status / chisq expected cellchi2 nocol nopercent norow;
      title "Dependency of Job Grade on Marital Status";
      format Status ms.;
    run;

    ods rtf close;

See output and discussion on the following page.
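The quantities requested from PROC FREQ for the Job Grade by Marital Status table (expected counts, cell chi-square contributions, and the nominal association measures) follow from simple arithmetic on the cell counts. The following is a minimal sketch in Python, offered only as a cross-check and not as part of the SAS solution. It assumes the Grade 1 / Widowed count is 9, the value that appears in the printed output table, and all variable names are illustrative.

```python
import math

# Rows: job grades 1-4; columns: Single, Married, Divorced, Widowed.
# Assumption: Grade 1 / Widowed is 9, as in the printed output table.
observed = [
    [58, 874, 15, 9],
    [222, 3927, 70, 20],
    [50, 2396, 34, 10],
    [7, 533, 7, 4],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Each cell's contribution to the Pearson chi-square statistic
cell_chi2 = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi2 = sum(sum(row) for row in cell_chi2)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Nominal measures of association; each one divides chi-square by n
phi = math.sqrt(chi2 / n)
contingency_c = math.sqrt(chi2 / (chi2 + n))
cramers_v = math.sqrt(chi2 / (n * min(len(row_totals) - 1, len(col_totals) - 1)))

print(n, df, round(chi2, 4))
print(round(phi, 4), round(contingency_c, 4), round(cramers_v, 4))
```

Because every one of the three measures has n in its denominator, a table with several thousand observations can combine a very large chi-square statistic with very small values of Phi, C, and V.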
4. (Continued)

    Table of Grade by Status
    Cell contents: Frequency / Expected / Cell Chi-Square

    Grade          Status (Marital Status)
    (Job Grade)      Single    Married   Divorced    Widowed     Total
    ------------------------------------------------------------------
    1                    58        874         15          9       956
                     39.118     897.27     14.626     4.9913
                     9.1148     0.6033     0.0096     3.2196

    2                   222       3927         70         20      4239
                     173.45     3978.6     64.851     22.132
                     13.589     0.6683     0.4088     0.2053

    3                    50       2396         34         10      2490
                     101.89       2337     38.094         13
                     26.423     1.4885     0.4399     0.6924

    4                     7        533          7          4       551
                     22.546     517.15     8.4296     2.8768
                     10.719     0.4859     0.2424     0.4386

    Total               337       7730        126         43      8236

    Statistic                       DF      Value      Prob
    -------------------------------------------------------
    Chi-Square                       9    68.7484    <.0001
    Likelihood Ratio Chi-Square      9    76.0113    <.0001
    Mantel-Haenszel Chi-Square       1    19.5564    <.0001
    Phi Coefficient                        0.0914
    Contingency Coefficient                0.0910
    Cramer's V                             0.0527

The p-value for the χ² statistic is very small, so the null hypothesis that Job Grade is independent of Marital Status is rejected; that is, an association exists between these two variables. Phi, C (the contingency coefficient), and V (Cramer's V) are all measures of the strength of association between nominal variables. Although the χ² statistic is large, Phi, C, and V are all very small, which may indicate that the association is very weak. Note, however, that all three statistics have the sample size n as the denominator or as part of the denominator, and n = 8236 is large here. Because Job Grade is an ordinal variable, more powerful tests of association that use this ordering could be applied instead of these three measures.

5.

    libname mylib "U:\Documents\Stat479\";

    data fuelnew;
      set mylib.fueldat;
      if Percent <= 54 then LicGrp = 1;
      else if 54 < Percent <= 58 then LicGrp = 2;
      else LicGrp = 3;
      label LicGrp = "No. of Drivers";
    run;

    proc format;
      value ig 1 = 'Low Income'
               2 = 'Middle Income'
               3 = 'High Income';
      value lg 1 = 'below 54%'
               2 = '54 to 58%'
               3 = 'above 58%';
    run;

    ods rtf file="U:\Documents\Classwork\stat479\F13\as4n5_out.rtf";

    proc freq data=fuelnew;
      tables IncomGrp*LicGrp / chisq expected cellchi2 nocol nopercent norow measures;
      test gamma kentb scorr;
      format IncomGrp ig.
             LicGrp lg.;
      title 'Analysis of Association between Per Capita Income and Number of Licensed Driver';
    run;

    ods rtf close;

    Table of Incomgrp by LicGrp
    Cell contents: Frequency / Expected / Cell Chi-Square

    Incomgrp               LicGrp (No. of Drivers)
    (Per capita Income)    below 54%   54 to 58%   above 58%     Total
    ------------------------------------------------------------------
    Low Income                     7           5           1        13
                              3.7917       4.875      4.3333
                              2.7147      0.0032      2.5641

    Middle Income                  2           7           9        18
                                5.25        6.75           6
                              2.0119      0.0093         1.5

    High Income                    5           6           6        17
                              4.9583       6.375      5.6667
                              0.0004      0.0221      0.0196

    Total                         14          18          16        48

    Statistic                       DF      Value      Prob
    -------------------------------------------------------
    Chi-Square                       4     8.8452    0.0651
    Likelihood Ratio Chi-Square      4     9.8932    0.0423
    Mantel-Haenszel Chi-Square       1     2.4728    0.1158
    Phi Coefficient                        0.4293
    Contingency Coefficient                0.3945
    Cramer's V                             0.3035

    WARNING: 44% of the cells have expected counts less than 5.
             Chi-Square may not be a valid test.

    Statistic                               Value       ASE
    -------------------------------------------------------
    Gamma                                  0.2814    0.1841
    Kendall's Tau-b                        0.1941    0.1295
    Stuart's Tau-c                         0.1927    0.1294
    Somers' D C|R                          0.1945    0.1291
    Somers' D R|C                          0.1937    0.1299
    Pearson Correlation                    0.2294    0.1386
    Spearman Correlation                   0.2162    0.1456
    Lambda Asymmetric C|R                  0.1333    0.1642
    Lambda Asymmetric R|C                  0.1667    0.0913
    Lambda Symmetric                       0.1500    0.1108
    Uncertainty Coefficient C|R            0.0943    0.0526
    Uncertainty Coefficient R|C            0.0946    0.0527
    Uncertainty Coefficient Symmetric      0.0944    0.0526

    Gamma
    ----------------------------------
    Gamma                       0.2814
    ASE                         0.1841
    95% Lower Conf Limit       -0.0795
    95% Upper Conf Limit        0.6422

    Test of H0: Gamma = 0
    ASE under H0                0.1890
    Z                           1.4891
    One-sided Pr > Z            0.0682
    Two-sided Pr > |Z|          0.1365

    Kendall's Tau-b
    ----------------------------------
    Tau-b                       0.1941
    ASE                         0.1295
    95% Lower Conf Limit       -0.0598
    95% Upper Conf Limit        0.4480

    Test of H0: Tau-b = 0
    ASE under H0                0.1303
    Z                           1.4891
    One-sided Pr > Z            0.0682
    Two-sided Pr > |Z|          0.1365

    Spearman Correlation Coefficient
    ----------------------------------
    Correlation                 0.2162
    ASE                         0.1456
    95% Lower Conf Limit       -0.0693
    95% Upper Conf Limit        0.5016

    Test of H0: Correlation = 0
    ASE under H0                0.1463
    Z                           1.4776
    One-sided Pr > Z            0.0698
    Two-sided Pr > |Z|          0.1395

    Sample Size = 48

6.

    libname mylib "U:\Documents\Stat479\";

    ods pdf file="U:\Documents\Classwork\stat479\F13\as4n6_out.pdf";

    /* noprint suppresses the tables of statistics; only the plots are produced */
    proc univariate data=mylib.fueldat noprint;
      var Roads Numlic;
      histogram Roads Numlic;
      probplot Roads / normal(mu=est sigma=est);
      probplot Numlic / normal(mu=est sigma=est);
      title 'Graphics to Distribution of Roads and NumLic';
    run;

    ods pdf close;

See the attached graphical output from SAS. The interpretations of these plots are:

• The distribution of Roads appears to be right-skewed, as shown clearly by the histogram and by the bowl (cup) shape of the normal probability plot.
• The distribution of Numlic also appears to be right-skewed, as shown clearly by the histogram and by the bowl (cup) shape of the normal probability plot.