Fall 2013    STATISTICS 479
Assignment #4 (40 points)

1. Use the SAS data set fueldat created in Problem #2 of Assignment #3 in a proc tabulate step to print a tabulation giving the sample size, sample mean, sample standard deviation, standard error of the mean, and maximum of the variable Roads for subgroups of observations defined by the combinations of values of IncomGrp and LicGrp. The rows of the table consist of the combinations of IncomGrp and LicGrp, and the statistics for Roads must appear as the columns. Print the sample size without decimals, the sample mean with four decimal places, and the other statistics with two decimal places each. Also, use appropriate text strings to label all statistics keyword headings (i.e., replace std with the string 'Standard Deviation', etc.). (Follow SAS Example B13 in the textbook.)

2. Organizations interested in making sure that accused persons have a trial by their peers often compare the distributions of jurors by age, education, and other socioeconomic variables. One such study provided the following information on the ethnicity of 2500 jurors in a certain U.S. county:

Ethnicity          No. of Jurors
White              2030
Black               313
Native American      10
Asian                57
Other                90

The percentages of the population in the county belonging to each of these ethnicities are 80.29%, 12.06%, 0.79%, 2.92%, and 3.94%, respectively. Is there significant evidence of a difference between the ethnicity distribution of jurors and the ethnicity distribution of the whole county? Use a SAS program to perform a chi-square goodness-of-fit test at α = 0.05 to answer this question.

3. Suppose that a graduate student is studying factors affecting the distribution of barrel cacti. She sets up a series of randomly dispersed quadrats (sampling plots) of appropriate size and counts the number of cacti per quadrat.
Below, x_i denotes the number of plants per quadrat and n_i denotes the number of quadrats with x_i plants:

x_i    n_i
0      18
1      38
2      49
3      21
4      12
5       5
≥6      1

Write a SAS program to perform a chi-square goodness-of-fit test at α = 0.05 to test the hypothesis that the observed counts were drawn from a Poisson probability distribution.

Hint: For these data the Poisson mean λ is estimated as

$$\hat{\lambda} = \frac{\sum_i n_i x_i}{n} = \frac{279}{144} \approx 1.9$$

To fit the data to a Poisson distribution, the probabilities for a Poisson distribution with mean 1.9 are needed. Calculate these (and fill out the following table) using the formula for the Poisson pmf, extract them from a table in a textbook, or use one of the many Poisson calculators available on the internet (e.g., http://easycalculation.com/statistics/poisson-distribution.php):

x_i    P(X = x_i)
0
1
2
3
4
5
≥6

4. A study of the relationship between men's marital status and the level of their jobs used data on all 8235 male managers and professionals employed by a large manufacturing firm. Each man's job has a level set by the company that reflects the value of that particular job to the company. The category variable 'Job Grade' was created by grouping the job levels into 4 quarters; thus Job Grade 1 has the lowest quarter of job levels, Job Grade 2 has the next lowest quarter of job levels, etc. Use the data below to compute a chi-square test of independence using proc freq. Use nominal measures of association, the contingency coefficient C and Cramer's V, to comment on the strength of association if present.

                       Marital Status
Job Grade    Single    Married    Divorced    Widowed
1                58        874          15          8
2               222       3927          70         20
3                50       2396          34         10
4                 7        533           7          4

5. Use the SAS data set fueldat created in Problem #2 of Assignment #3 in a proc freq step to do the following:

(a) Obtain two-way frequency tables (in the cross-tabulation format) for the variables IncomGrp and LicGrp.
Compute chi-square statistics, cell χ² values, and cell expected values, but no column, row, or cell percentages.

(b) Use the chi-square statistic to test the hypothesis of independence between the pair of variables considered in part (a) (i.e., per capita income and number of licensed drivers) and state your results.

(c) Use the values of Gamma, Kendall's Tau-b, and the Spearman correlation coefficient to comment on the association between IncomGrp and LicGrp.

Notes:
• Use the start-up code available in the assign4.prob5 link.
• The variable LicGrp may be considered the dependent variable and IncomGrp the independent variable for this analysis.
• Label variables and use formats for the levels of the category variables.
• Follow the similar analysis in Example B9 in the textbook.

6. Use the SAS data set fueldat created in Problem #2 of Assignment #3 in a proc univariate step to construct histograms and normal probability plots of the Roads and Numlic variables. (Just follow the code used in Example b7c in class.) Describe the shapes of the two distributions as determined from these graphs.

Due Thursday, October 17, 2013
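
Note on the hint to Problem 3: the hint allows the Poisson probabilities to be obtained with any calculator. As one possibility, the pmf formula P(X = x) = e^(-λ) λ^x / x! can be evaluated with a few lines of Python. This is only a minimal sketch and not part of the required SAS program; the function name poisson_cell_probs is made up for illustration, and the rounded mean 1.9 is taken from the hint. The last cell is computed as one minus the sum of the others, so that the seven probabilities add to 1.

```python
import math


def poisson_cell_probs(lam, max_x):
    """Return [P(X=0), ..., P(X=max_x), P(X >= max_x + 1)] for a Poisson(lam).

    Hypothetical helper for illustration only.
    """
    # Poisson pmf: exp(-lam) * lam**x / x!
    probs = [math.exp(-lam) * lam ** x / math.factorial(x)
             for x in range(max_x + 1)]
    # Open-ended last cell (here ">= 6") is the remaining tail probability.
    probs.append(1.0 - sum(probs))
    return probs


lam = 1.9  # rounded estimate of the Poisson mean, as given in the hint
labels = ["0", "1", "2", "3", "4", "5", ">=6"]
probs = poisson_cell_probs(lam, 5)

for label, p in zip(labels, probs):
    print(f"{label:>3}  {p:.4f}")
```

The printed values can be copied into the P(X = x_i) column of the table in Problem 3 and then supplied to the SAS goodness-of-fit step as the hypothesized cell probabilities.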