STATISTICS 479 Assignment #4 (40 points)

Fall 2013
1. Use the SAS data set fueldat created in Problem #2 of Assignment #3 in a proc tabulate step
to print a tabulation giving the sample size, sample mean, sample standard deviation, standard
error of the mean, and maximum of the variable Roads for subgroups of observations defined by
the combinations of values of IncomGrp and LicGrp. The rows consist of the combinations of
IncomGrp and LicGrp, and the statistics for Roads must appear as the columns. Print the sample
size without decimals, the sample mean with four decimal places, and the other two statistics with
two decimal places each. Also, use appropriate text strings to label all statistic keyword headings
(i.e., replace std with the string 'Standard Deviation', etc.). (Follow SAS Example B13 in the
textbook.)
2. Organizations interested in making sure that accused persons have a trial by their peers often
compare the distributions of jurors by age, education, and other socioeconomic variables. One
such study provided the following information on the ethnicity of 2500 jurors in a certain U.S.
county:
Ethnicity          No. of Jurors
White              2030
Black               313
Native American      10
Asian                57
Other                90
The percentages of the population in the county belonging to each of these ethnicities are 80.29%,
12.06%, 0.79%, 2.92%, and 3.94%, respectively. Is there significant evidence of a difference between
the ethnicity distribution of jurors and the ethnicity distribution of the whole county? Use a SAS
program to perform a chi-square goodness-of-fit test at α = .05 to answer this question.
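A minimal sketch of one way to set this up in SAS, not an official solution: the counts are entered as summary data and passed to proc freq through a weight statement, with the county percentages supplied through testp=. The data set name jurors and the variable names Ethnicity and Count are assumptions chosen for illustration.

```sas
/* Summary counts from the table above; names are illustrative */
data jurors;
   input Ethnicity $ 1-15 Count;   /* columns 1-15 hold the label */
   datalines;
White           2030
Black            313
Native American   10
Asian             57
Other             90
;
run;

/* One-way chi-square goodness-of-fit test against the county percentages */
proc freq data=jurors order=data;
   weight Count;                   /* each row stands for Count jurors */
   tables Ethnicity / chisq testp=(80.29 12.06 0.79 2.92 3.94);
run;
```

The order=data option keeps the categories in the order entered, so the testp= values line up with the rows of the table. The five percentages sum to 100, as testp= expects.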
3. Suppose that a graduate student is studying factors affecting the distribution of barrel cacti. She
sets up a series of randomly dispersed quadrats (sampling plots) of appropriate size and counts
the number of cacti per quadrat. Below, xi denotes the number of plants per quadrat and ni
denotes the number of quadrats with xi plants:
xi    ni
0     18
1     38
2     49
3     21
4     12
5      5
≥6     1
Write a SAS program to perform a chi-square goodness-of-fit test at α = .05 to test the hypothesis
that the observed counts were drawn from a Poisson probability distribution. Hint: For these data,
the Poisson mean λ is estimated as
$$\hat{\lambda} = \frac{\sum_i n_i x_i}{n} = \frac{279}{144} \approx 1.9$$
To fit the data to a Poisson distribution, the probabilities for a Poisson distribution with mean 1.9
are needed. Calculate these (and fill out the following table) using the formula for the Poisson
pmf, extract them from a table in a textbook, or use one of the many Poisson calculators available on the
internet (e.g., http://easycalculation.com/statistics/poisson-distribution.php):
xi    P(X = xi)
0
1
2
3
4
5
≥6
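As an alternative to a printed table or an online calculator, the probabilities can also be computed in a SAS data step. The sketch below is one possible approach, assuming λ = 1.9 and n = 144 from the hint; the data set name poisprob and the variable names x, prob, and expected are hypothetical.

```sas
/* Poisson probabilities for the classes 0, 1, ..., 5 and ">= 6" */
data poisprob;
   lambda = 1.9;                            /* estimate from the hint */
   n = 144;                                 /* total number of quadrats */
   do x = 0 to 5;
      prob = pdf('POISSON', x, lambda);     /* P(X = x) */
      expected = n * prob;                  /* expected count for this class */
      output;
   end;
   x = 6;                                   /* stands for the ">= 6" class */
   prob = 1 - cdf('POISSON', 5, lambda);    /* P(X >= 6) */
   expected = n * prob;
   output;
run;

proc print data=poisprob noobs;
   var x prob expected;
   format prob 6.4 expected 7.2;
run;
```

These are the probabilities a goodness-of-fit step would compare with the observed counts. Because λ is estimated from the data, the chi-square reference distribution has one fewer degree of freedom than in a test with fully specified probabilities.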
4. A study of the relationship between men's marital status and the level of their jobs used data on all
8235 male managers and professionals employed by a large manufacturing firm. Each man's job
has a level set by the company that reflects the value of that particular job to the company. The
category variable ‘Job Grade’ was created by grouping the job levels into 4 quarters; thus Job
Grade 1 has the lowest quarter of job levels, Job Grade 2 has the next lowest quarter of job levels,
etc. Use the data below to compute a chi-square test of independence using proc freq. Use
nominal measures of association, the contingency coefficient C and Cramer’s V, to comment on
the strength of association if present.
                    Marital Status
Job Grade   Single   Married   Divorced   Widowed
1               58       874         15         8
2              222      3927         70        20
3               50      2396         34        10
4                7       533          7         4
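One way to enter cell counts like these and request the test is sketched below. It assumes the table is read with job grade as the rows and marital status as the columns; the data set name jobs and the variable names Grade, Marital, and Count are illustrative choices, not names given in the assignment.

```sas
/* One record per cell of the 4 x 4 table; names are illustrative */
data jobs;
   input Grade Marital $ Count;
   datalines;
1 Single     58
1 Married   874
1 Divorced   15
1 Widowed     8
2 Single    222
2 Married  3927
2 Divorced   70
2 Widowed    20
3 Single     50
3 Married  2396
3 Divorced   34
3 Widowed    10
4 Single      7
4 Married   533
4 Divorced    7
4 Widowed     4
;
run;

/* Chi-square test of independence with expected cell counts */
proc freq data=jobs order=data;
   weight Count;                   /* each record stands for Count men */
   tables Grade*Marital / chisq expected;
run;
```

With a two-way table, the chisq option also prints the contingency coefficient and Cramer's V alongside the chi-square statistics.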
5. Use the SAS data set fueldat created in Problem #2 of Assignment #3 in a proc freq step to
do the following:
(a) Obtain two-way frequency tables (in the cross-tabulation format) for the variable IncomGrp
with LicGrp. Compute chi-square statistics, cell χ² values, and cell expected values, but no column,
row, or cell percentages.
(b) Use the chi-square statistic to test hypotheses of independence between pairs of variables
considered in part (a) (i.e., per capita income and number of licensed drivers) and state
your results.
(c) Use the values of Gamma, Kendall’s Tau-b, and Spearman correlation coefficient to comment
on the association between IncomGrp and LicGrp.
Notes:
• Use the start-up code available in the assign4.prob5 link.
• The variable LicGrp may be considered the dependent variable and IncomGrp the
independent variable for this analysis.
• Label variables and use formats for the levels of the category variables.
• Follow the similar analysis in Example B9 in the textbook.
6. Use the SAS data set fueldat created in Problem #2 of Assignment #3 in a proc univariate
step to construct histograms and normal probability plots of the Roads and Numlic variables.
(Just follow the code used in Example b7c used in class.) Describe the shapes of the
two distributions as can be determined from these graphs.
Due Thursday, October 17, 2013