STATISTICS 479 Assignment #5 (40 points)

Fall 2013
1. Write five proc steps using statistical graphics (sg) procedures in the same SAS program, to access
the fuel SAS dataset and produce the following graphs:
(a) Scatterplot of Fuel variable against Roads variable labeling each point by the two-letter state postal
code. The points must be filled circles of dark cyan color of size 5 pixels.
(b) Histogram of Income variable with bins starting at 3 with bin width of 0.5. The variable on the
vertical axis must be the frequencies (counts). Change the fill color to a suitable light color (such
as cornflowerblue or lightsalmon). Overlay the histogram with a normal density fit with estimated
parameters.
(c) Simple linear regression fit plot of Fuel variable against Income variable showing confidence and
prediction bands.
(d) Scatterplot matrix of the four variables Fuel, Roads, Pop, and Numlic, identifying each point
(i.e., state) by its fuel tax group (use the TaxGrp variable).
(e) Obtain a dot plot of the Roads variable to display miles of primary highway per state. (Use proc
sgplot here.)
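The plot types requested in (a) through (e) can be sketched as follows. This is a minimal illustration on the SASHELP sample data sets (sashelp.class and sashelp.iris), not on the fuel data set, so every variable name here is a stand-in. It also appears that BINSTART= gives the midpoint of the first bin rather than its left edge; this is worth confirming in the documentation for your SAS release.

```sas
/* Illustrative only: SASHELP sample data, not the fuel data set. */

/* Scatter plot with point labels and filled dark cyan circles, 5 pixels */
proc sgplot data=sashelp.class;
  title 'Labeled scatter plot (sketch)';
  scatter x=height y=weight / datalabel=name
          markerattrs=(symbol=circlefilled color=darkcyan size=5px);
run;

/* Count-scaled histogram with a normal density overlay.             */
/* Assumption: BINSTART= is the midpoint of the first bin.           */
proc sgplot data=sashelp.class;
  title 'Count histogram with fitted normal density (sketch)';
  histogram height / binstart=50 binwidth=5 scale=count
            fillattrs=(color=cornflowerblue);
  density height / type=normal;
run;

/* Simple linear regression fit; CLM = confidence band, CLI = prediction band */
proc sgplot data=sashelp.class;
  title 'Regression fit with confidence and prediction limits (sketch)';
  reg x=height y=weight / clm cli;
run;

/* Scatter plot matrix with points identified by a grouping variable */
proc sgscatter data=sashelp.iris;
  title 'Scatter plot matrix by group (sketch)';
  matrix sepallength sepalwidth petallength petalwidth / group=species;
run;

/* Dot plot: one row per category, dot placed at the response value */
proc sgplot data=sashelp.class;
  title 'Dot plot (sketch)';
  dot name / response=height;
run;
title;
```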
2. An experiment was performed to determine whether two forms of iron (Fe2+ and Fe3+) are retained
differently. (If one form of iron were retained especially well, it would be the better dietary supplement.)
The investigators divided 108 mice randomly into 6 groups of 18 each; 3 groups were given Fe2+ in three
different concentrations, 10.2, 1.2, and .3 millimolar, and 3 groups were given Fe3+ at the same three
concentrations. The mice were given the iron orally; the iron was radioactively labeled so that a counter
could be used to measure the initial amount given. At a later time, another count was taken for each
mouse, and the percentage of iron retained was calculated. The data are on the next page.
Construct side-by-side box plots using proc sgplot as in Example C16 for the six treatments. Place the
plots on the x-axis in the order High, Medium, and Low doses for each of the two forms of iron, respectively.
The data, given below, are also in the file iron.txt.
(a) Compare and comment on features such as shape, location, dispersion and outliers of the 6 iron
retention distributions.
(b) Comment on any observed trend in the median % iron retention over the levels and the forms of
iron. What is the observed trend in dispersion (as measured by, say, IQR)? That is, compare the
distributions of % iron retention across the six treatments.
(c) Statistical analysis of these data (e.g., using a one-way ANOVA model to compare means) may be complicated by failure of assumptions such as homogeneity of variance and/or normality of the distributions.
Do the box plots show evidence of these problems? Explain. If there is reason to believe that the
assumptions fail based on the plots, a possible explanation is that each distribution is related to the
median level of % iron retention in some way. Discuss whether there appears to be such a relationship.
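One way to control the left-to-right order of side-by-side box plots is to code the treatments numerically and attach a format, as in the sketch below. This is not necessarily the approach of Example C16. The data set name iron_demo, the variable names trt and retention, the format name trtfmt, and the choice to put Fe3+ first are all assumptions. The datalines hold only the first three values of each group from the data table, just so the step runs; they do not represent the full distributions.

```sas
/* Hypothetical coding: 1-3 = Fe3+ (High, Medium, Low), 4-6 = Fe2+ (High, Medium, Low) */
proc format;
  value trtfmt 1='Fe3+ High' 2='Fe3+ Medium' 3='Fe3+ Low'
               4='Fe2+ High' 5='Fe2+ Medium' 6='Fe2+ Low';
run;

/* Small subset of the assignment data, for illustration only */
data iron_demo;
  input trt retention @@;
  format trt trtfmt.;
  datalines;
1 0.71 1 1.66 1 2.01  2 2.20 2 2.93 2 3.08  3 2.25 3 3.93 3 5.08
4 2.20 4 2.69 4 3.54  5 4.04 5 4.16 5 4.42  6 2.71 6 5.43 6 6.38
;
run;

proc sgplot data=iron_demo;
  title 'Iron retention by treatment (sketch on a data subset)';
  vbox retention / category=trt;
  /* Order boxes by the underlying codes 1..6, not by the label text */
  xaxis discreteorder=unformatted label='Treatment';
  yaxis label='% iron retained';
run;
title;
```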
3. Use the fueldat data set and proc sgplot to obtain horizontal bar charts showing the means of the Fuel
variable for each of the three groups of states defined by the LicGrp variable, which classifies states by
percentage of licensed drivers (below 54%, between 54% and 58%, and above 58%). Subdivide each bar into the
three Income groups defined by the IncomGrp variable. Use the keylegend statement to position the legend in
the top right corner inside the outline. To make room for the legend, include the statement yaxis offsetmin=.2; in your
proc step. (Note: In order to create the LicGrp variable, use the piece of SAS code named assign5.prob3
to begin this program.)
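The bar chart and legend placement described above can be sketched on sample data as follows. The data set sashelp.cars and its variables type, origin, and mpg_city are stand-ins for fueldat, LicGrp, IncomGrp, and Fuel. Whether grouped bars of means are drawn stacked or clustered may depend on the GROUPDISPLAY= option and the SAS release, so check the output against what the problem asks for.

```sas
/* Illustrative only: SASHELP sample data, not the fueldat data set */
proc sgplot data=sashelp.cars;
  title 'Mean city MPG by vehicle type and origin (sketch)';
  /* Horizontal bars of group means, subdivided by a second class variable */
  hbar type / response=mpg_city stat=mean group=origin;
  /* Legend inside the plot area, top right corner */
  keylegend / location=inside position=topright;
  /* Axis offset statement quoted in the problem, to make room for the legend */
  yaxis offsetmin=.2;
run;
title;
```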
Percentage of iron retained, by form of iron and concentration (millimolar)

         Fe3+                     Fe2+
  10.2    1.2     .3       10.2    1.2     .3
   .71   2.20   2.25       2.20   4.04   2.71
  1.66   2.93   3.93       2.69   4.16   5.43
  2.01   3.08   5.08       3.54   4.42   6.38
  2.16   3.49   5.82       3.75   4.93   6.38
  2.42   4.11   5.84       3.83   5.49   8.32
  2.42   4.95   6.89       4.08   5.77   9.04
  2.56   5.16   8.50       4.27   5.86   9.56
  2.60   5.54   8.56       4.53   6.28  10.01
  3.31   5.68   9.44       5.32   6.97  10.08
  3.64   6.25  10.52       6.18   7.06  10.62
  3.74   7.25  13.46       6.22   7.78  13.80
  3.74   7.90  13.57       6.33   9.23  15.99
  4.39   8.85  14.76       6.97   9.34  17.90
  4.50  11.96  16.41       6.97   9.91  18.25
  5.07  15.54  16.96       7.52  13.46  19.32
  5.26  15.89  17.56       8.36  18.4   19.87
  8.15  18.3   22.82      11.65  23.89  21.60
  8.24  18.59  29.13      12.45  26.39  22.25
4. Use the fueldat data set and proc sgpanel to obtain vertical bar charts showing the means of the Roads
variable for each of the three licensed-driver groups defined by the LicGrp variable. Obtain
this plot in three panels corresponding to each level of the IncomGrp variable. Make sure that the three
panels all appear in one row.
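A paneled bar chart with all panels in a single row can be sketched as below. Again, sashelp.cars is a stand-in data set: origin (three levels) plays the role of IncomGrp, type plays the role of LicGrp, and mpg_city plays the role of Roads.

```sas
/* Illustrative only: SASHELP sample data, not the fueldat data set */
proc sgpanel data=sashelp.cars;
  title 'Mean city MPG by vehicle type, paneled by origin (sketch)';
  /* One panel per level of the PANELBY variable, forced into one row */
  panelby origin / rows=1 columns=3;
  vbar type / response=mpg_city stat=mean;
run;
title;
```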
Notes:
1. Add a quit; statement at the end of your SAS program to make sure the graph is actually plotted.
2. Include title statements for each of your graphs so that the plot can be identified.
3. To place your color graphs in a file use the statements
ods rtf file="U:\Documents\......\your_folder\xxxx.rtf";
at the top of your SAS steps and ods rtf close; at the end. You may use pdf destinations instead
of rtf.
4. If you enclose your code in
ods html gpath="U:\Documents\......\your_folder\
and ods html close; SAS will save your graph in png format in your folder with a generic name like
HISTOGRAM.png. You can insert such a file into your Word or LaTeX document or open it in a viewer
and print it directly.
5. To print files containing SAS color graphs on the color printer in the Gilman lab, select the color printer
Gilman 2272 Color from the print dialog box.
6. About 150 additional printing units have been added to each of your PaperCut accounts to cover the cost
of color printing. The charge for printing on the above color printer is 15 units/page, so make sure that you
print only the final version of each graph.
7. Use proc format to create formats to represent values of class variables as appropriate to enhance and
improve any of your plots.
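Notes 1, 2, 3, and 7 can be combined in one program skeleton, sketched below. The output path C:\temp\assign5_demo.rtf and the format name $sexfmt are hypothetical, and sashelp.class stands in for the assignment data; substitute your own folder and variables.

```sas
/* Hypothetical output path: replace with a folder you can write to */
ods rtf file="C:\temp\assign5_demo.rtf";

/* Format giving readable labels to the values of a class variable */
proc format;
  value $sexfmt 'F'='Female' 'M'='Male';
run;

proc sgplot data=sashelp.class;
  title 'Mean height by sex (sketch)';   /* title identifies the graph */
  format sex $sexfmt.;
  vbar sex / response=height stat=mean;
run;

ods rtf close;   /* closes the destination and writes the file */
title;
quit;
```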
Due Tuesday, October 29, 2013