Fall 2013 STATISTICS 479 Assignment #5 (40 points)

1. Write five proc steps using statistical graphics (sg) procedures in the same SAS program to access the fuel SAS dataset and produce the following graphs:

   (a) Scatterplot of the Fuel variable against the Roads variable, labeling each point with the two-letter state postal code. The points must be filled circles of dark cyan color, 5 pixels in size.

   (b) Histogram of the Income variable with bins starting at 3 and a bin width of 0.5. The variable on the vertical axis must be the frequencies (counts). Change the fill color to a suitable light color (such as cornflowerblue or lightsalmon). Overlay the histogram with a normal density fit with estimated parameters.

   (c) Simple linear regression fit plot of the Fuel variable against the Income variable, showing confidence and prediction bands.

   (d) Scatterplot matrix of the four variables Fuel, Roads, Pop, and Numlic, identifying each point (i.e., state) by its fuel tax group (use the TaxGrp variable).

   (e) Obtain a dot plot of the Roads variable to display miles of primary highway per state. (Use proc sgplot here.)

2. An experiment was performed to determine whether two forms of iron (Fe2+ and Fe3+) are retained differently. (If one form of iron were retained especially well, it would be the better dietary supplement.) The investigators divided 108 mice randomly into 6 groups of 18 each; 3 groups were given Fe2+ in three different concentrations, 10.2, 1.2, and .3 millimolar, and 3 groups were given Fe3+ at the same three concentrations. The mice were given the iron orally; the iron was radioactively labeled so that a counter could be used to measure the initial amount given. At a later time, another count was taken for each mouse, and the percentage of iron retained was calculated. The data are on the next page. Construct side-by-side box plots using proc sgplot as in Example C16 for the six treatments.
   Place the plots on the x-axis in the order High, Medium, and Low doses for each of the two forms of iron, respectively. The data, given below, are also in the file iron.txt.

   (a) Compare and comment on features such as shape, location, dispersion, and outliers of the 6 iron retention distributions.

   (b) Comment on any observed trend in the median % iron retention over the levels and the forms of iron. What is the observed trend in dispersion (as measured by, say, IQR)? That is, compare the distributions of % iron retention across the six treatments.

   (c) Statistical analysis of these data (e.g., using a one-way ANOVA model to compare means) may be complicated by failure of assumptions such as homogeneity of variance and/or normality of the distributions. Do the box plots show evidence of these problems? Explain. If there is reason to believe that the assumptions fail based on the plots, a possible explanation is that each distribution is related to the median level of % iron retention in some way. Discuss whether there appears to be such a relationship.

3. Use the fueldat data set and proc sgplot to obtain horizontal bar charts showing the means of the Fuel variable for each of the three groups of states, grouped by percentage of licensed drivers (below 54%, 54% to 58%, and above 58%) as defined by the LicGrp variable. Subdivide each bar into the three income groups defined by the IncomGrp variable. Use the keylegend statement to position the legend in the top right corner inside the outline. To make room for the legend, include the statement yaxis offsetmin=.2; in your proc step. (Note: To create the LicGrp variable, begin this program with the piece of SAS code named assign5.prob3.)
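The bar chart described in Problem 3 could be structured along the following lines. This is only a sketch under stated assumptions, not the expected solution: the data set fueldemo and all of its values are made up for illustration, and LicGrp and IncomGrp are assumed to be numeric codes 1 to 3. The real fueldat data set and the assign5.prob3 code are not reproduced here.

```sas
/* Toy data standing in for fueldat; every value below is made up. */
/* Assumption: LicGrp and IncomGrp are coded 1, 2, 3.              */
data fueldemo;
   input LicGrp IncomGrp Fuel;
   datalines;
1 1 510
1 2 495
1 3 480
2 1 560
2 2 545
2 3 530
3 1 640
3 2 615
3 3 600
;
run;

title "Mean Fuel by LicGrp, subdivided by IncomGrp (toy data)";

proc sgplot data=fueldemo;
   /* One horizontal bar per LicGrp level; bar length is the mean of Fuel, */
   /* and each bar is subdivided by IncomGrp.                              */
   hbar LicGrp / response=Fuel stat=mean group=IncomGrp;
   /* Legend inside the plot area, top right corner. */
   keylegend / location=inside position=topright;
   /* Extra space on the category axis so the legend does not cover a bar. */
   yaxis offsetmin=.2;
run;
quit;
```

With an hbar statement the categories lie on the y-axis, which is why the offset is applied through the yaxis statement rather than xaxis.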
Data for Problem 2: percentage of iron retained. Each row gives the concentration (millimolar) followed by the 18 observations.

Fe3+
  10.2   .71 1.66 2.01 2.16 2.42 2.42 2.56 2.60 3.31 3.64 3.74 3.74 4.39 4.50 5.07 5.26 8.15 8.24
   1.2   2.20 2.93 3.08 3.49 4.11 4.95 5.16 5.54 5.68 6.25 7.25 7.90 8.85 11.96 15.54 15.89 18.3 18.59
    .3   2.25 3.93 5.08 5.82 5.84 6.89 8.50 8.56 9.44 10.52 13.46 13.57 14.76 16.41 16.96 17.56 22.82 29.13

Fe2+
  10.2   2.20 2.69 3.54 3.75 3.83 4.08 4.27 4.53 5.32 6.18 6.22 6.33 6.97 6.97 7.52 8.36 11.65 12.45
   1.2   4.04 4.16 4.42 4.93 5.49 5.77 5.86 6.28 6.97 7.06 7.78 9.23 9.34 9.91 13.46 18.4 23.89 26.39
    .3   2.71 5.43 6.38 6.38 8.32 9.04 9.56 10.01 10.08 10.62 13.80 15.99 17.90 18.25 19.32 19.87 21.60 22.25

4. Use the fueldat data set and proc sgpanel to obtain vertical bar charts showing the means of the Roads variable for each of the three licensed-driver groups defined by the LicGrp variable. Obtain this plot in three panels, one for each level of the IncomGrp variable. Make sure that the three panels all appear in one row.

Notes:

1. Add a quit; statement at the end of your SAS program to make sure the graph is actually plotted.

2. Include title statements for each of your graphs so that the plot can be identified.

3. To place your color graphs in a file, use the statement ods rtf file="U:\Documents\......\your_folder\xxxx.rtf"; at the top of each of your SAS steps and ods rtf close; at the end. You may use pdf destinations instead of rtf.

4. If you enclose your code in ods html gpath="U:\Documents\......\your_folder\"; and ods html close; SAS will save your graph in png format in your folder with a generic name like HISTOGRAM.png. You can insert such a file into your Word or LaTeX document, or open it in a viewer and print it directly.

5. To print files containing SAS color graphs on the color printer in the Gilman lab, select the color printer Gilman 2272 Color from the print dialog box.

6. About 150 additional printing units have been added to each of your papercut accounts to cover the cost of color printing.
   The charge for printing on the above color printer is 15 units/page, so make sure that you print only the final version of each graph.

7. Use proc format to create formats that represent the values of class variables, as appropriate, to enhance and improve any of your plots.

Due Tuesday, October 29, 2013
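Notes 1, 2, 3, and 7 can be combined in a single program. The following is a minimal sketch rather than a required layout. The output file name assign5_demo.rtf, the format name licfmt, the toy data set licdemo with its made-up values, and the coding of LicGrp as 1, 2, 3 are all assumptions introduced for illustration; substitute your own folder path and the actual coding used in your data.

```sas
/* Hypothetical format: assumes LicGrp is coded 1, 2, 3. */
proc format;
   value licfmt 1 = "Below 54%"
                2 = "54% to 58%"
                3 = "Above 58%";
run;

/* Toy data standing in for fueldat; every value below is made up. */
data licdemo;
   input LicGrp Roads;
   datalines;
1 2.1
1 3.4
2 5.0
2 6.2
3 7.7
3 9.3
;
run;

/* Hypothetical file name; with no folder given, SAS writes to the */
/* current working directory.                                      */
ods rtf file="assign5_demo.rtf";

title "Mean Roads by licensed-driver group (toy data)";

proc sgplot data=licdemo;
   vbar LicGrp / response=Roads stat=mean;
   /* Show the descriptive labels instead of the raw codes 1, 2, 3. */
   format LicGrp licfmt.;
run;

ods rtf close;
quit;
```

Attaching the format inside the proc step leaves the stored data unchanged, so the same variable can carry different labels in different plots.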