
Fall 2013    STATISTICS 479    Assignment #3 (40 points)

1. Consider the following SAS program, in the file assgn3.prob1 downloadable from the stat479 Homework Assignments page. This program inputs information about family members living in households in different states, such as the number of rooms in a house, the number of family members, and the years of education of each individual, using three different data types 'h', 'f' and 'p'. Answer the questions below:

data housing;
  retain State Nrooms Income Nfamily;
  input Id $4. Type $1. @;
  if Type = 'h' then input @7 State $2. Nrooms 3.;
  if Type = 'f' then input @7 Income 5. Nfamily 3.;
  if Type = 'p' then do;
    input @7 Age 2. Sex $1. Educ 2.;
    output;
  end;
  drop Id Type;
datalines;
0020h CA 03
0020f 42000 03
0020p 34M 3
0020p 31F 2
0020p  9F 1
0020f 52000 02
0020p 34F 4
0020p 31F 2
0074h NB 03
0074f 38000 04
0074p 35M 4
0074p 30F 3
0074p 11F 1
0074p 05M 1
;
proc print; run;
proc means data=housing noprint;
  class State Nrooms;
  var Age Educ;
  output out=stats mean= std=s_Age s_Educ;
run;
proc print data=stats; run;

(a) Show the contents of the PDV immediately after processing the first line of data.
(b) Show the contents of the PDV immediately after processing the second line of data.
(c) Show the contents of the PDV immediately after processing the third line of data.
(d) Display the first observation written to the SAS data set.
(e) What would happen if the retain statement were omitted? Explain why.
(f) Run this program and turn in the output only.
(g) Examine the output produced by the second proc print step. Describe the statistics printed in each line of this output, i.e., explain what the computed numbers represent. Relate the value of _type_ to the groups of observations that are used to compute the statistics that appear in each line of output.

2. The following data, extracted from the American Almanac and the World Almanac for 1974, lists the values of fuel consumption for each of the 48 contiguous states in addition to several other measured variables. The data set is available as a link on the Homework Assignments page and as a text file fuel.txt in the Stat479 folder on the desktop in the lab. Each record in the data set is a data line with data values entered in the following format:

    Variable Position   Name     Variable Description
    1-2                 St       State (2 char. state postal code)
    3-7                 Pop      1971 Population, in thousands
    8-9                 Tax      1972 Motor Fuel Tax Rate, in cents per gallon (assume one decimal place)
    10-14               Numlic   1971 Number of Licensed Drivers, in thousands
    15-18               Income   1972 Per Capita Income, in thousands of dollars (assume 3 decimal places)
    19-23               Roads    1971 Miles of Federal-Aid Primary Highways, in thousands (assume 3 decimal places)
    24-28               Fuelc    1972 Fuel Consumption, in millions of gallons

SAS program B6 accesses this data (a link to the latest version of this program is available on the Homework Assignments page, and the program is also available as a SAS file b6.sas in the Stat479 folder on the desktop in the lab). In this program, the tasks described on pp. 79-81 of the textbook are carried out and the resulting SAS data set named fueldat is saved in a library.

(a) Modify the SAS program B6 so that the SAS data set is saved in your U drive Stat479 folder. Insert ODS statements to direct the output from the proc step to an rtf file in your U drive Stat479 folder. Run this program, print the output (make sure that it is printed in landscape orientation) and carefully check the output. Turn in this page. (To keep this to one page of output, note that only the first 20 states in the data set are printed.)
(b) Write a new SAS program to access the SAS data set fueldat from your folder in a proc means step. This step must contain the appropriate var, class, and output statements required to create a SAS data set named stats1.
Use an option to suppress any printed output from this proc means step. The data set stats1 must contain the sample means and the standard errors of the means of the variables Income, Fuel, and Numlic, calculated separately for each of the six groups defined by combinations of levels of the category variables IncomGrp and TaxGrp. Use the types statement to ensure that statistics are calculated only for combinations of levels of IncomGrp and TaxGrp. Obtain a listing of the SAS data set stats1 using a proc print step, with labels for all variables appearing in this listing. Use ODS to create a pdf or rtf file with the results. Turn in the program and the output pages.
(c) Use the SAS data set fueldat in a new SAS program with a proc univariate step to compute basic descriptive statistics for the variables Income, Percent, and Roads. Use an appropriate option to calculate t-tests for the research hypotheses that the population means of these variables are, respectively, not equal to $4000, above 50%, and below 5000 miles (i.e., Ha: µ ≠ 4, Ha: µ > 50, and Ha: µ < 5). Also include an option for calculating 95% confidence intervals for each of these population parameters. You may use the ODS system to select only those tables needed, to produce a more compact output for turning in. Interpret the results of the t-tests using the p-values printed and α = 0.05. Do the results of the Shapiro-Wilk test for normality for each variable above support your conclusions on the shape of the distributions? You may write your answers to these questions clearly on the output pages.

Due Thursday, October 3, 2013