STATISTICS 479 Assignment #3 (40 points)

Fall 2013
1. Consider the following SAS program, in the file assgn3.prob1 downloadable from the stat479 Homework Assignments page. This program inputs information about family members living in households in
different states, such as the number of rooms in a house, the number of family members, and the years of education
of each individual, using three different record types 'h', 'f', and 'p'. Answer the questions below:
data housing;
   retain State Nrooms Income Nfamily;
   input Id $4. Type $1. @;
   if Type = 'h' then input @7 State $2. Nrooms 3.;
   if Type = 'f' then input @7 Income 5. Nfamily 3.;
   if Type = 'p' then do;
      input @7 Age 2. Sex $1. Educ 2.;
      output;
   end;
   drop Id Type;
datalines;
0020h CA 03
0020f 42000 03
0020p 34M 3
0020p 31F 2
0020p  9F 1
0020f 52000 02
0020p 34F 4
0020p 31F 2
0074h NB 03
0074f 38000 04
0074p 35M 4
0074p 30F 3
0074p 11F 1
0074p 05M 1
;
proc print;
run;
proc means data=housing noprint;
   class State Nrooms;
   var Age Educ;
   output out=stats mean= std=s_Age s_Educ;
run;
proc print data=stats;
run;
(a) Show the contents of the PDV immediately after processing the first line of data.
(b) Show the contents of the PDV immediately after processing the second line of data.
(c) Show the contents of the PDV immediately after processing the third line of data.
(d) Display the first observation written to the SAS data set.
(e) What would happen if the retain statement were omitted? Explain why.
(f) Run this program and turn in the output only.
(g) Examine the output produced by the second proc print step. Describe the statistics printed in each
line of this output, i.e., explain what the computed numbers represent. Relate the value of
_type_ to the groups of observations that are used to compute the statistics that appear
in each line of output.
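One way to check hand-worked PDV contents is to have SAS write the PDV to the log on every iteration of the data step. The following is a minimal sketch on a hypothetical two-record-type input, not the assignment program: the data set name trace, the variables Group and Score, and the record types 'g' and 's' are all invented for illustration. The statement put _all_; writes every variable in the PDV, including the automatic variables _N_ and _ERROR_.

```sas
/* Hypothetical example: all names and data values are invented. */
data trace;
   retain Group;                 /* Group is not reset between iterations  */
   input Type $1. @;             /* read the record type and hold the line */
   if Type = 'g' then input @3 Group $1.;
   if Type = 's' then do;
      input @3 Score 2.;
      output;                    /* one observation per 's' record         */
   end;
   put _all_;                    /* write the current PDV to the SAS log   */
   drop Type;
datalines;
g A
s 10
s 20
g B
s 30
;
run;
```

Each log line corresponds to one pass through the data step, so the log can be compared line by line with a hand trace. Running the step with and without the retain statement shows the effect of that statement directly.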
2. The following data, extracted from the American Almanac and the World Almanac for 1974, lists the
values of fuel consumption for each of the 48 contiguous states in addition to several other measured
variables. The data set is available as a link on the Homework Assignments page and as a text file
fuel.txt in the Stat479 folder on the desktop in the lab. Each record in the data set is a data line with
values entered in the following format:
Position  Variable Name  Variable Description
1-2       St             State (2 char. state postal code)
3-7       Pop            1971 Population, in thousands
8-9       Tax            1972 Motor Fuel Tax Rate, in cents per gallon (assume one decimal place)
10-14     Numlic         1971 Number of Licensed Drivers, in thousands
15-18     Income         1972 Per Capita Income, in thousands of dollars (assume 3 decimal places)
19-23     Roads          1971 Miles of Federal-Aid Primary Highways, in thousands (assume 3 decimal places)
24-28     Fuelc          1972 Fuel Consumption, in millions of gallons
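The record layout above can be read with formatted input, where a w.d informat supplies the assumed decimal places when a field contains no decimal point. The sketch below is an illustration of that technique only, not the B6 program; the data set name fuel_sketch is hypothetical and the file path is a placeholder that depends on where fuel.txt is stored.

```sas
/* Sketch only: not program B6. The path to fuel.txt is a placeholder. */
data fuel_sketch;
   infile 'fuel.txt';
   input @1  St     $2.    /* columns 1-2:   state postal code             */
         @3  Pop    5.     /* columns 3-7:   population, thousands         */
         @8  Tax    2.1    /* columns 8-9:   one implied decimal place     */
         @10 Numlic 5.     /* columns 10-14: licensed drivers, thousands   */
         @15 Income 4.3    /* columns 15-18: three implied decimal places  */
         @19 Roads  5.3    /* columns 19-23: three implied decimal places  */
         @24 Fuelc  5.;    /* columns 24-28: fuel, millions of gallons     */
run;
```

With the 2.1 informat, for example, a field containing the characters 75 is read as 7.5. If a field already contains a decimal point, the d part of the informat is ignored.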
This data set is read in SAS program B6 (a link to the latest version of this program is available on the
Homework Assignments page, and as the SAS file b6.sas in the Stat479 folder on the desktop in the lab). In this
program, the tasks described on pp. 79-81 of the textbook are carried out and the resulting SAS data set
named fueldat is saved in a library.
(a) Modify the SAS program B6 so that the SAS dataset is saved in your U drive Stat479 folder. Insert
ODS statements to direct the output from the proc step to an rtf file in your U drive Stat479 folder.
Run this program, print the output (make sure that it is printed in landscape orientation), and
carefully check the output. Turn in this page. (To keep this to one page of output, note that only
the first 20 states in the data set are printed.)
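As a general pattern for part (a), a permanent library is assigned with a libname statement and output is routed to a file by opening and closing an ODS destination around the proc step. The sketch below assumes a Windows path U:\Stat479, a libref named mylib, and an output file name b6_output.rtf; all three are placeholders rather than names taken from program B6, and the proc print step stands in for whatever proc step B6 actually contains.

```sas
/* Placeholders: the libref, folder path, and file name are assumptions. */
libname mylib 'U:\Stat479';              /* permanent library in the folder */
options orientation=landscape;           /* set before opening the RTF file */

ods rtf file='U:\Stat479\b6_output.rtf'; /* open the RTF destination        */
proc print data=mylib.fueldat(obs=20);   /* first 20 observations only      */
run;
ods rtf close;                           /* closing writes the file         */
```

A data set referenced with a two-level name such as mylib.fueldat is stored in the folder the libref points to and remains available in later SAS sessions.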
(b) Write a new SAS program to access the SAS data set fueldat from your folder in a proc means step.
This step must contain the appropriate var, class, and output statements required to create a SAS
data set named stats1. Use an option to suppress any printed output from this proc means step.
The data set stats1 must contain sample means and standard errors of the means of the variables
Income, Fuel, and Numlic calculated separately for each of the six groups defined by combinations
of levels of the category variables IncomGrp and TaxGrp. Use the types statement to ensure that
statistics are calculated only for combinations of levels of IncomGrp and TaxGrp. Obtain a listing
of the SAS data set stats1 using a proc print step with labels for all variables appearing in this
listing. Use ODS to create a pdf or rtf file with the results. Turn in the program and the output
pages.
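The combination of class, types, and output statements described in part (b) can be tried first on a data set that ships with SAS. The sketch below uses sashelp.class with Sex and Age as the class variables; the output data set name class_stats, the statistic variable names, and the label text are chosen here for illustration and are not part of the assignment.

```sas
/* Illustration on sashelp.class; output names and labels are invented. */
proc means data=sashelp.class noprint;   /* noprint suppresses the listing */
   class Sex Age;
   types Sex*Age;                        /* two-way combinations only      */
   var Height Weight;
   output out=class_stats
          mean=m_Height m_Weight         /* one name per var variable      */
          stderr=se_Height se_Weight;    /* standard errors of the means   */
run;

proc print data=class_stats label;       /* label option prints the labels */
   label m_Height  = 'Mean height'
         m_Weight  = 'Mean weight'
         se_Height = 'Std. error of mean height'
         se_Weight = 'Std. error of mean weight';
run;
```

With two class variables and no types statement, the output data set would also contain rows for the overall statistics and for each class variable alone (_TYPE_ values 0, 1, and 2). The types statement keeps only the rows with _TYPE_ equal to 3.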
(c) Use the SAS data set fueldat in a new SAS program with a proc univariate step to compute
basic descriptive statistics for the variables Income, Percent, and Roads. Use an appropriate option
to calculate t-tests for the research hypotheses that the population means of these variables are,
respectively, not equal to $4000, above 50%, and below 5000 miles (i.e., Ha: µ ≠ 4, Ha: µ > 50,
and Ha: µ < 5). Also, include an option for calculating 95% confidence
intervals for each of these population parameters. You may use the ODS system to select only those
tables needed, to produce a more compact output for turning in.
Interpret the results of the t-tests using the p-values printed and α = .05. Do the results of the
Shapiro-Wilk test for normality for each variable above support your conclusions on the shape of the
distributions? You may write your answers to these questions clearly on the output pages.
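The proc univariate options relevant to part (c) can be explored on sashelp.class before applying them to fueldat. In the sketch below, the hypothesized means 60 and 100 for Height and Weight are arbitrary values picked for illustration, not values from the assignment.

```sas
/* Illustration on sashelp.class; the mu0 values are arbitrary. */
ods select BasicMeasures TestsForLocation BasicIntervals TestsForNormality;
proc univariate data=sashelp.class
                mu0=60 100       /* null means, in var statement order    */
                cibasic          /* confidence limits for mean, std, var  */
                alpha=0.05       /* 95% confidence level                  */
                normal;          /* normality tests incl. Shapiro-Wilk    */
   var Height Weight;
run;
```

The Student's t p-value in the TestsForLocation table is two-sided (Pr > |t|). For a one-sided research hypothesis, a common approach is to halve that p-value when the sign of the t statistic agrees with the direction of the alternative.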
Due Thursday, October 3, 2013