Statistics 479    Assignment #3 Answer Key    Fall 2013

1. a)
    State  Nrooms  Income  Nfamily  Id    Type  Age  Sex  Educ  _N_  _ERROR_
    CA     3       .       .        0020  h     .         .     1    0

b)
    State  Nrooms  Income  Nfamily  Id    Type  Age  Sex  Educ  _N_  _ERROR_
    CA     3       42000   3        0020  f     .         .     2    0

c)
    State  Nrooms  Income  Nfamily  Id    Type  Age  Sex  Educ  _N_  _ERROR_
    CA     3       42000   3        0020  p     34   M    3     3    0

d)
    State  Nrooms  Income  Nfamily  Sex  Educ  Age
    CA     3       42000   3        M    3     34

e) If the retain statement is omitted, the variables State, Nrooms, Income, and Nfamily are reset to missing at the start of each iteration of the data step, that is, each time SAS goes on to process a new line of data. Their values were read in earlier iterations, so by the time an observation is written to the data set they have already been reset, and the observation contains missing values for those variables.

f)
    The SAS System                     18:40 Monday, October 5, 2009

    Obs  State  Nrooms  Income  Nfamily  Age  Sex  Educ
    1    CA     3       42000   3        34   M    3
    2    CA     3       42000   3        31   F    2
    3    CA     3       42000   3         9   F    1
    4    CA     3       52000   2        34   F    4
    5    CA     3       52000   2        31   F    2
    6    NB     3       38000   4        35   M    4
    7    NB     3       38000   4        30   F    3
    8    NB     3       38000   4        11   F    1
    9    NB     3       38000   4         5   M    1

    The SAS System

    Obs  State  Nrooms  _TYPE_  _FREQ_  Age      Educ     s_Age    s_Educ
    1           .       0       9       24.4444  2.33333  12.2893  1.22474
    2           3       1       9       24.4444  2.33333  12.2893  1.22474
    3    CA     .       2       5       27.8000  2.40000  10.6160  1.14018
    4    NB     .       2       4       20.2500  2.25000  14.5000  1.50000
    5    CA     3       3       5       27.8000  2.40000  10.6160  1.14018
    6    NB     3       3       4       20.2500  2.25000  14.5000  1.50000

g) Means and standard deviations are computed for the variables Age and Educ for groups of observations formed by combinations of values of State and Nrooms, as described below:

    Obs=1 : for all 9 persons in the 3 families in the 2 states (_TYPE_=0)
    Obs=2 : for the 9 persons in houses with 3 rooms in the 2 states (_TYPE_=1)
    Obs=3 : for the 5 persons in CA (_TYPE_=2)
    Obs=4 : for the 4 persons in NB (_TYPE_=2)
    Obs=5 : for the 5 persons in CA living in houses with 3 rooms (_TYPE_=3)
    Obs=6 : for the 4 persons in NB living in houses with 3 rooms (_TYPE_=3)

2. a) SAS program as given in b6.sas, with the following libname, infile, and ods statements in the appropriate lines:

    libname mylib "U:\Documents\Stat479\";
    infile "U:\Documents\Stat479\fuel.txt";
    ods rtf file="U:\Documents\Stat479\prob2a_output.rtf";
    ods rtf close;

See attached output from proc print.
b)
    libname mylib "U:\Documents\Stat479\";
    proc means data=mylib.fueldat noprint;
      class Incomgrp TaxGrp;
      var Income Fuel Numlic;
      ways 2;
      output out=stats_1 mean=Av_Inc Av_Fuel Av_Lic
                         stderr=SD_Inc SD_Fuel SD_Lic;
    run;
    ods rtf file="U:\Documents\Stat479\prob2b_output.rtf";
    proc print data=stats_1 label;
      title 'Statistics from Proc Means';
    run;
    ods rtf close;

c)
    libname mylib "U:\Documents\Stat479\";
    ods rtf file="U:\Documents\Stat479\prob2c_output.rtf";
    ods select BasicIntervals TestsForLocation TestsForNormality;
    proc univariate data=mylib.fueldat cibasic normal mu0=4 50 5;
      var Income Percent Roads;
      id State;
      title 'Use of Proc Univariate to Examine Distributions:1';
    run;
    ods rtf close;

Variable Income:

    Basic Confidence Limits Assuming Normality
    Parameter      Estimate   95% Confidence Limits
    Mean           4.24183    4.07527    4.40840
    Std Deviation  0.57362    0.47752    0.71851
    Variance       0.32904    0.22803    0.51626

    Tests for Location: Mu0=4
    Test           Statistic      p Value
    Student's t    t  2.920853    Pr > |t|    0.0053
    Sign           M  7           Pr >= |M|   0.0595
    Signed Rank    S  248.5       Pr >= |S|   0.0093

The p-value given (.0053) is for the two-sided test, as shown above. Since the p-value is smaller than .05, we reject the null hypothesis at alpha=0.05.

    Tests for Normality
    Test                Statistic        p Value
    Shapiro-Wilk        W     0.975229   Pr < W      0.3988
    Kolmogorov-Smirnov  D     0.080296   Pr > D      >0.1500
    Cramer-von Mises    W-Sq  0.058391   Pr > W-Sq   >0.2500
    Anderson-Darling    A-Sq  0.375093   Pr > A-Sq   >0.2500

The Shapiro-Wilk test results in a large p-value (0.3988); thus the null hypothesis that the distribution is normal is not rejected.

Variable Percent:

    Basic Confidence Limits Assuming Normality
    Parameter      Estimate   95% Confidence Limits
    Mean           57.02844   55.41821   58.63867
    Std Deviation  5.54545    4.61641    6.94612
    Variance       30.75196   21.31124   48.24853

    Tests for Location: Mu0=50
    Test           Statistic      p Value
    Student's t    t  8.780982    Pr > |t|    <.0001
    Sign           M  21          Pr >= |M|   <.0001
    Signed Rank    S  564         Pr >= |S|   <.0001

We need to calculate the p-value for the right-tailed test. Because the t statistic is positive, the right-tailed p-value is half the two-sided p-value, that is, less than .0001/2. This is smaller than .05, so we reject the null hypothesis at alpha=0.05.

    Tests for Normality
    Test                Statistic        p Value
    Shapiro-Wilk        W     0.961084   Pr < W      0.1117
    Kolmogorov-Smirnov  D     0.118653   Pr > D      0.0892
    Cramer-von Mises    W-Sq  0.122071   Pr > W-Sq   0.0560
    Anderson-Darling    A-Sq  0.745349   Pr > A-Sq   0.0488

The Shapiro-Wilk p-value (0.1117) is larger than .05; thus the null hypothesis that the distribution is normal is not rejected.

Variable Roads:

    Basic Confidence Limits Assuming Normality
    Parameter      Estimate   95% Confidence Limits
    Mean           5.56125    4.54681    6.57569
    Std Deviation  3.49360    2.90832    4.37602
    Variance       12.20527   8.45831    19.14955

    Tests for Location: Mu0=5
    Test           Statistic      p Value
    Student's t    t  1.113021    Pr > |t|    0.2714
    Sign           M  -1          Pr >= |M|   0.8854
    Signed Rank    S  60          Pr >= |S|   0.5439

We need to calculate the p-value for the left-tailed test. The t statistic is positive (the sample mean 5.56125 is above 5), so the left-tailed p-value is 1 - 0.2714/2 = 0.8643; the value 0.2714/2 = 0.1357 is the right-tail area. Since 0.8643 is larger than .05, we fail to reject the null hypothesis at alpha=0.05.

    Tests for Normality
    Test                Statistic        p Value
    Shapiro-Wilk        W     0.924902   Pr < W      0.0044
    Kolmogorov-Smirnov  D     0.11309    Pr > D      0.1258
    Cramer-von Mises    W-Sq  0.10228    Pr > W-Sq   0.1025
    Anderson-Darling    A-Sq  0.744132   Pr > A-Sq   0.0491

The Shapiro-Wilk test results in a very small p-value (0.0044); thus the null hypothesis that the distribution is normal is rejected.
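
As a check on the Tests for Location output above, the Student's t statistics can be reproduced from the reported means and standard deviations. The sample size n = 48 is an assumption: it is not stated in this key, but it is the value implied by the reported confidence limits and t statistics. The second display gives the usual rule for turning the two-sided p-value p printed by proc univariate into a one-sided p-value.

```latex
% Assumption: n = 48 (inferred from the reported output, not stated in the key)
\[
  t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \sqrt{48} \approx 6.9282
\]
\begin{align*}
  \text{Income:}  \quad t &= \frac{4.24183 - 4}{0.57362 / 6.9282}   \approx 2.921 \\
  \text{Percent:} \quad t &= \frac{57.02844 - 50}{5.54545 / 6.9282} \approx 8.781 \\
  \text{Roads:}   \quad t &= \frac{5.56125 - 5}{3.49360 / 6.9282}   \approx 1.113
\end{align*}
% Converting the two-sided p-value p = Pr > |t| to one-sided p-values
\[
  p_{\text{right}} =
  \begin{cases}
    p/2     & \text{if } t > 0 \\
    1 - p/2 & \text{if } t < 0
  \end{cases}
  \qquad
  p_{\text{left}} = 1 - p_{\text{right}}
\]
```

All three t statistics here are positive, so halving the two-sided p-value gives the right-tail area in each case, and the left-tail area is one minus that half.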