UNC-Wilmington Department of Economics and Finance
ECN 377
Dr. Chris Dumas

Homework 7 – t-Tests in SAS (due Tuesday, Sept. 15)

In Homework 7 you will practice conducting t-tests in SAS using an example dataset, “ProcTTESTdata.xls”. (This dataset is in the “handouts” folder on the class website.) This dataset file contains data on 7 variables for a random sample of 25 counties drawn from the population of all 100 North Carolina counties. The 7 variables are:

Variable Name   Variable Type                    Variable Definition
CNTYNAME        Text Variable                    Name of county
REGION          Text Variable                    Geographic region: M, P or C
SQMILES         Numerical Measurement Variable   Area of county in square miles
POP2006         Numerical Measurement Variable   County population in 2006
PCINC2005       Numerical Measurement Variable   County per capita income in 2005
PCINC2000       Numerical Measurement Variable   County per capita income in 2000
VOTE2004        Text Variable                    2004 Presidential election winning party: R or D

Note: Use the SAS command examples in the handout “t-Tests in SAS” as a guide.

Write a SAS Program to Import the Data

Write a SAS program that uses PROC IMPORT to import the ProcTTESTdata.xls data file into SAS. Name the dataset “dataset01” in SAS.

One-Sided t-Tests in SAS

In SAS, use PROC TTEST and the data from the sample of 25 NC counties (that is, the data in dataset01) to test the null hypothesis that the mean value of POP2006 (population per county in 2006) for all NC counties is equal to 80000 versus the alternative hypothesis that the mean is greater than 80000. Use a confidence level of 95% (significance level of 5%).

In SAS, use PROC TTEST and the data from the sample of 25 NC counties (that is, the data in dataset01) to test the null hypothesis that the mean value of POP2006 (population per county in 2006) for all NC counties is equal to 100000 versus the alternative hypothesis that the mean is less than 100000. Use a confidence level of 95% (significance level of 5%).
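PROC TTEST does the arithmetic for you, but it can help to see what a one-sample test computes. The Python sketch below is an illustration only (the homework itself must be done in SAS). It computes t = (x̄ − h0)/(s/√n) with n − 1 degrees of freedom and the p-value for an upper-tailed ("U"), lower-tailed ("L"), or two-sided ("2") alternative; those codes are borrowed from the SIDES= option of PROC TTEST. The function names and the Simpson's-rule approximation of the t distribution's CDF are assumptions of this sketch, not a description of how SAS computes its p-values.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow in the gamma-function ratio for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0.

    steps must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area_0_to_a = total * h / 3
    return 0.5 + area_0_to_a if x >= 0 else 0.5 - area_0_to_a


def one_sample_ttest(data, h0, sides="2"):
    """Return (t, df, p) for H0: mean = h0.

    sides: "U" for H1: mean > h0, "L" for H1: mean < h0,
    "2" for H1: mean != h0 (codes borrowed from PROC TTEST's SIDES=).
    """
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    t = (statistics.mean(data) - h0) / se
    df = n - 1
    if sides == "U":
        p = 1 - t_cdf(t, df)
    elif sides == "L":
        p = t_cdf(t, df)
    elif sides == "2":
        p = 2 * (1 - t_cdf(abs(t), df))
    else:
        raise ValueError("sides must be 'U', 'L', or '2'")
    return t, df, p
```

For example, if a hypothetical list pop2006 held the 25 POP2006 values from dataset01, one_sample_ttest(pop2006, 80000, sides="U") would correspond to the first one-sided test and one_sample_ttest(pop2006, 100000, sides="L") to the second; H0 is rejected at the 5% level when the returned p-value is below 0.05.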
Two-Sided t-Test in SAS

In SAS, use PROC TTEST and the data from the sample of 25 NC counties (that is, the data in dataset01) to test the null hypothesis that the mean value of POP2006 (population per county in 2006) for all NC counties is equal to 85000 versus the alternative hypothesis that the mean is different from 85000. Use a confidence level of 95% (significance level of 5%).

Independent Samples t-Test in SAS

Before we conduct an independent samples t-test in SAS, we typically need to sort the data. Suppose we want to test whether the mean value of POP2006 for North Carolina counties that voted Republican in the 2004 presidential election (indicated by a value of R for the VOTE2004 variable) is different from the mean value of POP2006 for counties that voted Democratic (indicated by a value of D for the VOTE2004 variable).

To set up this test, use PROC SORT to sort the data by the VOTE2004 variable (this groups the counties in dataset01 into two blocks: those that voted R and those that voted D). After the PROC SORT command, use a separate PROC TTEST command to do the test. The null hypothesis is that there is no difference in the mean value of POP2006 between the two types of counties (that is, the difference is zero). The alternative hypothesis is that there is a difference (either positive or negative). Use a confidence level of 95% (significance level of 5%).

In the PROC TTEST command, use VOTE2004 for the “class” variable and POP2006 for the “var” variable. The “class” variable is always the variable used to separate the data into categories (in this case, R vs. D), and the “var” variable is the one for which you are testing for a difference in means. (The results of this test can be used to investigate whether county population size is associated with political party support.)

Dependent (Paired) Samples t-Test
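A dependent (paired) samples t-test is equivalent to a one-sample t-test on the within-county differences d = PCINC2005 − PCINC2000, so changing h0 only changes the value the mean difference is compared against. The Python sketch below is an illustration only (the homework itself must be done in SAS) and assumes two equal-length lists of per capita incomes in the same county order. The function name paired_ttest and the default critical value 2.0639 (two-sided 5% level, 24 degrees of freedom, i.e. exactly 25 counties) are assumptions of this sketch. It also shows why a 95% confidence interval for the mean difference helps answer "how big might the difference be": H0 is rejected at the 5% level exactly when h0 lies outside that interval.

```python
import math
import statistics


def paired_ttest(after, before, h0=0.0, t_crit=2.0639):
    """Paired t-test of H0: mean(after - before) = h0 vs. a two-sided H1.

    t_crit defaults to the two-sided 5% critical value of Student's t
    with 24 degrees of freedom (exactly 25 pairs); pass a different
    value for other sample sizes or significance levels.
    """
    if len(after) != len(before):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]  # one difference per county
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of mean difference
    t = (d_bar - h0) / se
    ci = (d_bar - t_crit * se, d_bar + t_crit * se)
    # |t| > t_crit exactly when h0 falls outside the confidence interval
    reject = abs(t) > t_crit
    return {"mean_diff": d_bar, "t": t, "df": n - 1, "ci": ci, "reject": reject}
```

With hypothetical lists pcinc2005 and pcinc2000 holding the 25 counties' values, paired_ttest(pcinc2005, pcinc2000, h0=3000) would mirror the homework's test of whether the mean difference is $3000.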
In the handout “t-Tests in SAS,” we conducted a dependent (paired) samples t-test to determine whether the mean value of PCINC2005 was different from the mean value of PCINC2000. We rejected the null hypothesis of “no difference,” that is, we rejected H0: d = μ1 − μ0 = 0, where μ1 is the mean of PCINC2005 and μ0 is the mean of PCINC2000. So, we concluded that the mean of PCINC2005 was different from the mean of PCINC2000.

Now, in this homework, let’s use a dependent (paired) samples t-test again, but this time let’s use it to investigate how big the difference might be. Specifically, conduct a dependent (paired) samples t-test to determine whether the mean value of PCINC2005 was $3000 larger than the mean value of PCINC2000. This asks whether per capita income, on average across all NC counties, grew by $3000 between 2000 and 2005. Use a confidence level of 95% (significance level of 5%).

To do this, run the same dependent (paired) samples t-test in SAS, but use “h0 = 3000” rather than “h0 = 0.” This tests whether the difference in the means is $3000, or whether it is something other than $3000 (either higher or lower). That is, it tests the hypotheses:

H0: d = μ1 − μ0 = $3000
H1: d ≠ $3000 (two-sided test)

Save Your Program and Write up Your Homework

After you run your SAS program and verify that it is working correctly, save the SAS program as HW07.sas. Print out your program (you can copy it from the Editor window of SAS, paste it into Word, and print it), and turn it in with your homework. Also, when this homework asks you to answer specific questions about the results, you need to answer in complete sentences, in addition to giving the appropriate numbers from the t-tests. Be sure to put your name, ECN377, your section, and “Homework 7” at the top of your homework.