Homework 07

UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas
Homework 7 – t-Tests in SAS
(due Tuesday, Sept. 15)
In Homework 7 you will practice conducting t-tests in SAS using an example dataset “ProcTTESTdata.xls”.
(This dataset is in the “handouts” folder on the class website.) This dataset file contains data on 7 variables for a
random sample of 25 counties drawn from the population of all 100 North Carolina counties. The 7 variables are:
Variable Name   Variable Type                     Variable Definition
CNTYNAME        Text Variable                     Name of county
REGION          Text Variable                     Geographic region: M, P or C
SQMILES         Numerical Measurement Variable    Area of county in square miles
POP2006         Numerical Measurement Variable    County population in 2006
PCINC2005       Numerical Measurement Variable    County per capita income in 2005
PCINC2000       Numerical Measurement Variable    County per capita income in 2000
VOTE2004        Text Variable                     2004 Presidential election winning party: R or D
Note: Use the SAS command examples in handout “t-Tests in SAS” as a guide.
Write a SAS Program to Import the Data
Write a SAS program that uses PROC IMPORT to import the ProcTTESTdata.xls data file into SAS. Name the
dataset “dataset01” in SAS.
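As a starting point, a minimal sketch of the import step might look like the following. The file path is a hypothetical placeholder (use the folder where you saved the file), and the DBMS=XLS option assumes your SAS installation can read .xls files directly; follow the handout if it uses different options.

```sas
/* Sketch only: the path below is a hypothetical location, not the real one. */
proc import datafile="C:\ECN377\ProcTTESTdata.xls"
    out=dataset01      /* name of the SAS dataset to create */
    dbms=xls           /* assumes the file is an Excel 97-2003 workbook */
    replace;           /* overwrite dataset01 if it already exists */
    getnames=yes;      /* read variable names from the first row */
run;
```

After running it, check the SAS Log to confirm that 25 observations and 7 variables were read.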
One-Sided t-Tests in SAS
In SAS, use PROC TTEST and the data from the sample of 25 NC counties (that is, the data in dataset01) to test
the null hypothesis that the mean value of POP2006 (population per county in 2006) for all NC counties is
equal to 80000 versus the alternative hypothesis that the mean is greater than 80000. Use a confidence level of
95% (significance level of 5%).
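One way this upper-tailed test could be written is sketched below. It assumes a SAS version in which PROC TTEST supports the SIDES= option; the class handout may set up one-sided tests differently, in which case follow the handout.

```sas
/* Sketch: H0: mean = 80000 versus H1: mean > 80000 */
proc ttest data=dataset01 h0=80000 sides=U alpha=0.05;
    var POP2006;       /* variable whose mean is being tested */
run;
```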
In SAS, use PROC TTEST and the data from the sample of 25 NC counties (that is, the data in dataset01) to test
the null hypothesis that the mean value of POP2006 (population per county in 2006) for all NC counties is
equal to 100000 versus the alternative hypothesis that the mean is less than 100000. Use a confidence level of
95% (significance level of 5%).
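The lower-tailed version can be sketched the same way, again assuming the SIDES= option is available. Only the hypothesized mean and the direction of the alternative change.

```sas
/* Sketch: H0: mean = 100000 versus H1: mean < 100000 */
proc ttest data=dataset01 h0=100000 sides=L alpha=0.05;
    var POP2006;
run;
```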
Two-Sided t-Test in SAS
In SAS, use PROC TTEST and the data from the sample of 25 NC counties (that is, the data in dataset01) to test
the null hypothesis that the mean value of POP2006 (population per county in 2006) for all NC counties is
equal to 85000 versus the alternative hypothesis that the mean is different from 85000. Use a confidence level of
95% (significance level of 5%).
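A possible form of the two-sided test is sketched below. Two-sided is the default for PROC TTEST, so SIDES=2 is written out only to make the alternative hypothesis explicit.

```sas
/* Sketch: H0: mean = 85000 versus H1: mean is not equal to 85000 */
proc ttest data=dataset01 h0=85000 sides=2 alpha=0.05;
    var POP2006;
run;
```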
Independent Samples t-Test in SAS
Before we conduct an independent samples t-Test in SAS, we typically need to sort the data. Suppose we want to
test whether the mean value of POP2006 for North Carolina counties that voted Republican in the 2004
presidential election (indicated by having a value of R for the VOTE2004 variable) is different from the mean
value of POP2006 for counties that voted Democratic (indicated by having a value of D for the VOTE2004
variable). To set up this test, use PROC SORT to sort the data by the VOTE2004 variable (this separates the data
in dataset01 into two groups, those counties that voted R, and those that voted D).
After the PROC SORT command, use a separate PROC TTEST command to do the test. The null hypothesis is
that there is no difference in the mean value of POP2006 (that is, the difference is zero) between the two types of
counties. The alternative hypothesis is that there is a difference (either positive or negative). Use a confidence
level of 95% (significance level of 5%). In the PROC TTEST command, use VOTE2004 for the “class” variable,
and use POP2006 for the “var” variable. The “class” variable is always the variable that is used to separate the
data into categories (in this case, R vs. D), and the “var” variable is the one for which you are testing for a
difference in means.
(The results of this test can be used to investigate whether county population size is associated with political
party support.)
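The sort-then-test sequence described above could be sketched as follows. This is an illustration under the assumptions stated in the text (VOTE2004 takes only the values R and D), not necessarily the exact form used in the handout.

```sas
/* Step 1: sort the data by the grouping variable */
proc sort data=dataset01;
    by VOTE2004;
run;

/* Step 2: test H0: no difference in mean POP2006 between R and D counties */
proc ttest data=dataset01 h0=0 sides=2 alpha=0.05;
    class VOTE2004;    /* variable that splits the data into two groups */
    var POP2006;       /* variable whose group means are compared */
run;
```

The output reports results under both the equal-variance (Pooled) and unequal-variance (Satterthwaite) assumptions, along with a test for equality of variances, so note which row you use when writing up your answer.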
Dependent (Paired) Samples t-Test in SAS
In the handout “t-Tests in SAS,” we conducted a dependent (paired) samples t-test to determine whether the mean
value of PCINC2005 was different from the mean value of PCINC2000. We rejected the null hypothesis of “no
difference,” that is, we rejected H0: (d = μ1 - μ0 ) = 0. So, we concluded that the mean of PCINC2005 was
different from the mean of PCINC2000.
Now, in this homework, let’s use a dependent (paired) samples t-test again, but this time let’s use it to investigate
how big the difference might be. Specifically, conduct a dependent (paired) samples t-test to determine whether
the mean value of PCINC2005 was 3000 larger than the mean value of PCINC2000. This is asking whether per
capita income, on average across all NC counties, grew by $3000 between year 2000 and year 2005. Use a
confidence level of 95% (significance level of 5%).
To do this, run the same dependent (paired) samples t-test in SAS, but use “h0 = 3000” rather than “h0 = 0.”
This tests whether the difference in the means is $3000, or whether it’s something other than $3000 (either higher
or lower). That is, it tests the hypothesis:
H0: (d = μ1 - μ0 ) = $3000
H1: d ≠ $3000 (two-sided test)
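A minimal sketch of this paired test is shown below, assuming the same PAIRED statement form as in the handout. In PROC TTEST, PAIRED A*B analyzes the difference A - B, so listing PCINC2005 first makes the difference equal to 2005 income minus 2000 income.

```sas
/* Sketch: H0: mean(PCINC2005 - PCINC2000) = 3000, two-sided alternative */
proc ttest data=dataset01 h0=3000 sides=2 alpha=0.05;
    paired PCINC2005*PCINC2000;   /* difference = PCINC2005 - PCINC2000 */
run;
```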
Save Your Program and Write up Your Homework
After you run your SAS program and verify that it is working correctly, save the SAS program as HW07.sas.
Print out your program (you can copy it from the Editor window of SAS, paste it into Word, and print it), and turn
it in with your homework. Also, when this homework asks you to answer specific questions about the results, you
need to answer in complete sentences, in addition to giving the appropriate numbers from the t-tests. Be sure to
put your name, ECN377, your section, and “Homework 7” at the top of your homework.