Homework 07 Solutions

UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas
Homework 7 – Solutions
Write a SAS Program to Import the Data
See the example SAS program at the end of the solutions.
One-Sided t-Tests in SAS
In SAS, use PROC TTEST and the data from the sample of 25 NC counties (that is, the data in dataset01) to test
the null hypothesis that the mean value of POP2006 (population per county in 2006) for all NC counties is
equal to 80000 versus the alternative hypothesis that the mean is greater than 80000. Use a confidence level of
95% (significance level of 5%).
H0: µ = 80000
H1: µ > 80000 ==> One-sided test
Based on the confidence level, the alpha-value for this test is 0.05.
Based on the alpha-value and d.f. = n - 1 = 24, from the t-table we find the t-critical number for this one-sided test is 1.711.
Based on the SAS output for this t-test (see below):
Xbar = 99684.8, t-test value = 0.63, p-value = 0.2660
The TTEST Procedure
Variable: POP2006

 N       Mean    Std Dev    Std Err    Minimum    Maximum
25    99684.8     155247    31049.4    10912.0     786522

   Mean      95% CL Mean         Std Dev    95% CL Std Dev
99684.8    46563.0      Infty     155247     121221     215972

DF    t Value    Pr > t
24       0.63    0.2660

Because the t-test value is not farther from zero than t-critical, or, equivalently, the p-value is not less than
the α-value, we do NOT reject H0. Therefore, we conclude that the mean value of POP2006 is not statistically
greater than 80000 at the 5% level of significance. (Even though Xbar is larger than 80000, there is so
much variation in the X values in the sample that we cannot be sure that µ is larger than 80000.)
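As a cross-check, the numbers in the SAS output can be reproduced by hand from the summary statistics. The following is a minimal sketch and is not part of the homework program: the DATA step and its variable names (n, xbar, s, mu0, se, t_test, t_crit, p_val) are illustrative assumptions, with the summary statistics typed in from the output. TINV and PROBT are the SAS functions for the t-distribution quantile and cumulative probability, so t-critical is computed for d.f. = n - 1 rather than read from a table.

```sas
/* Sketch: upper-tail one-sample t-test computed from summary statistics. */
/* All variable names here are hypothetical; values come from the output. */
data _null_;
   n     = 25;         /* sample size                      */
   xbar  = 99684.8;    /* sample mean of POP2006           */
   s     = 155247;     /* sample standard deviation        */
   mu0   = 80000;      /* value of the mean under H0       */
   alpha = 0.05;       /* significance level               */

   df     = n - 1;                    /* degrees of freedom            */
   se     = s / sqrt(n);              /* standard error of the mean    */
   t_test = (xbar - mu0) / se;        /* t-test value                  */
   t_crit = tinv(1 - alpha, df);      /* upper-tail t-critical number  */
   p_val  = 1 - probt(t_test, df);    /* upper-tail p-value, Pr > t    */

   put se= t_test= t_crit= p_val=;    /* results are written to the log */
run;
```

The values written to the SAS log should agree with the Std Err, t Value, and Pr > t entries in the output, up to rounding of the typed-in summary statistics.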
In SAS, use PROC TTEST and the data from the sample of 25 NC counties (that is, the data in dataset01) to test
the null hypothesis that the mean value of POP2006 (population per county in 2006) for all NC counties is
equal to 100000 versus the alternative hypothesis that the mean is less than 100000. Use a confidence level of
95% (significance level of 5%).
H0: µ = 100000
H1: µ < 100000 ==> One-sided test
Based on the confidence level, the alpha-value for this test is 0.05.
Based on the alpha-value and d.f. = n - 1 = 24, from the t-table we find the t-critical number for this one-sided test is -1.711.
Based on the SAS output for this t-test (see below):
Xbar = 99684.8, t-test value = -0.01, p-value = 0.4960
The TTEST Procedure
Variable: POP2006

 N       Mean    Std Dev    Std Err    Minimum    Maximum
25    99684.8     155247    31049.4    10912.0     786522

   Mean      95% CL Mean         Std Dev    95% CL Std Dev
99684.8     -Infty     152807     155247     121221     215972

DF    t Value    Pr < t
24      -0.01    0.4960

Because the t-test value is not farther from zero than t-critical, or, equivalently, the p-value is not less than
the α-value, we do NOT reject H0.
Therefore, we conclude that the mean value of POP2006 is not statistically less than 100000 at the 5%
significance level.
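For a lower-tail test, the p-value is the probability to the left of the t-test value, and the one-sided confidence interval has only a finite upper limit. The sketch below illustrates this from the summary statistics in the output. It is an illustration rather than part of the homework program, and the variable names (n, xbar, s, mu0, se, t_test, t_crit, p_val, upper_cl) are hypothetical.

```sas
/* Sketch: lower-tail one-sample t-test and one-sided upper confidence limit. */
/* Variable names are hypothetical; values are typed in from the output.      */
data _null_;
   n     = 25;         /* sample size                 */
   xbar  = 99684.8;    /* sample mean of POP2006      */
   s     = 155247;     /* sample standard deviation   */
   mu0   = 100000;     /* value of the mean under H0  */
   alpha = 0.05;       /* significance level          */

   df       = n - 1;
   se       = s / sqrt(n);             /* standard error of the mean            */
   t_test   = (xbar - mu0) / se;       /* t-test value                          */
   t_crit   = tinv(alpha, df);         /* lower-tail t-critical (negative)      */
   p_val    = probt(t_test, df);       /* lower-tail p-value, Pr < t            */
   upper_cl = xbar - t_crit * se;      /* upper limit of the one-sided 95% CL   */

   put t_test= t_crit= p_val= upper_cl=;
run;
```

Because t_crit is negative, subtracting t_crit * se adds a positive margin to Xbar; the result should be close to the finite limit shown in the "95% CL Mean" column.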
Two-Sided t-Test in SAS
In SAS, use PROC TTEST and the data from the sample of 25 NC counties (that is, the data in dataset01) to test
the null hypothesis that the mean value of POP2006 (population per county in 2006) for all NC counties is
equal to 85000 versus the alternative hypothesis that the mean is different from 85000. Use a confidence level of
95% (significance level of 5%).
H0: µ = 85000
H1: µ ≠ 85000 ==> TWO-sided test
Based on the confidence level, the alpha-value for this test is 0.05.
Based on the alpha-value and d.f. = n - 1 = 24, from the t-table we find the t-critical number for this two-sided test is +/-2.064.
Based on the SAS output for this t-test (see below):
Xbar = 99684.8, t-test value = 0.47, p-value = 0.6405
The TTEST Procedure
Variable: POP2006

 N       Mean    Std Dev    Std Err    Minimum    Maximum
25    99684.8     155247    31049.4    10912.0     786522

   Mean      95% CL Mean         Std Dev    95% CL Std Dev
99684.8    35602.0     163768     155247     121221     215972

DF    t Value    Pr > |t|
24       0.47      0.6405

Because the t-test value is not farther from zero than t-critical, or, equivalently, the p-value is not less than
the α-value, we do NOT reject H0.
Therefore, we conclude that the mean value of POP2006 is not statistically different from 85000 at the 5%
level of significance.
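A two-sided test splits alpha between the two tails, and it agrees with the two-sided confidence interval: H0 is not rejected exactly when the hypothesized mean lies inside the 95% confidence limits. The following sketch shows both calculations from the summary statistics. It is an illustration only, and the variable names (t_crit, p_val, lower_cl, upper_cl, inside) are hypothetical.

```sas
/* Sketch: two-sided one-sample t-test and matching 95% confidence limits. */
/* Variable names are hypothetical; values are typed in from the output.   */
data _null_;
   n     = 25;         /* sample size                 */
   xbar  = 99684.8;    /* sample mean of POP2006      */
   s     = 155247;     /* sample standard deviation   */
   mu0   = 85000;      /* value of the mean under H0  */
   alpha = 0.05;       /* significance level          */

   df       = n - 1;
   se       = s / sqrt(n);                      /* standard error of the mean     */
   t_test   = (xbar - mu0) / se;                /* t-test value                   */
   t_crit   = tinv(1 - alpha/2, df);            /* two-sided t-critical (+/-)     */
   p_val    = 2 * (1 - probt(abs(t_test), df)); /* two-sided p-value, Pr > |t|    */
   lower_cl = xbar - t_crit * se;               /* lower 95% confidence limit     */
   upper_cl = xbar + t_crit * se;               /* upper 95% confidence limit     */
   inside   = (lower_cl <= mu0 <= upper_cl);    /* 1 if mu0 is inside the limits  */

   put t_test= t_crit= p_val= lower_cl= upper_cl= inside=;
run;
```

Here 85000 falls between the two limits reported in the "95% CL Mean" column, which is consistent with not rejecting H0.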
Independent Samples t-Test in SAS
We want to test whether the mean value of POP2006 for North Carolina counties that voted Republican in the
2004 presidential election (indicated by having a value of R for the VOTE2004 variable) is different from the
mean value of POP2006 for counties that voted Democratic (indicated by having a value of D for the VOTE2004
variable). First, use PROC SORT to sort the data by the VOTE2004 variable (this separates the data in dataset01
into two groups, those counties that voted R, and those that voted D). After the PROC SORT command, use a
separate PROC TTEST command to do the test. The null hypothesis is that there is no difference in the mean
value of POP2006 (that is, the difference is zero) between the two types of counties.
H0: (d = μD – μR) = 0
H1: (d = μD – μR) ≠ 0 ===> this is a TWO-SIDED test
In the SAS output below, the difference in sample means, Dbar, is -19417.4 (Note: if you used d = μR – μD in your
definition of H0, then Dbar will be positive 19417.4; that’s fine). The question is whether +/-19417.4 is
significantly different from zero (at a 95% confidence level).
Before we answer this question, notice that SAS gives you the output for two versions of this test, one in
which the variances of the two populations are equal (the results on the “Pooled” rows), and one in which
the variances are allowed to be different (the results on the “Satterthwaite” rows). The results in the
“Folded F” row tell you whether you should use the “variances equal / Pooled” results, or the “variances
unequal / Satterthwaite” results. Because the p-value for the Folded F test (p-value = 0.1309) is not less
than alpha = 0.05 (the p-value SAS reports for this test is already two-sided, so it is compared with alpha
itself), we do not reject the null hypothesis of equal variances. So, we use the “variances equal / Pooled” results.
So, returning to the test involving Dbar, we look at the SAS output on the “variances equal / Pooled” row to find
the t-test value of -0.26 and the p-value of 0.7958. From the two-sided t-table, for d.f. = 23 (d.f. is n1 + n2 - 2 for
this test) and α/2 = 0.025, the t-critical value is +/-2.069. Because the t-test value is not farther from zero than
t-critical (or, equivalently, because the two-sided p-value is not less than α = 0.05), we do not reject H0. Thus, we
conclude that there is no statistical difference in mean POP2006 between the counties that voted D and the counties
that voted R, at the 5% level of significance.
The TTEST Procedure
Variable: POP2006

VOTE2004       N       Mean    Std Dev    Std Err    Minimum    Maximum
D              6    84927.5    86480.7    35305.6    23581.0     246896
R             19     104345     173097    39711.1    10912.0     786522
Diff (1-2)         -19417.4     158350    74154.2

VOTE2004    Method               Mean      95% CL Mean         Std Dev    95% CL Std Dev
D                             84927.5    -5828.5     175683    86480.7    53982.0     212104
R                              104345    20915.0     187775     173097     130794     255980
Diff (1-2)  Pooled           -19417.4    -172817     133982     158350     123072     222127
Diff (1-2)  Satterthwaite    -19417.4    -131161    92326.4

Method         Variances        DF    t Value    Pr > |t|
Pooled         Equal            23      -0.26      0.7958
Satterthwaite  Unequal      17.759      -0.37      0.7191

Equality of Variances
Method        Num DF    Den DF    F Value    Pr > F
Folded F          18         5       4.01    0.1309

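The "Pooled" and "Folded F" results can also be reproduced from the group summary statistics. The sketch below is an illustration, not part of the homework program: the variable names are hypothetical, the group statistics are typed in from the output, and the Folded F p-value is assumed to be twice the upper-tail probability of the F distribution (capped at 1), with the numerator d.f. belonging to the group with the larger variance.

```sas
/* Sketch: pooled two-sample t-test and Folded F test from summary statistics. */
/* Variable names are hypothetical; values are typed in from the output.       */
data _null_;
   n1 = 6;   xbar1 = 84927.5;  s1 = 86480.7;   /* counties that voted D */
   n2 = 19;  xbar2 = 104345;   s2 = 173097;    /* counties that voted R */

   /* Pooled (equal variances) t-test of H0: muD - muR = 0 */
   df     = n1 + n2 - 2;
   sp     = sqrt(((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / df);  /* pooled std dev    */
   se     = sp * sqrt(1/n1 + 1/n2);                        /* std err of Dbar   */
   t_test = (xbar1 - xbar2) / se;                          /* t-test value      */
   p_val  = 2 * (1 - probt(abs(t_test), df));              /* two-sided p-value */

   /* Folded F test of equal variances: larger variance over smaller variance */
   f_val = max(s1, s2)**2 / min(s1, s2)**2;
   if s1 > s2 then do; ndf = n1 - 1; ddf = n2 - 1; end;
   else do;            ndf = n2 - 1; ddf = n1 - 1; end;
   p_f = min(1, 2 * (1 - probf(f_val, ndf, ddf)));         /* assumed two-sided */

   put sp= se= t_test= p_val= f_val= ndf= ddf= p_f=;
run;
```

The pooled standard deviation, standard error, t Value, and F Value written to the log should agree with the "Diff (1-2)", "Pooled", and "Folded F" rows, up to rounding of the typed-in statistics.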
Dependent (Paired) Samples t-test.
In the handout “t-Tests in SAS,” we conducted a dependent (paired) samples t-test to determine whether the mean
value of PCINC2005 was different from the mean value of PCINC2000. We rejected the null hypothesis of “no
difference,” that is, we rejected H0: (d = μ1 - μ0 ) = 0. So, we concluded that the mean of PCINC2005 was
different from the mean of PCINC2000.
Now, in this homework, let’s use a dependent (paired) samples t-test again, but this time let’s use it to investigate
how big the difference might be. Specifically, conduct a dependent (paired) samples t-test to determine whether
the mean value of PCINC2005 was 3000 larger than the mean value of PCINC2000. This is asking whether per
capita income, on average across all NC counties, grew by $3000 between year 2000 and year 2005.
To do this, run the same dependent (paired) samples t-test in SAS, but use “h0 = 3000” rather than “h0 = 0.”
This tests whether the difference in the means is $3000, or whether it’s something other than $3000 (either higher
or lower). That is, it tests the hypotheses:
H0: (d = μ1 - μ0 ) = $3000
H1: d ≠ $3000 (two-sided test)
The TTEST Procedure
Difference: PCINC2005 - PCINC2000

 N       Mean    Std Dev    Std Err    Minimum    Maximum
25     3546.2     1553.7      310.7     1370.0     8468.0

  Mean      95% CL Mean         Std Dev    95% CL Std Dev
3546.2     2904.9     4187.5     1553.7     1213.2     2161.4

DF    t Value    Pr > |t|
24       1.76      0.0915

In the SAS output above, we see that in our sample, the mean difference, Dbar, is 3546.2. Is this difference
statistically different from 3000? From the two-sided t-table, for d.f. = 24 and α/2 = 0.025, the t-critical value for this
test is +/-2.064. Looking at the SAS output, the t-test value is 1.76. Because the t-test value is not farther from zero
than t-critical (equivalently, the p-value of 0.0915 is not less than α = 0.05), we do not reject H0. Therefore, we
conclude that the mean difference between PCINC2005 and PCINC2000 is not statistically different from 3000 at the
5% significance level.
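A paired test is equivalent to a one-sample t-test on the per-county differences, which can make the h0=3000 setting easier to interpret. The following sketch shows that equivalence. It assumes dataset01 has already been imported as in the homework program; the dataset name diffs and the variable name diff are hypothetical.

```sas
/* Sketch: the paired test rewritten as a one-sample test on the differences. */
/* "diffs" and "diff" are hypothetical names introduced for illustration.     */
data diffs;
   set dataset01;
   diff = pcinc2005 - pcinc2000;   /* change in per capita income, by county */
run;

/* Test H0: mean(diff) = 3000 against the two-sided alternative. */
proc ttest data=diffs h0=3000 sides=2 alpha=0.05;
   var diff;
run;
```

This should report the same mean difference, t Value, and Pr > |t| as the PAIRED statement, since both tests use the same differences.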
SAS Program
/*
SOFTWARE: SAS Statistical Software program, version 9.2
AUTHOR: Dr. Chris Dumas, UNC-Wilmington, Sept. 10, 2015.
TITLE: HW07 to conduct t-tests using PROC TTEST.
*/
options helpbrowser=sas;
options number pageno=1 nodate nolabel font="SAS Monospace" 10;
options leftmargin=1.00 in rightmargin=1.00 in ;
options topmargin=1.00 in bottommargin=1.00 in;
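/* Import the Excel data into the SAS dataset dataset01. */
proc import datafile="v:\ecn377\ProcTTESTdata.xls" dbms=xls out=dataset01 replace;
run;
/* One-sided (upper-tail) test: H0: mu = 80000 vs. H1: mu > 80000. */
proc ttest data=dataset01 h0=80000 sides=U alpha=0.05;
var POP2006;
run;
/* One-sided (lower-tail) test: H0: mu = 100000 vs. H1: mu < 100000. */
proc ttest data=dataset01 h0=100000 sides=L alpha=0.05;
var POP2006;
run;
/* Two-sided test: H0: mu = 85000 vs. H1: mu not equal to 85000. */
proc ttest data=dataset01 h0=85000 sides=2 alpha=0.05;
var POP2006;
run;
/* Independent samples test: sort by VOTE2004, then compare D and R counties. */
proc sort data=dataset01;
by VOTE2004;
run;
proc ttest data=dataset01 h0=0 sides=2 alpha=0.05;
class VOTE2004;
var POP2006;
run;
/* Dependent (paired) samples test: H0: mean of (PCINC2005 - PCINC2000) = 3000. */
proc ttest data=dataset01 h0=3000 sides=2 alpha=0.05;
paired pcinc2005*pcinc2000;
run;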