UNC-Wilmington Department of Economics and Finance
ECN 377
Dr. Chris Dumas

Homework 7 – Solutions

Write a SAS Program to Import the Data

See the example SAS program at the end of the solutions.

One-Sided t-Tests in SAS

In SAS, use PROC TTEST and the data from the sample of 25 NC counties (that is, the data in dataset01) to test the null hypothesis that the mean value of POP2006 (population per county in 2006) for all NC counties is equal to 80000 versus the alternative hypothesis that the mean is greater than 80000. Use a confidence level of 95% (significance level of 5%).

H0: µ = 80000
H1: µ > 80000 ==> One-sided test

Based on the confidence level, the alpha-value for this test is 0.05. Based on the alpha-value, from the t-table with d.f. = n - 1 = 24, we find the t-critical number for this one-sided test is 1.711.

Based on the SAS output for this t-test (see below): Xbar = 99684.8, t-test value = 0.63, p-value = 0.2660

The TTEST Procedure
Variable: POP2006

     N        Mean     Std Dev     Std Err     Minimum     Maximum
    25     99684.8      155247     31049.4     10912.0      786522

      Mean       95% CL Mean       Std Dev     95% CL Std Dev
   99684.8     46563.0   Infty      155247     121221   215972

    DF    t Value    Pr > t
    24       0.63    0.2660

Because the t-test value (0.63) is not farther from zero than t-critical (1.711), or, equivalently, because the p-value (0.2660) is not less than the α-value (0.05), we do NOT reject H0. Therefore, we conclude that the mean value of POP2006 is not statistically more than 80000 at the 5% level of significance. (Even though Xbar is larger than 80000, there is so much variation in the X values in the sample that we cannot be sure that µ is larger than 80000.)

In SAS, use PROC TTEST and the data from the sample of 25 NC counties (that is, the data in dataset01) to test the null hypothesis that the mean value of POP2006 (population per county in 2006) for all NC counties is equal to 100000 versus the alternative hypothesis that the mean is less than 100000. Use a confidence level of 95% (significance level of 5%).

H0: µ = 100000
H1: µ < 100000 ==> One-sided test

Based on the confidence level, the alpha-value for this test is 0.05. Based on the alpha-value, from the t-table with d.f. = 24, we find the t-critical number for this one-sided test is -1.711.

Based on the SAS output for this t-test (see below): Xbar = 99684.8, t-test value = -0.01, p-value = 0.4960

The TTEST Procedure
Variable: POP2006

     N        Mean     Std Dev     Std Err     Minimum     Maximum
    25     99684.8      155247     31049.4     10912.0      786522

      Mean       95% CL Mean       Std Dev     95% CL Std Dev
   99684.8     -Infty   152807      155247     121221   215972

    DF    t Value    Pr < t
    24      -0.01    0.4960

Because the t-test value (-0.01) is not farther from zero than t-critical (-1.711), or, equivalently, because the p-value (0.4960) is not less than the α-value (0.05), we do NOT reject H0. Therefore, we conclude that the mean value of POP2006 is not statistically less than 100000 at the 5% significance level.

Two-Sided t-Test in SAS

In SAS, use PROC TTEST and the data from the sample of 25 NC counties (that is, the data in dataset01) to test the null hypothesis that the mean value of POP2006 (population per county in 2006) for all NC counties is equal to 85000 versus the alternative hypothesis that the mean is different from 85000. Use a confidence level of 95% (significance level of 5%).

H0: µ = 85000
H1: µ ≠ 85000 ==> TWO-sided test

Based on the confidence level, the alpha-value for this test is 0.05. Based on the alpha-value, from the t-table with d.f. = 24 and α/2 = 0.025, we find the t-critical number for this two-sided test is +/-2.064.

Based on the SAS output for this t-test (see below): Xbar = 99684.8, t-test value = 0.47, p-value = 0.6405

The TTEST Procedure
Variable: POP2006

     N        Mean     Std Dev     Std Err     Minimum     Maximum
    25     99684.8      155247     31049.4     10912.0      786522

      Mean       95% CL Mean       Std Dev     95% CL Std Dev
   99684.8     35602.0   163768     155247     121221   215972

    DF    t Value    Pr > |t|
    24       0.47      0.6405

Because the t-test value (0.47) is not farther from zero than t-critical (+/-2.064), or, equivalently, because the p-value (0.6405) is not less than the α-value (0.05), we do NOT reject H0.
Therefore, we conclude that the mean value of POP2006 is not statistically different from 85000 at the 5% level of significance.

Independent Samples t-Test in SAS

We want to test whether the mean value of POP2006 for North Carolina counties that voted Republican in the 2004 presidential election (indicated by having a value of R for the VOTE2004 variable) is different from the mean value of POP2006 for counties that voted Democratic (indicated by having a value of D for the VOTE2004 variable). First, use PROC SORT to sort the data by the VOTE2004 variable (this separates the data in dataset01 into two groups, those counties that voted R, and those that voted D). After the PROC SORT command, use a separate PROC TTEST command to do the test. The null hypothesis is that there is no difference in the mean value of POP2006 (that is, the difference is zero) between the two types of counties.

H0: (d = μD - μR) = 0
H1: (d = μD - μR) ≠ 0 ===> this is a TWO-SIDED test

In the SAS output below, the difference in sample means, Dbar, is -19417.4. (Note: if you used d = μR - μD in your definition of H0, then Dbar will be positive 19417.4; that's fine.) The question is whether +/-19417.4 is significantly different from zero (at a 95% confidence level).

Before we answer this question, notice that SAS gives you the output for two versions of this test, one in which the variances of the two populations are assumed to be equal (the results on the "Pooled" rows), and one in which the variances are allowed to be different (the results on the "Satterthwaite" rows). The results in the "Folded F" row tell you whether you should use the "variances equal / Pooled" results or the "variances unequal / Satterthwaite" results. Because the p-value for the Folded F test (p-value = 0.1309) is not less than α = 0.05 (the Pr > F value that SAS reports for the Folded F test is already a two-sided p-value), we do not reject the null hypothesis of equal variances. So, we use the "variances equal / Pooled" results.
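
As a rough check on the Folded F row, the F value can be rebuilt from the two sample standard deviations that appear in the SAS output for this test. The following is only a minimal sketch, not part of the assigned program. It assumes the reported values (Std Dev = 86480.7 for the 6 D counties, Std Dev = 173097 for the 19 R counties); the variable names are made up for illustration, and PROBF is the standard SAS function for the F distribution.

```sas
/* Sketch: rebuild the Folded F test from the reported standard deviations. */
/* All variable names here are illustrative, not from the homework program. */
data _null_;
   sd_d = 86480.7;  n_d = 6;     /* counties that voted D */
   sd_r = 173097;   n_r = 19;    /* counties that voted R */

   /* Folded F: larger sample variance over smaller sample variance */
   f = max(sd_d, sd_r)**2 / min(sd_d, sd_r)**2;

   /* R has the larger variance in this sample, so it supplies the numerator d.f. */
   num_df = n_r - 1;
   den_df = n_d - 1;

   /* two-sided p-value: double the upper-tail probability */
   p = 2 * (1 - probf(f, num_df, den_df));

   put f=8.2 num_df= den_df= p=8.4;
run;
```

The values written to the SAS log should land close to the F Value and Pr > F entries on the Folded F row, up to rounding of the inputs.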

So, returning to the test involving Dbar, we look at the SAS output on the "variances equal / Pooled" row to find the t-test value of -0.26 and the p-value of 0.7958. From the two-sided t-table, for d.f. = 23 (d.f. is n1 + n2 - 2 for this test) and α/2 = 0.025, the t-critical value is +/-2.069. Because t-test is not farther from zero than t-critical (or, equivalently, because the p-value of 0.7958 is not less than α = 0.05), we do not reject H0. Thus, we conclude that there is no statistical difference in mean POP2006 between the counties that voted D and the counties that voted R, at the 5% level of significance.

The TTEST Procedure
Variable: POP2006

VOTE2004        N        Mean     Std Dev     Std Err     Minimum     Maximum
D               6     84927.5     86480.7     35305.6     23581.0      246896
R              19      104345      173097     39711.1     10912.0      786522
Diff (1-2)           -19417.4      158350     74154.2

VOTE2004     Method              Mean       95% CL Mean       Std Dev     95% CL Std Dev
D                             84927.5    -5828.5    175683    86480.7    53982.0   212104
R                              104345    20915.0    187775     173097     130794   255980
Diff (1-2)   Pooled          -19417.4    -172817    133982     158350     123072   222127
Diff (1-2)   Satterthwaite   -19417.4    -131161   92326.4

Method          Variances        DF    t Value    Pr > |t|
Pooled          Equal            23      -0.26      0.7958
Satterthwaite   Unequal      17.759      -0.37      0.7191

Equality of Variances
Method      Num DF    Den DF    F Value    Pr > F
Folded F        18         5       4.01    0.1309

Dependent (Paired) Samples t-Test

In the handout "t-Tests in SAS," we conducted a dependent (paired) samples t-test to determine whether the mean value of PCINC2005 was different from the mean value of PCINC2000. We rejected the null hypothesis of "no difference," that is, we rejected H0: (d = μ1 - μ0) = 0. So, we concluded that the mean of PCINC2005 was different from the mean of PCINC2000. Now, in this homework, let's use a dependent (paired) samples t-test again, but this time let's use it to investigate how big the difference might be.
Specifically, conduct a dependent (paired) samples t-test to determine whether the mean value of PCINC2005 was 3000 larger than the mean value of PCINC2000. This is asking whether per capita income, on average across all NC counties, grew by $3000 between year 2000 and year 2005. To do this, run the same dependent (paired) samples t-test in SAS, but use "h0 = 3000" rather than "h0 = 0." This tests whether the difference in the means is $3000, or whether it's something other than $3000 (either higher or lower). That is, it tests the hypotheses:

H0: (d = μ1 - μ0) = $3000
H1: d ≠ $3000 (two-sided test)

The TTEST Procedure
Difference: PCINC2005 - PCINC2000

     N        Mean     Std Dev     Std Err     Minimum     Maximum
    25      3546.2      1553.7       310.7      1370.0      8468.0

      Mean       95% CL Mean       Std Dev     95% CL Std Dev
    3546.2     2904.9   4187.5      1553.7     1213.2   2161.4

    DF    t Value    Pr > |t|
    24       1.76      0.0915

In the SAS output above, we see that in our sample, the mean difference, Dbar, is 3546.2. Is this difference statistically different from 3000? From the two-sided t-table, for d.f. = 24 and α/2 = 0.025, the t-critical value for this test is +/-2.064. Looking at the SAS output, the t-test value is 1.76. Because t-test is not farther from zero than t-critical (or, equivalently, because the p-value of 0.0915 is not less than α = 0.05), we do not reject H0. Therefore, we conclude that the mean difference between PCINC2005 and PCINC2000 is not statistically different from 3000 at the 5% significance level.

SAS Program

/*
SOFTWARE: SAS Statistical Software program, version 9.2
AUTHOR: Dr. Chris Dumas, UNC-Wilmington, Sept. 10, 2015.
TITLE: HW07 to conduct t-tests using PROC TTEST.
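*/

options helpbrowser=sas;
options number pageno=1 nodate nolabel font="SAS Monospace" 10;
options leftmargin=1.00 in rightmargin=1.00 in;
options topmargin=1.00 in bottommargin=1.00 in;

/* Import the county data from Excel into dataset01 */
proc import datafile="v:\ecn377\ProcTTESTdata.xls" dbms=xls out=dataset01 replace;
run;

/* One-sided (upper-tail) test: H0: mu = 80000 vs. H1: mu > 80000 */
proc ttest data=dataset01 h0=80000 sides=U alpha=0.05;
   var POP2006;
run;

/* One-sided (lower-tail) test: H0: mu = 100000 vs. H1: mu < 100000 */
proc ttest data=dataset01 h0=100000 sides=L alpha=0.05;
   var POP2006;
run;

/* Two-sided test: H0: mu = 85000 vs. H1: mu not equal to 85000 */
proc ttest data=dataset01 h0=85000 sides=2 alpha=0.05;
   var POP2006;
run;

/* Independent samples test: sort by VOTE2004, then compare D and R counties */
proc sort data=dataset01;
   by VOTE2004;
run;

proc ttest data=dataset01 h0=0 sides=2 alpha=0.05;
   class VOTE2004;
   var POP2006;
run;

/* Dependent (paired) samples test: H0: mean(PCINC2005 - PCINC2000) = 3000 */
proc ttest data=dataset01 h0=3000 sides=2 alpha=0.05;
   paired pcinc2005*pcinc2000;
run;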
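
The hand steps used in the one-sample solutions (looking up t-critical in the t-table and forming the t-test value from Xbar and the standard error) can also be checked inside SAS. The following is only a minimal sketch and is not part of the assigned program. It assumes the summary statistics reported in the PROC TTEST output for POP2006 (Xbar = 99684.8, Std Err = 31049.4, d.f. = 24); the variable names are made up for illustration, and TINV and PROBT are the standard SAS quantile and probability functions for the t distribution.

```sas
/* Sketch: check t-critical, t-test, and p-values for the POP2006 tests. */
/* All variable names here are illustrative, not from the homework program. */
data _null_;
   xbar = 99684.8;  se = 31049.4;  df = 24;  alpha = 0.05;

   /* t-critical values for d.f. = 24 */
   tcrit_one = tinv(1 - alpha, df);       /* one-sided test */
   tcrit_two = tinv(1 - alpha/2, df);     /* two-sided test */
   put tcrit_one=8.3 tcrit_two=8.3;

   /* t-test value = (Xbar - hypothesized mean) / Std Err */
   do mu0 = 80000, 100000, 85000;
      t       = (xbar - mu0) / se;
      p_upper = 1 - probt(t, df);              /* use with sides=U */
      p_lower = probt(t, df);                  /* use with sides=L */
      p_two   = 2 * (1 - probt(abs(t), df));   /* use with sides=2 */
      put mu0= t=8.2 p_upper=8.4 p_lower=8.4 p_two=8.4;
   end;
run;
```

For each hypothesized mean, only the p-value that matches the SIDES= option of the corresponding PROC TTEST call is relevant; the results are written to the SAS log and should agree with the PROC TTEST output up to rounding.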