UNC-Wilmington Department of Economics and Finance
ECN 377
Dr. Chris Dumas

t-Tests in SAS

In this handout we conduct t-tests in SAS using an example dataset, “ProcTTESTdata.xls”. This dataset file contains data on 7 variables for a random sample of 25 counties drawn from the population of all 100 North Carolina counties. The 7 variables are:

Variable Name   Variable Type                     Variable Definition
CNTYNAME        Text Variable                     Name of county
REGION          Text Variable                     Geographic region: M, P or C
SQMILES         Numerical Measurement Variable    Area of county in square miles
POP2006         Numerical Measurement Variable    County population in 2006
PCINC2005       Numerical Measurement Variable    County per capita income in 2005
PCINC2000       Numerical Measurement Variable    County per capita income in 2000
VOTE2004        Text Variable                     2004 Presidential election winning party: R or D

First, we import the dataset into SAS using Proc Import. In this example we assume that the dataset ProcTTESTdata.xls is stored in the ECN377 folder on the student’s C: drive (the V: drive in TealWare).

    /* Read the Excel file into a SAS dataset named dataset01. */
    proc import datafile="v:\ecn377\ProcTTESTdata.xls"
        dbms=xls out=dataset01 replace;
    run;

At this point in the program, we could run Proc Contents to obtain a summary of the number of variables, number of rows, and variable types in the dataset. We could also use a Data Step to create new variables, modify variables, drop or keep variables, etc. However, in this example, we will simply use the data in the imported dataset without any modifications. The remainder of this handout discusses how to conduct various types of t-tests in SAS.

Basic, one-sided t-test.
Tests whether the true population mean (μ1) is equal to a given number (μ0) or greater than that number. The “sides=U” option tells SAS to test whether μ1 is greater than μ0.

H0: μ1 = (μ0 = 20000)
H1: μ1 > (μ0 = 20000)

    /* Upper one-sided test of mean pcinc2000 against 20000. */
    proc ttest data=dataset01 h0=20000 sides=U alpha=0.05;
        var pcinc2000;
    run;

Basic, one-sided t-test.
Tests whether the true population mean (μ1) is equal to a given number (μ0) or less than that number. The “sides=L” option tells SAS to test whether μ1 is less than μ0.

H0: μ1 = (μ0 = 30000)
H1: μ1 < (μ0 = 30000)

    /* Lower one-sided test of mean pcinc2000 against 30000. */
    proc ttest data=dataset01 h0=30000 sides=L alpha=0.05;
        var pcinc2000;
    run;

Basic, two-sided t-test.
Tests whether the true population mean (μ1) is equal to a given number (μ0) or not equal to that number. The “sides=2” option tells SAS to test whether μ1 is not equal to μ0.

H0: μ1 = (μ0 = 25000)
H1: μ1 ≠ (μ0 = 25000)

    /* Two-sided test of mean pcinc2000 against 25000. */
    proc ttest data=dataset01 h0=25000 sides=2 alpha=0.05;
        var pcinc2000;
    run;

Independent Samples t-test.
Tests for a difference (d) in the population means of two independent populations. The “sides=2” option tells SAS to test whether d is not equal to 0, that is, whether the two population means differ.

H0: (d = μ1 - μ0 ) = 0
H1: d ≠ 0 (two-sided test)

For an independent samples t-test, a variable is used to indicate the population to which each observation (data row) belongs; this variable is listed in the “class” statement of the Proc tTest command. In the example below, the class variable is “vote2004.” The class variable is used to separate the data into two populations, and the test determines whether the mean value of the “var” variable is different for the two populations. SAS labels one of the populations “1” and the other “2”; the output shows which population is labeled “1” and which is labeled “2”.

In this handout the dataset is sorted by the class variable using Proc Sort before the t-test is run. (Strictly speaking, Proc tTest requires sorted data only when a “by” statement is used, not for a “class” statement, so the sort is a precaution rather than a requirement.) Because we want to use vote2004 as the class variable in Proc tTest, we use vote2004 as the sort variable in Proc Sort, as shown below. (Note that the test below could be run as a one-sided test instead by changing “sides=2” to “sides=L” for H1: d < 0 or to “sides=U” for H1: d > 0.)
    /* Sort by the class variable, then compare mean pcinc2000 across vote2004 groups. */
    proc sort data=dataset01;
        by vote2004;
    run;

    proc ttest data=dataset01 h0=0 sides=2 alpha=0.05;
        class vote2004;
        var pcinc2000;
    run;

Dependent (Paired) Samples t-test.
Tests for a difference in the means between two sets of observations on the same sample of individuals from one population. Sorting the data is not necessary for this test. In the example below, variable “pcinc2000” gives the values for the first set of observations and “pcinc2005” gives the values for the second set of observations on the same individuals (counties). When calculating the difference for each observation, SAS subtracts the variable listed second in the “paired” statement from the variable listed first, so here each difference is pcinc2005 - pcinc2000. The “sides=2” option tells SAS to test whether the mean difference d is not equal to 0. (Note: The test below could be run as a one-sided test instead by changing “sides=2” to “sides=L” for H1: d < 0 or to “sides=U” for H1: d > 0.)

H0: (d = μ1 - μ0 ) = 0
H1: d ≠ 0 (two-sided test)

    /* Paired test on the per-county differences pcinc2005 - pcinc2000. */
    proc ttest data=dataset01 h0=0 sides=2 alpha=0.05;
        paired pcinc2005*pcinc2000;
    run;

SAS Output for the five t-tests discussed above (in order).

IMPORTANT NOTE: SAS labels the p-value as “Pr > t” for an upper one-sided test (sides=U), “Pr < t” for a lower one-sided test (sides=L), and “Pr > |t|” for a two-sided test (sides=2).

Basic, one-sided t-test.
(sides=U; this asks whether μ1 > μ0)

H0: μ1 = (μ0 = 20000)
H1: μ1 > (μ0 = 20000)

The SAS System
The TTEST Procedure
Variable: PCINC2000

 N       Mean    Std Dev    Std Err    Minimum    Maximum
25    24367.5     4887.4      977.5    17272.0    35954.0

   Mean       95% CL Mean       Std Dev     95% CL Std Dev
24367.5    22695.1    Infty      4887.4    3816.2    6799.1

DF    t Value    Pr > t
24       4.47    <.0001

Basic, one-sided t-test.
(sides=L; this asks whether μ1 < μ0)

H0: μ1 = (μ0 = 30000)
H1: μ1 < (μ0 = 30000)

The SAS System
The TTEST Procedure
Variable: PCINC2000

 N       Mean    Std Dev    Std Err    Minimum    Maximum
25    24367.5     4887.4      977.5    17272.0    35954.0

   Mean       95% CL Mean       Std Dev     95% CL Std Dev
24367.5     -Infty  26039.8      4887.4    3816.2    6799.1

DF    t Value    Pr < t
24      -5.76    <.0001

Basic, two-sided t-test.
(sides=2; this tests whether μ1 ≠ μ0)

H0: μ1 = (μ0 = 25000)
H1: μ1 ≠ (μ0 = 25000)

The SAS System
The TTEST Procedure
Variable: PCINC2000

 N       Mean    Std Dev    Std Err    Minimum    Maximum
25    24367.5     4887.4      977.5    17272.0    35954.0

   Mean       95% CL Mean       Std Dev     95% CL Std Dev
24367.5    22350.1  26384.9      4887.4    3816.2    6799.1

DF    t Value    Pr > |t|
24      -0.65      0.5237

Independent Samples t-test.
(Tests whether d ≠ 0. Note that in the output below, SAS’s name for d is “Diff”.)

(SAS first calculates the mean for each sample; in the output below these are the mean for D and the mean for R. Next, SAS calculates the difference between mean D and mean R; this is shown below as the mean for “Diff (1-2)”. Finally, SAS tests whether the mean for “Diff (1-2)” is different from zero. It is the mean for Diff (1-2) that we are calling lowercase “d” in the hypothesis test notation immediately below.)
H0: (d = μ1 - μ0 ) = 0
H1: d ≠ 0 (two-sided test)

The SAS System
The TTEST Procedure
Variable: PCINC2000

VOTE2004       N       Mean    Std Dev    Std Err    Minimum    Maximum
D              6    23206.2     6012.1     2454.4    17272.0    30912.0
R             19    24734.2     4606.2     1056.7    17369.0    35954.0
Diff (1-2)          -1528.0     4945.9     2316.1

VOTE2004     Method             Mean        95% CL Mean       Std Dev      95% CL Std Dev
D                            23206.2    16896.9   29515.5      6012.1    3752.8   14745.3
R                            24734.2    22514.1   26954.3      4606.2    3480.5    6811.8
Diff (1-2)   Pooled          -1528.0    -6319.4    3263.3      4945.9    3844.1    6938.0
Diff (1-2)   Satterthwaite   -1528.0    -7854.4    4798.4

Method          Variances        DF    t Value    Pr > |t|
Pooled          Equal            23      -0.66      0.5160
Satterthwaite   Unequal      6.9591      -0.57      0.5854

Equality of Variances
Method      Num DF    Den DF    F Value    Pr > F
Folded F         5        18       1.70    0.3696

Dependent (Paired) Samples t-test.
(Tests whether d ≠ 0. Note that SAS calculates the difference for each observation first, then takes the mean of all these differences, which is “d”, and then tests whether this “d” is different from zero.)

H0: (d = μ1 - μ0 ) = 0
H1: d ≠ 0 (two-sided test)

The SAS System
The TTEST Procedure
Difference: PCINC2005 - PCINC2000

 N      Mean    Std Dev    Std Err    Minimum    Maximum
25    3546.2     1553.7      310.7     1370.0     8468.0

  Mean      95% CL Mean      Std Dev     95% CL Std Dev
3546.2    2904.9   4187.5     1553.7    1213.2    2161.4

DF    t Value    Pr > |t|
24      11.41      <.0001

!!! NOTE THE DIFFERENCE BETWEEN THE INDEPENDENT AND DEPENDENT SAMPLES T-TESTS !!!

In the Independent Samples t-test, each sample is composed of completely different (independent) individuals. We calculate the mean of each sample first, then take the difference of the two means, and then test whether the difference is zero. That is, “take means first, then find the difference.”

In the Dependent Samples t-test, each sample is composed of exactly the same (dependent) individuals.
We calculate the difference for each individual first, then take the mean of all the differences, and then test whether the mean is zero. That is, “take differences first, then find the mean.”
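
The “take differences first, then find the mean” idea can be checked directly in SAS. The sketch below is an illustration only, not part of the original example program: it assumes dataset01 has been imported as described earlier, and the dataset name dataset02 and the variable name incdiff are made up for this illustration. A Data Step computes the difference for each county, and a basic two-sided t-test is then run on that single difference variable.

```sas
/* Sketch only: assumes dataset01 exists as imported earlier in the handout. */
/* dataset02 and incdiff are hypothetical names chosen for this example.     */

/* Step 1, "take differences first": one difference per county. */
data dataset02;
    set dataset01;
    incdiff = pcinc2005 - pcinc2000;
run;

/* Step 2, "then find the mean": a basic two-sided t-test of       */
/* H0: mean of incdiff = 0 against H1: mean of incdiff is not 0.   */
proc ttest data=dataset02 h0=0 sides=2 alpha=0.05;
    var incdiff;
run;
```

Because a paired t-test is a basic t-test applied to the per-observation differences, this run should give the same mean difference, t Value, and DF as the “paired pcinc2005*pcinc2000” output shown earlier. No similar shortcut exists for the Independent Samples t-test, because its two samples contain different counties and there are no matched rows to subtract.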