UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas
t-Tests in SAS
In this handout we conduct t-tests in SAS using an example dataset “ProcTTESTdata.xls”. This dataset file
contains data on 7 variables for a random sample of 25 counties drawn from the population of all 100 North
Carolina counties. The 7 variables are:
Variable Name   Variable Type                    Variable Definition
CNTYNAME        Text Variable                    Name of county
REGION          Text Variable                    Geographic region: M, P or C
SQMILES         Numerical Measurement Variable   Area of county in square miles
POP2006         Numerical Measurement Variable   County population in 2006
PCINC2005       Numerical Measurement Variable   County per capita income in 2005
PCINC2000       Numerical Measurement Variable   County per capita income in 2000
VOTE2004        Text Variable                    2004 Presidential election winning party: R or D
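First, we import the dataset into SAS using Proc Import. In this example we assume that the dataset
ProcTTESTdata.xls is stored in the ECN377 folder on the student’s C: drive (the V: drive in TealWare). The path
below uses the TealWare V: drive; change the drive letter if your copy of the file is stored elsewhere.
/* Read the Excel file into a SAS dataset named dataset01 (REPLACE overwrites any existing dataset01) */
proc import datafile="v:\ecn377\ProcTTESTdata.xls" dbms=xls out=dataset01
   replace;
run;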
At this point in the program, we could run Proc Contents to obtain a summary of the number of variables, number
of rows (observations), and variable types in the dataset. We could also use a Data Step to create new variables,
modify variables, drop or keep variables, and so on. In this example, however, we simply use the imported data
without any modifications. The remainder of this handout shows how to conduct several types of t-tests in SAS.
Basic, one-sided t-test (upper tail). Tests whether the true population mean (μ1) equals a given number (μ0) or is
greater than that number. The “h0=” option supplies μ0, the “sides=U” option tells SAS to test whether μ1 is greater
than μ0, and “alpha=0.05” sets the significance level.
H0: μ1 = (μ0 = 20000)
H1: μ1 > (μ0 = 20000)
/* One-sample t-test of PCINC2000: H0 mean = 20000 versus H1 mean > 20000 */
proc ttest data=dataset01 h0=20000 sides=U alpha=0.05;
   var pcinc2000;
run;
Basic, one-sided t-test (lower tail). Tests whether the true population mean (μ1) equals a given number (μ0) or is
less than that number. The “sides=L” option tells SAS to test whether μ1 is less than μ0.
H0: μ1 = (μ0 = 30000)
H1: μ1 < (μ0 = 30000)
/* One-sample t-test of PCINC2000: H0 mean = 30000 versus H1 mean < 30000 */
proc ttest data=dataset01 h0=30000 sides=L alpha=0.05;
   var pcinc2000;
run;
Basic, two-sided t-test. Tests whether the true population mean (μ1) equals a given number (μ0) or differs from it
in either direction. The “sides=2” option tells SAS to test whether μ1 is not equal to μ0.
H0: μ1 = (μ0 = 25000)
H1: μ1 ≠ (μ0 = 25000)
/* One-sample t-test of PCINC2000: H0 mean = 25000 versus H1 mean not equal to 25000 */
proc ttest data=dataset01 h0=25000 sides=2 alpha=0.05;
   var pcinc2000;
run;
Independent Samples t-test. Tests for a difference (d) between the means of two independent populations. The
“sides=2” option tells SAS to test whether d is not equal to zero, that is, whether the two population means
(written μ1 and μ0 below) differ.
H0: (d = μ1 - μ0 ) = 0
H1: d ≠ 0 (two-sided test)
For an independent samples t-test, a variable indicates the population to which each observation (data row)
belongs; this variable is listed in the “class” statement of Proc TTest. In the example below, the class variable is
“vote2004”. The class variable separates the data into two populations, and the test determines whether the mean
value of the “var” variable differs between the two populations. SAS labels one of the populations “1” and the
other “2”, and the output shows which is which (in this example, D is “1” and R is “2”). We first sort the dataset
by the class variable, vote2004, using Proc Sort, as shown below. (Sorting is a harmless precaution here: the class
statement of Proc TTest does not itself require sorted data, although a “by” statement would.) Note that the test
below could instead be run as a one-sided test (H1: d < 0 or H1: d > 0) by changing “sides=2” to “sides=L” or
“sides=U”, respectively.
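/* Sort the data by the class variable vote2004 */
proc sort data=dataset01;
   by vote2004;
run;

/* Two-sample t-test: does mean PCINC2000 differ between D and R counties? */
proc ttest data=dataset01 h0=0 sides=2 alpha=0.05;
   class vote2004;
   var pcinc2000;
run;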
Dependent (Paired) Samples t-test. Tests for a difference in means between two sets of observations on the same
sample of individuals from one population. Sorting the data is not necessary for this test. In the example below,
the variable “pcinc2000” gives the values for the first set of observations and “pcinc2005” gives the values for the
second set of observations on the same individuals (counties). When calculating the difference for each
observation, SAS subtracts the variable listed second in the “paired” statement from the variable listed first, so
here each difference is pcinc2005 - pcinc2000. The “sides=2” option tells SAS to test whether the mean difference d
is not equal to zero. (Note: the test below could instead be run as a one-sided test (H1: d < 0 or H1: d > 0) by
changing “sides=2” to “sides=L” or “sides=U”, respectively.)
H0: (d = μ1 - μ0 ) = 0
H1: d ≠ 0 (two-sided test)
/* Paired t-test on the per-county differences pcinc2005 - pcinc2000 */
proc ttest data=dataset01 h0=0 sides=2 alpha=0.05;
   paired pcinc2005*pcinc2000;
run;
SAS Output for the five t-tests discussed above (in order).
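IMPORTANT NOTE: SAS labels the p-value “Pr > t” for an upper-tailed test (sides=U), “Pr < t” for a lower-tailed
test (sides=L), and “Pr > |t|” for a two-sided test (sides=2).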
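For reference, each p-value label corresponds to a tail probability of the t distribution. In the standard definitions below, t is the “t Value” SAS reports and T_df is a t-distributed random variable with the reported DF; these are textbook formulas, not output copied from SAS.

```latex
\begin{aligned}
\text{sides=U:}\quad & p = P(T_{df} \ge t) \\
\text{sides=L:}\quad & p = P(T_{df} \le t) \\
\text{sides=2:}\quad & p = 2\,P(T_{df} \ge |t|) \\
\text{Decision:}\quad & \text{reject } H_0 \text{ if } p < \alpha \quad (\alpha = 0.05 \text{ in these examples})
\end{aligned}
```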
Basic, one-sided t-test (sides=U; this asks whether μ1 > μ0)
H0: μ1 = (μ0 = 20000)
H1: μ1 > (μ0 = 20000)
The SAS System
The TTEST Procedure
Variable: PCINC2000

 N      Mean   Std Dev   Std Err   Minimum   Maximum
25   24367.5    4887.4     977.5   17272.0   35954.0

   Mean   95% CL Mean        Std Dev   95% CL Std Dev
24367.5   22695.1   Infty     4887.4   3816.2   6799.1

DF   t Value   Pr > t
24      4.47   <.0001
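As a check on this output, the standard error, t value, and lower confidence limit can be reproduced by hand from the reported summary statistics with the usual one-sample formulas. The critical value t_{0.05,24} ≈ 1.711 comes from a standard t table, not from the SAS output, so the last digit may differ slightly from SAS because of rounding. Because the test is one-sided (sides=U), the confidence interval is one-sided as well, which is why SAS reports an upper limit of Infty.

```latex
\begin{aligned}
SE &= \frac{s}{\sqrt{n}} = \frac{4887.4}{\sqrt{25}} = 977.48 \approx 977.5 \\
t &= \frac{\bar{x} - \mu_0}{SE} = \frac{24367.5 - 20000}{977.5} = \frac{4367.5}{977.5} \approx 4.47,
\qquad df = n - 1 = 24 \\
\text{Lower 95\% CL} &= \bar{x} - t_{0.05,\,24}\,SE \approx 24367.5 - 1.711 \times 977.5 \approx 22695
\end{aligned}
```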
Basic, one-sided t-test (sides=L; this asks whether μ1 < μ0)
H0: μ1 = (μ0 = 30000)
H1: μ1 < (μ0 = 30000)
The SAS System
The TTEST Procedure
Variable: PCINC2000

 N      Mean   Std Dev   Std Err   Minimum   Maximum
25   24367.5    4887.4     977.5   17272.0   35954.0

   Mean   95% CL Mean         Std Dev   95% CL Std Dev
24367.5   -Infty   26039.8     4887.4   3816.2   6799.1

DF   t Value   Pr < t
24     -5.76   <.0001
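The same hand calculation works for the lower-tailed test: only μ0 and the direction of the one-sided confidence limit change, and the standard error (977.5) is identical. The critical value 1.711 is again taken from a standard t table, so small differences from the SAS figure reflect rounding.

```latex
\begin{aligned}
t &= \frac{\bar{x} - \mu_0}{SE} = \frac{24367.5 - 30000}{977.5} = \frac{-5632.5}{977.5} \approx -5.76,
\qquad df = 24 \\
\text{Upper 95\% CL} &= \bar{x} + t_{0.05,\,24}\,SE \approx 24367.5 + 1.711 \times 977.5 \approx 26040
\end{aligned}
```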
Basic, two-sided t-test (sides=2; this asks whether μ1 ≠ μ0)
H0: μ1 = (μ0 = 25000)
H1: μ1 ≠ (μ0 = 25000)
The SAS System
The TTEST Procedure
Variable: PCINC2000

 N      Mean   Std Dev   Std Err   Minimum   Maximum
25   24367.5    4887.4     977.5   17272.0   35954.0

   Mean   95% CL Mean         Std Dev   95% CL Std Dev
24367.5   22350.1   26384.9    4887.4   3816.2   6799.1

DF   t Value   Pr > |t|
24     -0.65     0.5237
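For the two-sided test, the t value and the two-sided 95% confidence interval follow from the same summary statistics; here the table value t_{0.025,24} ≈ 2.064 (from a standard t table, not the SAS output) replaces the one-sided 1.711.

```latex
\begin{aligned}
t &= \frac{\bar{x} - \mu_0}{SE} = \frac{24367.5 - 25000}{977.5} = \frac{-632.5}{977.5} \approx -0.65,
\qquad df = 24 \\
95\%\ \text{CL} &= \bar{x} \pm t_{0.025,\,24}\,SE \approx 24367.5 \pm 2.064 \times 977.5 \approx (22350,\ 26385)
\end{aligned}
```

The hypothesized value μ0 = 25000 lies inside this interval, which is consistent with the large p-value (0.5237 > 0.05): the test does not reject H0.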
Independent Samples t-test (tests whether d ≠ 0; in the output below, SAS’s name for d is “Diff (1-2)”).
SAS first calculates the mean for each sample; in the output below, these are the means for D and for R. Next,
SAS calculates the difference between the D mean and the R mean, shown below as the mean for “Diff (1-2)”.
Finally, SAS tests whether this difference is different from zero. The mean for Diff (1-2) is what we call
lowercase “d” in the hypothesis notation immediately below.
H0: (d = μ1 - μ0 ) = 0
H1: d ≠ 0 (two-sided test)
The SAS System
The TTEST Procedure
Variable: PCINC2000

VOTE2004      N      Mean   Std Dev   Std Err   Minimum   Maximum
D             6   23206.2    6012.1    2454.4   17272.0   30912.0
R            19   24734.2    4606.2    1056.7   17369.0   35954.0
Diff (1-2)        -1528.0    4945.9    2316.1

VOTE2004     Method             Mean   95% CL Mean         Std Dev   95% CL Std Dev
D                            23206.2   16896.9   29515.5    6012.1   3752.8   14745.3
R                            24734.2   22514.1   26954.3    4606.2   3480.5    6811.8
Diff (1-2)   Pooled          -1528.0   -6319.4    3263.3    4945.9   3844.1    6938.0
Diff (1-2)   Satterthwaite   -1528.0   -7854.4    4798.4

Method          Variances       DF   t Value   Pr > |t|
Pooled          Equal           23     -0.66     0.5160
Satterthwaite   Unequal     6.9591     -0.57     0.5854
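The two rows of the last table differ only in how the standard error of Diff (1-2) is computed. The sketch below reproduces both from the group statistics, using the standard pooled-variance and Satterthwaite (Welch) formulas, with subscript 1 for D (n1 = 6, s1 = 6012.1) and subscript 2 for R (n2 = 19, s2 = 4606.2); these subscripts are introduced here for illustration.

```latex
\begin{aligned}
s_p &= \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
     = \sqrt{\frac{5(6012.1)^2 + 18(4606.2)^2}{23}} \approx 4945.9 \\
SE_{\text{pooled}} &= s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 4945.9\sqrt{\frac{1}{6} + \frac{1}{19}} \approx 2316.1,
\qquad t = \frac{-1528.0}{2316.1} \approx -0.66, \quad df = 23 \\
SE_{\text{Satt}} &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{6012.1^2}{6} + \frac{4606.2^2}{19}} \approx 2672.2,
\qquad t = \frac{-1528.0}{2672.2} \approx -0.57 \\
df_{\text{Satt}} &= \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}
{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \approx 6.96
\end{aligned}
```

The Satterthwaite version does not assume equal variances, so it has a larger standard error and far fewer degrees of freedom here; either way, both p-values exceed 0.05.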
Equality of Variances
Method     Num DF   Den DF   F Value   Pr > F
Folded F        5       18      1.70   0.3696
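The folded F statistic is the ratio of the larger sample variance to the smaller one, so it can be checked directly from the two group standard deviations reported above (the standard definition, sketched here as a worked check).

```latex
F = \frac{\max(s_1^2, s_2^2)}{\min(s_1^2, s_2^2)} = \left(\frac{6012.1}{4606.2}\right)^2 \approx 1.70,
\qquad \text{Num DF} = n_1 - 1 = 5, \quad \text{Den DF} = n_2 - 1 = 18
```

With Pr > F = 0.3696 > 0.05 there is no evidence that the two variances differ. One common rule of thumb is then to read the Pooled row of the t-test table; in this example the Pooled and Satterthwaite rows lead to the same conclusion.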
Dependent (Paired) Samples t-test (tests whether d ≠ 0). SAS first calculates the difference for each observation,
then takes the mean of all these differences, which is “d”, and then tests whether d is different from zero.
H0: (d = μ1 - μ0 ) = 0
H1: d ≠ 0 (two-sided test)
The SAS System
The TTEST Procedure
Difference: PCINC2005 - PCINC2000

 N     Mean   Std Dev   Std Err   Minimum   Maximum
25   3546.2    1553.7     310.7    1370.0    8468.0

  Mean   95% CL Mean        Std Dev   95% CL Std Dev
3546.2   2904.9   4187.5     1553.7   1213.2   2161.4

DF   t Value   Pr > |t|
24     11.41     <.0001
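Because SAS works with the 25 per-county differences, the paired test is simply a one-sample t-test on those differences with μ0 = 0. The worked check below uses the reported mean and standard deviation of the differences, with the table value t_{0.025,24} ≈ 2.064 taken from a standard t table.

```latex
\begin{aligned}
SE &= \frac{s_d}{\sqrt{n}} = \frac{1553.7}{\sqrt{25}} \approx 310.7 \\
t &= \frac{\bar{d} - 0}{SE} = \frac{3546.2}{310.7} \approx 11.41, \qquad df = n - 1 = 24 \\
95\%\ \text{CL} &= \bar{d} \pm t_{0.025,\,24}\,SE \approx 3546.2 \pm 2.064 \times 310.7 \approx (2904.9,\ 4187.5)
\end{aligned}
```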
!!! NOTE THE DIFFERENCE BETWEEN THE INDEPENDENT AND DEPENDENT SAMPLES T-TESTS !!!
In the Independent Samples t-test, each sample is composed of completely different (independent) individuals. We
calculate the mean of each sample first, then take the difference of the two means, and then test whether the difference
is zero. That is, “take means first, then find the difference.”
In the Dependent Samples t-test, each sample is composed of exactly the same (dependent) individuals. We calculate
the difference for each individual first, then take the mean of all the differences, and then test whether the mean is
zero. That is, “take differences first, then find the mean.”
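The two recipes give the same point estimate but different standard errors. In generic notation (x and y are the two sets of measurements on the same n individuals and s_xy is their sample covariance; these symbols are introduced here for illustration), the standard identities are:

```latex
\begin{aligned}
\bar{d} &= \frac{1}{n}\sum_{i=1}^{n} (x_i - y_i) = \bar{x} - \bar{y} \\
SE_{\text{dependent}} &= \frac{s_d}{\sqrt{n}}, \qquad s_d^2 = s_x^2 + s_y^2 - 2\,s_{xy} \\
SE_{\text{independent}} &= \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}
\end{aligned}
```

When the two measurements on each individual are positively correlated, the covariance term shrinks s_d. In this dataset the standard deviation of the differences (1553.7) is far smaller than the standard deviation of PCINC2000 alone (4887.4), which is why the paired test produces such a small standard error (310.7) and a large t value.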