T Tests, ANOVA, and Regression Analysis

Here is a one-sample t test of the null hypothesis that mu = 0:
DATA ONESAMPLE; INPUT Y @@;
CARDS;
1 2 3 4 5 6 7 8 9 10
PROC MEANS T PRT; RUN;
------------------------------------------------------------------------------------------------
The SAS System
The MEANS Procedure
Analysis Variable : Y

t Value    Pr > |t|
-------------------
   5.74      0.0003
-------------------
------------------------------------------------------------------------------------------------
Now an ANOVA on the same data but with no grouping variable:
PROC ANOVA; MODEL Y = ; run;
------------------------------------------------------------------------------------------------
The SAS System
The ANOVA Procedure
Dependent Variable: Y

                                   Sum of
Source                  DF        Squares    Mean Square   F Value   Pr > F
Model                    1    302.5000000    302.5000000     33.00   0.0003
Error                    9     82.5000000      9.1666667
Uncorrected Total       10    385.0000000

R-Square    Coeff Var    Root MSE      Y Mean
0.000000     55.04819    3.027650    5.500000

Source                  DF       Anova SS    Mean Square   F Value   Pr > F
Intercept                1    302.5000000    302.5000000     33.00   0.0003
------------------------------------------------------------------------------------------------
Notice that the ANOVA F is simply the square of the one-sample t (5.74 squared is 33.00, within rounding), and the one-tailed p from the ANOVA is identical to the two-tailed p from the t.
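The relation between t and F can be checked by hand from the ten scores. The following is a minimal Python sketch, not SAS output; the variable names are illustrative, and it computes only the test statistics, not the p values.

```python
import math

# Same ten scores as the ONESAMPLE data step.
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = len(y)
mean = sum(y) / n

# One-sample t for H0: mu = 0.
ss_error = sum((v - mean) ** 2 for v in y)   # squared deviations about the mean
s = math.sqrt(ss_error / (n - 1))            # sample standard deviation (Root MSE)
t = mean / (s / math.sqrt(n))

# Intercept-only ANOVA: the model SS is n * mean^2 on 1 df,
# and the uncorrected total SS is the sum of squared raw scores.
ss_model = n * mean ** 2
f = (ss_model / 1) / (ss_error / (n - 1))

print(f"t = {t:.2f}, t squared = {t * t:.2f}, F = {f:.2f}")
# prints: t = 5.74, t squared = 33.00, F = 33.00
```

The model and error sums of squares computed here (302.5 and 82.5) are the ones shown in the ANOVA table above.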
Now a regression analysis with the model Y = intercept + error:
PROC REG; MODEL Y = ; run;
------------------------------------------------------------------------------------------------
The REG Procedure
Model: MODEL1
Dependent Variable: Y

                               Sum of           Mean
Source                DF      Squares         Square   F Value   Pr > F
Model                  0            0              .         .        .
Error                  9     82.50000        9.16667
Corrected Total        9     82.50000

Root MSE             3.02765    R-Square    0.0000
Dependent Mean       5.50000    Adj R-Sq    0.0000
Coeff Var           55.04819

Parameter Estimates

                    Parameter     Standard
Variable     DF      Estimate        Error   t Value   Pr > |t|
Intercept     1       5.50000      0.95743      5.74     0.0003
------------------------------------------------------------------------------------------------
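Notice that the ANOVA is replicated: the t test of the intercept (t = 5.74, p = .0003) is the one-sample t test, and its square is the ANOVA F of 33.00.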
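As a worked check, using only values printed in the REG output (the symbols b_0, s, and n are notation introduced here, not SAS labels), the intercept in an intercept-only model is the sample mean and its standard error is the standard error of the mean:

```latex
\begin{aligned}
b_0 &= \bar{Y} = 5.5, \\
SE(b_0) &= \frac{s}{\sqrt{n}} = \frac{3.02765}{\sqrt{10}} = 0.95743, \\
t &= \frac{b_0}{SE(b_0)} = \frac{5.5}{0.95743} = 5.74, \qquad t^2 = 33.00 = F .
\end{aligned}
```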
Now consider a t test for two independent groups with pooled variances. The null hypothesis is
mu1 - mu2 = 0:
DATA TWOSAMPLE; INPUT X Y @@;
CARDS;
1 1 1 2 1 3 1 4 1 5
2 6 2 7 2 8 2 9 2 10
PROC TTEST; CLASS X; VAR Y; RUN;
------------------------------------------------------------------------------------------------
The SAS System
T-Tests

Variable    Method    Variances    DF    t Value    Pr > |t|
Y           Pooled    Equal         8      -5.00      0.0011
------------------------------------------------------------------------------------------------
Now an ANOVA on the same data:
PROC ANOVA; CLASS X; MODEL Y = X; RUN;
------------------------------------------------------------------------------------------------
The ANOVA Procedure
Dependent Variable: Y

                                   Sum of
Source                  DF        Squares    Mean Square   F Value   Pr > F
Model                    1    62.50000000    62.50000000     25.00   0.0011
Error                    8    20.00000000     2.50000000
Corrected Total          9    82.50000000

R-Square    Coeff Var    Root MSE      Y Mean
0.757576     28.74798    1.581139    5.500000

Source                  DF       Anova SS    Mean Square   F Value   Pr > F
X                        1    62.50000000    62.50000000     25.00   0.0011
------------------------------------------------------------------------------------------------
Notice that the ANOVA F is simply the square of the independent samples t, and
the one-tailed ANOVA p is identical to the two-tailed p from the t.
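The same check can be done by hand for the two-group case. This is a minimal Python sketch under the assumption of equal variances (as in the pooled test); the variable names are illustrative, and p values are not computed.

```python
import math

# Same scores as the TWOSAMPLE data step, split by X.
g1 = [1, 2, 3, 4, 5]      # X = 1
g2 = [6, 7, 8, 9, 10]     # X = 2
n1, n2 = len(g1), len(g2)
m1, m2 = sum(g1) / n1, sum(g2) / n2
grand = (sum(g1) + sum(g2)) / (n1 + n2)

# Pooled-variance t for H0: mu1 - mu2 = 0.
ss_within = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
ms_within = ss_within / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(ms_within * (1 / n1 + 1 / n2))

# One-way ANOVA F with two groups (1 df between groups).
ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
f = (ss_between / 1) / ms_within

print(f"t = {t:.2f}, t squared = {t * t:.2f}, F = {f:.2f}")
```

The between-groups and within-groups sums of squares computed here (62.5 and 20) are the Model and Error sums of squares in the ANOVA table.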
And finally replication of the ANOVA with a regression analysis:
PROC REG; MODEL Y = X; run;
------------------------------------------------------------------------------------------------
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: Y

Number of Observations Read    10
Number of Observations Used    10

Analysis of Variance

                               Sum of           Mean
Source                DF      Squares         Square   F Value   Pr > F
Model                  1     62.50000       62.50000     25.00   0.0011
Error                  8     20.00000        2.50000
Corrected Total        9     82.50000

Root MSE             1.58114    R-Square    0.7576
Dependent Mean       5.50000    Adj R-Sq    0.7273
Coeff Var           28.74798

Parameter Estimates

                    Parameter     Standard
Variable     DF      Estimate        Error   t Value   Pr > |t|
Intercept     1      -2.00000      1.58114     -1.26     0.2415
X             1       5.00000      1.00000      5.00     0.0011
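The parameter estimates can be recovered by hand from the group means (3 and 8) and the error mean square. The notation below (b_0, b_1, and subscripts for the groups coded X = 1 and X = 2) is introduced here for illustration:

```latex
\begin{aligned}
b_1 &= \frac{\bar{Y}_2 - \bar{Y}_1}{X_2 - X_1} = \frac{8 - 3}{2 - 1} = 5, \qquad
b_0 = \bar{Y}_1 - b_1 X_1 = 3 - 5(1) = -2, \\
SE(b_1) &= \sqrt{\frac{MS_{error}}{\sum (X - \bar{X})^2}} = \sqrt{\frac{2.5}{2.5}} = 1, \\
t &= \frac{b_1}{SE(b_1)} = 5.00, \qquad t^2 = 25.00 = F .
\end{aligned}
```

The slope t is +5.00 where PROC TTEST reported -5.00 only because the slope is the second group's mean minus the first, while the t test subtracts in the other order; the two-tailed p is the same.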
OK, but what if we have more than two groups? Show me that the ANOVA is a
regression analysis in that case.
Here is the SAS program, with data:
data Lotus;
input Dose N; Do I=1 to N; Input Illness @@; output; end;
cards;
0 20
101 101 101 104 104 105 110 111 111 113 114 79 89 91 94 95 96 99 99 99
10 20
100 65 65 67 68 80 81 82 85 87 87 88 88 91 92 94 95 94 96 96
20 20
64 75 75 76 77 79 79 80 80 81 81 81 82 83 83 85 87 88 90 96
30 20
100 105 108 80 82 85 87 87 87 89 90 90 92 92 92 95 95 97 98 99
40 20
101 102 102 105 108 109 112 119 119 123 82 89 92 94 94 95 95 97 98 99
*****************************************************************************;
proc GLM data=Lotus; class Dose;
model Illness = Dose / ss1;
title 'Here we have a traditional one-way independent samples ANOVA'; run;
*****************************************************************************;
data Polynomial; set Lotus; Quadratic=Dose*Dose; Cubic=Dose**3;
Quartic=Dose**4;
proc GLM data=Polynomial; model Illness = Dose Quadratic Cubic Quartic / ss1;
title 'Here we have a polynomial regression analysis.'; run;
*****************************************************************************;
Here is the output:
Here we have a traditional one-way independent samples ANOVA
The GLM Procedure
Dependent Variable: Illness

                                   Sum of
Source                  DF        Squares    Mean Square   F Value   Pr > F
Model                    4     6791.54000     1697.88500     20.78   <.0001
Error                   95     7762.70000       81.71263
Corrected Total         99    14554.24000

R-Square    Coeff Var    Root MSE    Illness Mean
0.466637     9.799983    9.039504        92.24000

Source                  DF      Type I SS    Mean Square   F Value   Pr > F
Dose                     4    6791.540000    1697.885000     20.78   <.0001
------------------------------------------------------------------------------------------------
Here we have a polynomial regression analysis.
The GLM Procedure
Number of observations    100
------------------------------------------------------------------------------------------------
Here we have a polynomial regression analysis.
The GLM Procedure
Dependent Variable: Illness

                                   Sum of
Source                  DF        Squares    Mean Square   F Value   Pr > F
Model                    4     6791.54000     1697.88500     20.78   <.0001
Error                   95     7762.70000       81.71263
Corrected Total         99    14554.24000

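Note that the polynomial regression produced exactly the same F, p, SS, and MS as the traditional
ANOVA. With five dose levels, the four polynomial terms use all four between-groups degrees of
freedom, so the predicted scores are simply the five group means.
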
R-Square    Coeff Var    Root MSE    Illness Mean
0.466637     9.799983    9.039504        92.24000

Source                  DF      Type I SS    Mean Square   F Value   Pr > F
Dose                     1     174.845000     174.845000      2.14   0.1468
Quadratic                1    6100.889286    6100.889286     74.66   <.0001
Cubic                    1     389.205000     389.205000      4.76   0.0315
Quartic                  1     126.600714     126.600714      1.55   0.2163
------------------------------------------------------------------------------------------------
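The four Type I sums of squares partition the between-groups sum of squares from the one-way ANOVA. One way to see this by hand is a sketch using the standard orthogonal polynomial coefficients for five equally spaced levels with equal n. This is a substitute technique chosen for illustration, not a description of how PROC GLM computes its sequential sums of squares, and the names `illness`, `contrasts`, and `ss` are made up for the example.

```python
# Illness scores from the Lotus data step, keyed by Dose.
illness = {
    0:  [101, 101, 101, 104, 104, 105, 110, 111, 111, 113,
         114, 79, 89, 91, 94, 95, 96, 99, 99, 99],
    10: [100, 65, 65, 67, 68, 80, 81, 82, 85, 87,
         87, 88, 88, 91, 92, 94, 95, 94, 96, 96],
    20: [64, 75, 75, 76, 77, 79, 79, 80, 80, 81,
         81, 81, 82, 83, 83, 85, 87, 88, 90, 96],
    30: [100, 105, 108, 80, 82, 85, 87, 87, 87, 89,
         90, 90, 92, 92, 92, 95, 95, 97, 98, 99],
    40: [101, 102, 102, 105, 108, 109, 112, 119, 119, 123,
         82, 89, 92, 94, 94, 95, 95, 97, 98, 99],
}
n = 20  # scores per dose group
means = [sum(illness[d]) / n for d in sorted(illness)]

# Orthogonal polynomial coefficients for five equally spaced levels.
contrasts = {
    "Dose": [-2, -1, 0, 1, 2],        # linear trend
    "Quadratic": [2, -1, -2, -1, 2],
    "Cubic": [-1, 2, 0, -2, 1],
    "Quartic": [1, -4, 6, -4, 1],
}

# SS for a contrast with equal n: n * (sum c_j * mean_j)^2 / sum c_j^2.
ss = {}
for name, c in contrasts.items():
    estimate = sum(cj * mj for cj, mj in zip(c, means))
    ss[name] = n * estimate ** 2 / sum(cj ** 2 for cj in c)

# Between-groups SS from the one-way ANOVA, for comparison.
grand = sum(means) / len(means)
ss_between = n * sum((m - grand) ** 2 for m in means)

for name, value in ss.items():
    print(f"{name:<10}{value:14.6f}")
print(f"{'Sum':<10}{sum(ss.values()):14.6f}")
print(f"{'Between':<10}{ss_between:14.6f}")
```

With these data the four contrast sums of squares match the Type I SS column above, and their sum matches the Model SS of 6791.54 in both analyses.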
Return to Wuensch’s Stats Lessons Page
November, 2006