T Tests, ANOVA, and Regression Analysis

Here is a one-sample t test of the null hypothesis that mu = 0:

    DATA ONESAMPLE;
    INPUT Y @@;
    CARDS;
    1 2 3 4 5 6 7 8 9 10
    PROC MEANS T PRT;
    RUN;

------------------------------------------------------------------------------------------------
The SAS System

The MEANS Procedure

Analysis Variable : Y

    t Value    Pr > |t|
    -------------------
       5.74      0.0003
    -------------------
------------------------------------------------------------------------------------------------

Now an ANOVA on the same data, but with no grouping variable:

    PROC ANOVA;
    MODEL Y = ;
    run;

------------------------------------------------------------------------------------------------
The SAS System

The ANOVA Procedure

Dependent Variable: Y

    Source               DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model                 1       302.5000000    302.5000000      33.00    0.0003
    Error                 9        82.5000000      9.1666667
    Uncorrected Total    10       385.0000000

    R-Square    Coeff Var    Root MSE      Y Mean
    0.000000     55.04819    3.027650    5.500000

    Source       DF       Anova SS    Mean Square    F Value    Pr > F
    Intercept     1    302.5000000    302.5000000      33.00    0.0003
------------------------------------------------------------------------------------------------

Notice that the ANOVA F is simply the square of the one-sample t (5.74 squared is 33.00), and the one-tailed p from the ANOVA is identical to the two-tailed p from the t.

Now a regression analysis with the model Y = intercept + error:

    PROC REG;
    MODEL Y = ;
    run;

------------------------------------------------------------------------------------------------
The REG Procedure
Model: MODEL1
Dependent Variable: Y

    Source               DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model                 0                 0              .          .         .
    Error                 9          82.50000        9.16667
    Corrected Total       9          82.50000

    Root MSE           3.02765    R-Square    0.0000
    Dependent Mean     5.50000    Adj R-Sq    0.0000
    Coeff Var         55.04819

Parameter Estimates

    Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
    Intercept     1               5.50000           0.95743       5.74      0.0003
------------------------------------------------------------------------------------------------

Notice that the earlier results are replicated: the t for the intercept (5.74, p = 0.0003) is the one-sample t, and its square is the ANOVA F.

Now consider a two independent groups t test with pooled variances, where the null hypothesis is mu1 - mu2 = 0:

    DATA TWOSAMPLE;
    INPUT X Y @@;
    CARDS;
    1 1 1 2 1 3 1 4 1 5 2 6 2 7 2 8 2 9 2 10
    PROC TTEST;
    CLASS X;
    VAR Y;
    RUN;

------------------------------------------------------------------------------------------------
The SAS System

T-Tests

    Variable    Method    Variances    DF    t Value    Pr > |t|
    Y           Pooled    Equal         8      -5.00      0.0011
------------------------------------------------------------------------------------------------

Now an ANOVA on the same data:

    PROC ANOVA;
    CLASS X;
    MODEL Y = X;
    RUN;

------------------------------------------------------------------------------------------------
The ANOVA Procedure

Dependent Variable: Y

    Source               DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model                 1       62.50000000    62.50000000      25.00    0.0011
    Error                 8       20.00000000     2.50000000
    Corrected Total       9       82.50000000

    R-Square    Coeff Var    Root MSE      Y Mean
    0.757576     28.74798    1.581139    5.500000

    Source    DF       Anova SS    Mean Square    F Value    Pr > F
    X          1    62.50000000    62.50000000      25.00    0.0011
------------------------------------------------------------------------------------------------

Notice that the ANOVA F is simply the square of the independent samples t, and the one-tailed ANOVA p is identical to the two-tailed p from the t.
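The F = t squared relationships noted above can be checked by hand. The following is a minimal sketch in Python rather than SAS, using only the standard library; the variable names are illustrative and not part of the original programs. It recomputes both t statistics and both ANOVA F ratios from the two small data sets. It does not compute p values, since the standard library has no t or F distribution.

```python
from math import sqrt
from statistics import mean, variance

# One-sample case: Y = 1..10, null hypothesis mu = 0.
y = list(range(1, 11))
n = len(y)
t_one = mean(y) / sqrt(variance(y) / n)           # one-sample t

# Intercept-only ANOVA: model SS is n * mean^2 (uncorrected),
# error SS is the sum of squared deviations from the mean.
ss_model_one = n * mean(y) ** 2
ss_error_one = sum((v - mean(y)) ** 2 for v in y)
f_one = (ss_model_one / 1) / (ss_error_one / (n - 1))

# Two-sample case: group 1 is Y = 1..5, group 2 is Y = 6..10.
g1, g2 = y[:5], y[5:]
n1, n2 = len(g1), len(g2)
pooled_var = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
t_two = (mean(g1) - mean(g2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA with two groups: between-groups SS over pooled variance.
grand = mean(y)
ss_between = n1 * (mean(g1) - grand) ** 2 + n2 * (mean(g2) - grand) ** 2
f_two = (ss_between / 1) / pooled_var

print(f"one-sample:  t = {t_one:.2f}, t^2 = {t_one ** 2:.2f}, F = {f_one:.2f}")
print(f"two-sample:  t = {t_two:.2f}, t^2 = {t_two ** 2:.2f}, F = {f_two:.2f}")
```

In each case the squared t and the F agree because the F has one numerator degree of freedom and the same error degrees of freedom as the t.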
And finally, replication of the ANOVA with a regression analysis:

    PROC REG;
    MODEL Y = X;
    run;

------------------------------------------------------------------------------------------------
The SAS System

The REG Procedure
Model: MODEL1
Dependent Variable: Y

    Number of Observations Read    10
    Number of Observations Used    10

Analysis of Variance

    Source               DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model                 1          62.50000       62.50000      25.00    0.0011
    Error                 8          20.00000        2.50000
    Corrected Total       9          82.50000

    Root MSE           1.58114    R-Square    0.7576
    Dependent Mean     5.50000    Adj R-Sq    0.7273
    Coeff Var         28.74798

Parameter Estimates

    Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
    Intercept     1              -2.00000           1.58114      -1.26      0.2415
    X             1               5.00000           1.00000       5.00      0.0011
------------------------------------------------------------------------------------------------

OK, but what if we have more than two groups? Show me that the ANOVA is a regression analysis in that case.

Here is the SAS program, with data:

    data Lotus;
    input Dose N;
    Do I=1 to N;
      Input Illness @@;
      output;
    end;
    cards;
    0 20
    101 101 101 104 104 105 110 111 111 113 114 79 89 91 94 95 96 99 99 99
    10 20
    100 65 65 67 68 80 81 82 85 87 87 88 88 91 92 94 95 94 96 96
    20 20
    64 75 75 76 77 79 79 80 80 81 81 81 82 83 83 85 87 88 90 96
    30 20
    100 105 108 80 82 85 87 87 87 89 90 90 92 92 92 95 95 97 98 99
    40 20
    101 102 102 105 108 109 112 119 119 123 82 89 92 94 94 95 95 97 98 99
    ***************************************************************************** ;
    proc GLM data=Lotus;
    class Dose;
    model Illness = Dose / ss1;
    title 'Here we have a traditional one-way independent samples ANOVA';
    run;
    ***************************************************************************** ;
    data Polynomial;
    set Lotus;
    Quadratic=Dose*Dose;
    Cubic=Dose**3;
    Quartic=Dose**4;
    proc GLM data=Polynomial;
    model Illness = Dose Quadratic Cubic Quartic / ss1;
    title 'Here we have a polynomial regression analysis.';
    run;

Here is the output:

------------------------------------------------------------------------------------------------
Here we have a traditional one-way independent samples ANOVA

The GLM Procedure

Dependent Variable: Illness

    Source               DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model                 4        6791.54000     1697.88500      20.78    <.0001
    Error                95        7762.70000       81.71263
    Corrected Total      99       14554.24000

    R-Square    Coeff Var    Root MSE    Illness Mean
    0.466637     9.799983    9.039504        92.24000

    Source    DF      Type I SS    Mean Square    F Value    Pr > F
    Dose       4    6791.540000    1697.885000      20.78    <.0001
------------------------------------------------------------------------------------------------
Here we have a polynomial regression analysis.

The GLM Procedure

Number of observations    100

Dependent Variable: Illness

    Source               DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model                 4        6791.54000     1697.88500      20.78    <.0001
    Error                95        7762.70000       81.71263
    Corrected Total      99       14554.24000

    R-Square    Coeff Var    Root MSE    Illness Mean
    0.466637     9.799983    9.039504        92.24000

    Source       DF      Type I SS    Mean Square    F Value    Pr > F
    Dose          1     174.845000     174.845000       2.14    0.1468
    Quadratic     1    6100.889286    6100.889286      74.66    <.0001
    Cubic         1     389.205000     389.205000       4.76    0.0315
    Quartic       1     126.600714     126.600714       1.55    0.2163
------------------------------------------------------------------------------------------------

Note that the polynomial regression produced exactly the same F, p, SS, and MS as the traditional ANOVA. With five dose levels, a fourth-degree polynomial has as many parameters as there are group means, so it reproduces those means exactly.

Return to Wuensch’s Stats Lessons Page

November, 2006
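The equivalence between the one-way ANOVA and the polynomial regression on the Lotus data can also be checked without SAS. The sketch below is in Python, standard library only, and its names are illustrative. It does not fit the regression itself. Instead it swaps in orthogonal polynomial contrasts on the group means. This assumes equally spaced doses and equal group sizes, both of which hold here; under that assumption the contrast sums of squares should match the sequential (Type I) sums of squares for Dose, Quadratic, Cubic, and Quartic, and should add up to the ANOVA model sum of squares.

```python
# Illness scores for each Dose group, copied from the Lotus data step.
lotus = {
    0:  [101, 101, 101, 104, 104, 105, 110, 111, 111, 113,
         114, 79, 89, 91, 94, 95, 96, 99, 99, 99],
    10: [100, 65, 65, 67, 68, 80, 81, 82, 85, 87,
         87, 88, 88, 91, 92, 94, 95, 94, 96, 96],
    20: [64, 75, 75, 76, 77, 79, 79, 80, 80, 81,
         81, 81, 82, 83, 83, 85, 87, 88, 90, 96],
    30: [100, 105, 108, 80, 82, 85, 87, 87, 87, 89,
         90, 90, 92, 92, 92, 95, 95, 97, 98, 99],
    40: [101, 102, 102, 105, 108, 109, 112, 119, 119, 123,
         82, 89, 92, 94, 94, 95, 95, 97, 98, 99],
}

n = 20                                              # scores per group
means = [sum(v) / len(v) for v in lotus.values()]   # group means, in dose order
grand_mean = sum(sum(v) for v in lotus.values()) / (n * len(lotus))

# One-way ANOVA model (between-groups) sum of squares.
ss_between = n * sum((m - grand_mean) ** 2 for m in means)

# Standard orthogonal polynomial coefficients for five equally spaced levels.
contrasts = {
    "Dose":      [-2, -1, 0, 1, 2],     # linear trend
    "Quadratic": [2, -1, -2, -1, 2],
    "Cubic":     [-1, 2, 0, -2, 1],
    "Quartic":   [1, -4, 6, -4, 1],
}

# Sum of squares for each trend: n * (contrast estimate)^2 / sum of squared coefficients.
trend_ss = {}
for name, coefs in contrasts.items():
    estimate = sum(c * m for c, m in zip(coefs, means))
    trend_ss[name] = n * estimate ** 2 / sum(c ** 2 for c in coefs)

for name, ss in trend_ss.items():
    print(f"{name:10s} {ss:12.6f}")
print(f"{'Sum':10s} {sum(trend_ss.values()):12.6f}")
print(f"{'ANOVA SS':10s} {ss_between:12.6f}")
```

The four trend components partition the 4 df between-groups sum of squares, which is why the overall Model line is the same in both GLM runs.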