Doing ANOVA and t-tests
LISA short course by Ciro Velasco-Cruz
October 21, 2008

ONE SAMPLE t TEST

Example

In a study, 15 lobsters were randomly selected from recent catches along a certain region of the Maine shoreline. The lobsters were weighed to the nearest ounce, with results:

    26 14 18 13 22 15 24 21 29 10 12 31 19 16 21

Suppose that, for research purposes, we need to test whether the mean lobster weight equals 15 ounces. Lobster weight is known to be normally distributed, with both mean and standard deviation unknown.

SAS coding: the data step

    data lobsters_w;
      input type weigth @@;   /* @@ reads several observations per data line */
      datalines;
    1 26 1 14 1 18 1 13 1 22 1 15 1 24 1 21
    1 29 1 10 1 12 1 31 1 19 1 16 1 21
    ;

SAS coding: exploratory data analysis

    proc means data=lobsters_w mean std max min median;
      var weigth;
    run;

    proc boxplot data=lobsters_w;
      title 'BoxPlot for one sample t-test example';
      plot (weigth)*type / cframe=vligb cboxes=dagr cboxfill=ywh;
      inset mean max min / cfill=white header="Summary" ctext=red;
    run;

SAS OUTPUT

    The SAS System
    The MEANS Procedure
    Analysis Variable : weigth

          Mean       Std Dev       Maximum       Minimum        Median
    19.4000000     6.2655521    31.0000000    10.0000000    19.0000000

SAS OUTPUT

[Figure: box plot produced by the PROC BOXPLOT step above]

SAS coding: data analysis

    proc ttest data=lobsters_w h0=15;   /* h0= sets the hypothesized mean */
      title 'One sample t test example';
      var weigth;
    run;

SAS OUTPUT

    One sample t test example
    The TTEST Procedure

    Statistics
    Variable   N   Lower CL Mean   Mean   Upper CL Mean   Lower CL Std Dev   Std Dev   Upper CL Std Dev   Std Err   Minimum   Maximum
    weigth    15           15.93   19.4           22.87             4.5872    6.2656             9.8814    1.6178        10        31

    T-Tests
    Variable   DF   t Value   Pr > |t|
    weigth     14      2.72     0.0166

Conclusion: Since the p-value (0.0166) is less than 0.05, we reject the null hypothesis that the mean equals 15 ounces at the 5% level of significance.

Two Sample t-test example

An animal scientist is interested in comparing two different topical treatments (A, B) against osteoarthritis in the leg joints of horses. Seven horses with the illness are available at the animal clinic.
For each horse, it is randomly determined which front leg receives treatment A and which receives treatment B. After four weeks of treatment, the horses' mobility is measured. Assuming for now that these are two independent samples, we can perform a two-sample t test.

SAS data step

    data horses;
      input trt horse mobility @@;
      cards;
    1 1 48.2 1 2 44.6 1 3 49.7 1 4 40.5 1 5 54.6 1 6 47.1 1 7 46.8
    2 1 41.5 2 2 40.1 2 3 44.0 2 4 41.2 2 5 49.8 2 6 41.7 2 7 51.4
    ;

SAS E.D.A.

    proc means data=horses mean std max min median;
      class trt;
      var mobility;
    run;

    proc boxplot data=horses;
      title 'BoxPlot for two sample t-test example';
      plot (mobility)*trt / cframe=vligb cboxes=dagr cboxfill=ywh;
      insetgroup mean max min q1 q2 q3 / header='Summary by Treatment' ctext=red;
    run;

SAS OUTPUT

    The MEANS Procedure
    Analysis Variable : mobility

    trt   N Obs         Mean      Std Dev      Maximum      Minimum       Median
    1         7   47.3571429    4.3523393   54.6000000   40.5000000   47.1000000
    2         7   44.2428571    4.5199031   51.4000000   40.1000000   41.7000000

SAS OUTPUT

[Figure: box plot produced by the PROC BOXPLOT step above]

SAS t test

    proc ttest data=horses;
      title 'Two sample t test example';
      class trt;
      var mobility;
    run;

SAS OUTPUT

    Two sample t test example
    The TTEST Procedure

    Statistics
    Variable   trt          N   Lower CL Mean     Mean   Upper CL Mean   Lower CL Std Dev   Std Dev   Upper CL Std Dev   Std Err   Minimum   Maximum
    mobility   1            7          43.332   47.357          51.382             2.8046    4.3523             9.5841     1.645      40.5      54.6
    mobility   2            7          40.063   44.243          48.423             2.9126    4.5199             9.9531    1.7084      40.1      51.4
    mobility   Diff (1-2)              -2.053   3.1143          8.2816             3.1816    4.4369             7.3242    2.3716

    T-Tests
    Variable   Method          Variances   DF   t Value   Pr > |t|
    mobility   Pooled          Equal       12      1.31     0.2137
    mobility   Satterthwaite   Unequal     12      1.31     0.2137

    Equality of Variances
    Variable   Method     Num DF   Den DF   F Value   Pr > F
    mobility   Folded F        6        6      1.08   0.9293

Conclusion

• About variances: Since the p-value (0.9293) is larger than 5%, we do not reject the hypothesis that the variances are equal.
• About means: Since the p-value for this test (0.2137) is also larger than 5%, we do not reject the hypothesis that the means are equal.

Paired t test example

• Let's consider the last example.
  Since treatments A and B were both applied to the same horse, the mobility measurements are not independent within horses. The right way to analyze the data is therefore a paired t test.
• Idea: we look at the difference between the responses to treatments A and B: Di = YiA - YiB

SAS paired test

    /* newhorses: one row per horse, with MobilityA (trt 1) and MobilityB (trt 2) */
    proc ttest data=newhorses;
      paired MobilityA*MobilityB;
    run;

    The SAS System
    The TTEST Procedure

    Statistics
    Difference              N   Lower CL Mean     Mean   Upper CL Mean   Lower CL Std Dev   Std Dev   Upper CL Std Dev   Std Err   Minimum   Maximum
    MobilityA - MobilityB   7          -0.729   3.1143          6.9571             2.6775    4.1551             9.1498    1.5705      -4.6       6.7

    T-Tests
    Difference              DF   t Value   Pr > |t|
    MobilityA - MobilityB    6      1.98     0.0946

But why is this happening? The mean difference is the same in both analyses (3.1143), but its standard error drops from 2.3716 in the two-sample analysis to 1.5705 in the paired analysis, because pairing removes the horse-to-horse variability.

One Way ANOVA

An experiment was conducted to study the growth of plant tissue in the presence of hormone solutions containing various growth-inhibiting substances. For each solution, 10 independent tissue cultures were prepared, and the growth of the plant tissue was recorded in mm. This experiment has one factor with 5 levels, each with 10 replications.
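One way to write down the model behind this design is sketched below. The symbols (y for growth, mu for the overall mean, tau for the solution effects, epsilon for the errors) are notation introduced here for illustration and do not come from the slides; independent normal errors with a common variance are the usual one-way ANOVA assumptions.

```latex
\[
\begin{aligned}
y_{ij} &= \mu + \tau_i + \varepsilon_{ij},
  \qquad i = 1,\dots,5,\quad j = 1,\dots,10,
  \qquad \varepsilon_{ij} \sim N(0,\sigma^2)\ \text{independent},\\
H_0 &:\ \tau_1 = \tau_2 = \dots = \tau_5 = 0
  \qquad\text{vs.}\qquad
H_a :\ \text{at least one } \tau_i \neq 0 .
\end{aligned}
\]
```

Here i indexes the five solutions and j the ten tissue cultures per solution. The overall F test of the one-way ANOVA assesses H_0.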
SAS data step

    data peasection;
      input trtmnt growth @@;
      label trtmnt = "1:'Control' 2:'Sol.1' 3:'Sol.2' 4:'Mixture' 5:'Sol.3'";
      datalines;
    1 7.84 1 8.69 1 8.11 1 8.35 1 7.74
    1 7.69 1 7.98 1 7.64 1 8.57 1 8.32
    2 6.78 2 6.69 2 6.95 2 6.64 2 6.41
    2 6.69 2 6.72 2 6.57 2 6.67 2 7.07
    3 6.79 3 6.79 3 6.79 3 6.61 3 6.43
    3 6.69 3 6.57 3 6.49 3 7.05 3 6.72
    4 6.64 4 6.57 4 6.78 4 6.48 4 6.54
    4 6.36 4 6.67 4 6.26 4 6.67 4 6.68
    5 7.31 5 7.65 5 7.26 5 7.39 5 6.98
    5 7.46 5 7.32 5 7.13 5 7.07 5 7.25
    ;

SAS coding

    proc boxplot data=peasection;
      title 'BoxPlot for one-way ANOVA example';
      plot growth*trtmnt / cframe=vligb cboxes=dagr cboxfill=ywh;
      insetgroup mean stddev q1 q2 q3 / header='Summary by Treatment' ctext=red;
    run;

SAS output

[Figure: box plot produced by the PROC BOXPLOT step above]

SAS glm anyway

    proc glm data=peasection;
      class trtmnt;
      model growth = trtmnt;
      lsmeans trtmnt / pdiff adjust=tukey;
      /* twice the Sol.3 mean (level 5) minus the Control (1) and Sol.2 (3) means */
      contrast 'our first contrast with contrast' trtmnt -1 0 -1 0 2;
      estimate 'our first contrast with estimate' trtmnt -1 0 -1 0 2;
      output out=residuals p=yhat r=res;
    run;

SAS output

    The GLM Procedure
    Dependent Variable: growth

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               4      16.11827200    4.02956800     74.32   <.0001
    Error              45       2.43972000    0.05421600
    Corrected Total    49      18.55799200

    Source   DF     Type I SS   Mean Square   F Value   Pr > F
    trtmnt    4   16.11827200    4.02956800     74.32   <.0001

    Source   DF   Type III SS   Mean Square   F Value   Pr > F
    trtmnt    4   16.11827200    4.02956800     74.32   <.0001

SAS output

    trtmnt   growth LSMEAN   LSMEAN Number
    1           8.09300000               1
    2           6.71900000               2
    3           6.69300000               3
    4           6.56500000               4
    5           7.28200000               5

    Least Squares Means for effect trtmnt
    Pr > |t| for H0: LSMean(i)=LSMean(j)
    Dependent Variable: growth

    i/j        1        2        3        4        5
    1               <.0001   <.0001   <.0001   <.0001
    2     <.0001            0.9991   0.5812   <.0001
    3     <.0001   0.9991            0.7346   <.0001
    4     <.0001   0.5812   0.7346            <.0001
    5     <.0001   <.0001   <.0001   <.0001

    Contrast                           DF   Contrast SS   Mean Square   F Value   Pr > F
    our first contrast with contrast    1    0.08214000    0.08214000      1.52   0.2248

    Parameter                             Estimate   Standard Error   t Value   Pr > |t|
    our first contrast with estimate   -0.22200000       0.18035964     -1.23     0.2248

Note
that: -(8.093 + 6.693) + 2*7.282 = -0.222

Remedies

• Transform the response. Fit log(var(y)) = c0 + q*log(mean(y)), then use:
    1. g(y) = y^(1 - q/2)       if q is different from 2
    2. g(y) = log(y)            if q = 2 and all y > 0
    3. g(y) = log(y + shift)    if q = 2 and some y <= 0
• Use an analysis for Gaussian data with unequal variances: Satterthwaite's approximation, or Welch's test (for one-way ANOVA).

SAS E.D.A.

    /* variance and mean of growth within each treatment */
    proc means data=peasection noprint;
      var growth;
      by trtmnt;
      output out=varmeans var=vargro mean=meangro;
    run;

    /* move both to the log scale */
    data varmeans;
      set varmeans;
      vargro  = log(vargro);
      meangro = log(meangro);
    run;

    proc gplot data=varmeans;
      plot vargro*meangro;
    run;

    /* the slope of this regression estimates q */
    proc reg data=varmeans;
      model vargro = meangro;
    run;

SAS output

[Figure: plot of vargro against meangro produced by the PROC GPLOT step above]

SAS regression

    The REG Procedure
    Model: MODEL1
    Dependent Variable: vargro

    Root MSE          0.24863    R-Square   0.8990
    Dependent Mean   -3.14165    Adj R-Sq   0.8654
    Coeff Var        -7.91405

    Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1            -17.58795          2.79721     -6.29     0.0081
    meangro      1              7.39762          1.43125      5.17     0.0141

SAS transformation and analysis code

With the estimated slope q = 7.39762, the exponent of the power transformation is 1 - q/2 = -2.69881.

    data trans;
      set peasection;
      yt = growth**(-2.69881);   /* exponent 1 - q/2 with q = 7.39762 */
    run;

    proc glm data=trans;
      class trtmnt;
      model yt = trtmnt;
      means trtmnt / hovtest=levene(type=square);
      output out=resi r=res;
    run;

    proc boxplot data=resi;
      title 'BoxPlot for one-way ANOVA example';
      plot res*trtmnt / cframe=vligb cboxes=dagr cboxfill=ywh;
      insetgroup mean stddev q1 q2 q3 / header='Summary by Treatment' ctext=red;
    run;

SAS output

    The GLM Procedure
    Dependent Variable: yt

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               4       0.00004922    0.00001231     72.52   <.0001
    Error              45       0.00000764    0.00000017
    Corrected Total    49       0.00005686

    Source   DF    Type I SS   Mean Square   F Value   Pr > F
    trtmnt    4   0.00004922    0.00001231     72.52   <.0001

    Source   DF   Type III SS   Mean Square   F Value   Pr > F
    trtmnt    4    0.00004922    0.00001231     72.52   <.0001

    Levene's Test for Homogeneity of yt Variance
    ANOVA of Squared Deviations from Group Means

    Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
    trtmnt    4          3.2E-14         8E-15      0.21   0.9297
    Error    45         1.69E-12      3.75E-14

SAS output

[Figure: box plot of the residuals produced by the PROC BOXPLOT step above]

Two-way ANOVA fixed factors
An educational researcher was interested in how the factors noise and solitude affect study conditions. Each subject in the experiment was asked to study an essay on American history for 15 minutes and was then tested on a 25-item quiz; the score is the number of correct items. The subjects differed, however, in the conditions under which they were allowed to study:

Factor Solitude with 2 levels: alone and not alone (with a stooge).
Factor Noise with 3 levels: no noise, soft background music, and loud rock and roll music.

There are 3 replications of each treatment combination.

SAS data step

    data QuizScores;
      input Solitude $ Noise $ Score @@;
      datalines;
    Alone None 10   Alone None 6    Alone None 14
    Alone Soft 21   Alone Soft 21   Alone Soft 16
    Alone Loud 5    Alone Loud 15   Alone Loud 7
    Stooge None 6   Stooge None 11  Stooge None 1
    Stooge Soft 6   Stooge Soft 17  Stooge Soft 13
    Stooge Loud 1   Stooge Loud 2   Stooge Loud 6
    ;

SAS E.D.A.

    proc boxplot data=quizscores;
      title 'BoxPlot for two-way ANOVA example';
      plot score*noise(solitude) / cframe=vligb cboxes=dagr cboxfill=ywh;
      *inset mean max min/pos=tm header='The overall summary';
      insetgroup mean stddev q1 q2 q3 / header='Summary by Treatment' ctext=red;
    run;

    /* BY processing needs sorted data; the raw data are not sorted by noise */
    proc sort data=quizscores;
      by solitude noise;
    run;

    proc means data=quizscores noprint;
      by solitude noise;
      var score;
      output out=meanquizscore mean=meanquiz;
    run;

    /* join the cell means with lines to get interaction plots */
    symbol1 i=j;
    symbol2 i=j;
    proc gplot data=meanquizscore;
      plot meanquiz*noise=solitude;
      plot meanquiz*solitude=noise;
    run;

SAS output

[Figures: box plot and the two mean-profile plots produced by the code above]

SAS output

    proc glm data=quizscores;
      class solitude noise;
      model score = solitude|noise;   /* main effects plus interaction */
    run;

    The GLM Procedure
    Dependent Variable: Score

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               5      471.1111111    94.2222222      4.90   0.0113
    Error              12      230.6666667    19.2222222
    Corrected Total    17      701.7777778

    Source           DF     Type I SS   Mean Square   F Value   Pr > F
    Solitude          1   150.2222222   150.2222222      7.82   0.0162
    Noise             2   312.4444444   156.2222222      8.13   0.0059
    Solitude*Noise    2     8.4444444     4.2222222      0.22   0.8060

    Source           DF   Type III SS   Mean Square   F Value   Pr > F
    Solitude          1   150.2222222   150.2222222      7.82   0.0162
    Noise             2   312.4444444   156.2222222      8.13   0.0059
    Solitude*Noise    2     8.4444444     4.2222222      0.22   0.8060

Slices

• In this example the interaction was not significant. But what should we do if it were? One way to handle this situation is SLICES. In the presence of interaction, main effects may or may not be significant, so we need to test the effect of one factor at each given level of the other. In SAS, we use the following statement to obtain the slices:

    lsmeans interaction / slice=treatment;

  For the quiz data this would be, for example:

    lsmeans solitude*noise / slice=noise;

SAS two-way ANOVA with a random factor

An experiment was performed to examine the effect of aging time on the strength of cement. From a large number of mixes, three cement mixes were randomly selected, and six specimens were produced from each mix. After two days, three randomly selected specimens from each mix were tested for strength with a load test; the other three specimens were tested after seven days.

This is a two-way classification with factors Cement Mix (3 levels) and Time (2 levels). The levels of factor Time were predetermined. The three levels of Cement Mix were randomly selected from a large number of mixes, so Cement Mix is a random factor.
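A common way to formalize this design is the mixed model sketched below. The notation is an assumption made here for illustration and does not come from the slides: alpha is the fixed Aging effect, b the random Mix effect, and (alpha b) their interaction. The independence and normality of the random terms are the usual working assumptions.

```latex
\[
\begin{aligned}
y_{ijk} &= \mu + \alpha_i + b_j + (\alpha b)_{ij} + \varepsilon_{ijk},
  \qquad i = 1,2,\quad j = 1,2,3,\quad k = 1,2,3,\\
b_j &\sim N(0,\sigma_b^2), \qquad
(\alpha b)_{ij} \sim N(0,\sigma_{\alpha b}^2), \qquad
\varepsilon_{ijk} \sim N(0,\sigma^2), \quad \text{all independent.}
\end{aligned}
\]
```

Under this formulation the fixed Aging effect is tested against the Aging*Mix mean square rather than the residual mean square, because the interaction variance also contributes to the expected mean square for Aging.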
SAS data input

    data YieldLoads;
      input Aging $ Mix Load @@;
      datalines;
    2-Days 1 574   2-Days 1 564   2-Days 1 550
    2-Days 2 524   2-Days 2 573   2-Days 2 551
    2-Days 3 576   2-Days 3 540   2-Days 3 592
    7-Days 1 1092  7-Days 1 1086  7-Days 1 1065
    7-Days 2 1028  7-Days 2 1073  7-Days 2 998
    7-Days 3 1066  7-Days 3 1045  7-Days 3 1055
    ;

SAS code

    proc glm data=yieldloads;
      class aging mix;
      model load = aging mix aging*mix;
      random mix aging*mix / test;   /* test uses the proper error terms */
    run;

OR USING:

    proc mixed data=yieldloads;
      class aging mix;
      model load = aging;            /* fixed effects only */
      random mix mix*aging;          /* random effects */
    run;

SAS output

    Source       Type III Expected Mean Square
    Aging        Var(Error) + 3 Var(Aging*Mix) + Q(Aging)
    Mix          Var(Error) + 3 Var(Aging*Mix) + 6 Var(Mix)
    Aging*Mix    Var(Error) + 3 Var(Aging*Mix)

    The GLM Procedure
    Tests of Hypotheses for Mixed Model Analysis of Variance
    Dependent Variable: Load

    Source                  DF   Type III SS   Mean Square   F Value   Pr > F
    Aging                    1       1107072       1107072   1965.80   0.0005
    Mix                      2   2957.444444   1478.722222      2.63   0.2758
    Error: MS(Aging*Mix)     2   1126.333333    563.166667

    Source                  DF   Type III SS   Mean Square   F Value   Pr > F
    Aging*Mix                2   1126.333333    563.166667      1.06   0.3774
    Error: MS(Error)        12   6386.666667    532.222222

Question…

• Option 1. Go back and complete the SLICE part, or
• Option 2. Go ahead to the MANOVA
• ?

MANOVA example

A researcher randomly assigns 33 subjects to one of three groups:

G1 receives technical dietary information interactively from an online website.
G2 receives the same information from a nurse practitioner.
G3 receives the information from a videotape made by the same nurse practitioner.

The researcher looks at three different ratings of the presentation (difficulty, usefulness, and importance) to determine whether there is a difference among the modes of presentation. In particular, the researcher is interested in whether the interactive website is superior, because it is the most cost-effective way of delivering the information.
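The overall question can be stated as a hypothesis about mean vectors. This is only a sketch: the symbol mu with a group subscript is notation introduced here, not taken from the slides.

```latex
\[
\begin{aligned}
\boldsymbol{\mu}_g &= \bigl(\mu_{g,\text{useful}},\ \mu_{g,\text{difficulty}},\ \mu_{g,\text{importance}}\bigr)^{\top},
  \qquad g = 1,2,3,\\
H_0 &:\ \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \boldsymbol{\mu}_3 .
\end{aligned}
\]
```

The researcher's specific interest, the website (G1) versus the other two modes, corresponds to testing whether 2*mu_1 - mu_2 - mu_3 equals the zero vector.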
SAS code

    proc glm data=manovaex;
      class group;
      model useful difficulty importance = group;
      contrast '1 vs 2&3' group 2 -1 -1;
      contrast '2 vs 3'   group 0 1 -1;
      manova h=_all_;
    run;

Note: go to the manova.sas example