EDF 802
Dr. Jeffrey Oescher
Formative Exercise Topic 4 - Factorial ANOVA
Sample Responses

Descriptive Statistics

Very little data was given other than the gender and treatment received by each student. Eight subjects participated in each of the three groups. Five males and three females were in the morning and afternoon groups; five females and three males were in the noon group.

Table 1 describes the performance of the groups on the test. Scores for females were slightly higher than those for males; both groups answered slightly more than three-fourths of the items correctly. Students in the morning group performed somewhat better than those in either the noon or afternoon groups. On average, the morning group answered about 90% of the items correctly, while the noon and afternoon groups answered approximately 70% of the items correctly. Variation across the scores of all groups was moderate.

Please note I have chosen to discuss only the main effects. That is, I discussed differences between males and females first, followed by a discussion of morning, noon, and afternoon. I interpreted the scores from both norm-referenced and criterion-referenced perspectives.

Table 1
Test Score Statistics by Sex and Time

Sex     Time     N     Mean      SD
1       1        3     44.00     4.58
1       2        5     32.40     5.73
1       3        3     38.67     4.16
1       Total   11     37.27     6.84
2       1        5     44.80     4.32
2       2        3     37.33     3.21
2       3        5     35.60     4.16
2       Total   13     38.54     5.74
Total   1        8     44.50     4.11
Total   2        8     34.25     5.31
Total   3        8     36.75     4.17
Total   Total   24     38.50     6.23

Inferential Analyses

The hypotheses being tested in this analysis reflect a main effect for time-of-day (Hypothesis 1), a main effect for gender (Hypothesis 2), and an interaction effect for time-of-day by gender (Hypothesis 3).

1. H0: μ1. = μ2. = μ3.          H1: μi. ≠ μj. for some i ≠ j
2. H0: μ.1 = μ.2                H1: μ.1 ≠ μ.2
3. H0: all αβ effects = 0       H1: at least one αβ effect ≠ 0

The alpha level was set at .05. An a priori power estimate to determine sample size was conducted using Cohen's table. I assumed a moderate effect size, an alpha level of .05, and power of 0.80.
For the three levels of time-of-day, 52 subjects were needed for each time. Spread across the two levels of gender, this is 26 males and 26 females for each of the three times, or 26 subjects per cell. For the two levels of gender, 64 males and 64 females were needed. These needed to be spread evenly across the three time levels, so the recommended sample for each gender was increased to 66, the next multiple of three. Thus, 22 subjects of each gender were needed for each of the three times, or 22 subjects per cell. The larger of these two cell sizes is 26, so it was chosen. Across all six cells, the recommended sample size is thus 156. The actual sample size (24) is far smaller than this and could possibly affect the results in terms of an increased likelihood of a Type II error (i.e., lower levels of power).

The sampling distributions for each hypothesis are F(2, 18) for time, F(1, 18) for sex, and F(2, 18) for the interaction of time by sex. All inferential tests result in F-statistics.

The results of a factorial ANOVA are presented in Table 2. The assumptions underlying this analysis were all met. The homogeneity of variance assumption was tested with Levene's statistic and found nonsignificant (F(5, 18) = 0.51, p = .764). The procedure is robust with respect to violation of the assumption of normality, and the assumption of independence of observations was assumed true.

Table 2
Factorial ANOVA Results

Source                  SS       df    MS       F      Sig
Time-of-day             372.07    2    186.03   8.99   .002
Gender                    4.44    1      4.44   0.22   .649
Time-of-day*Gender       60.02    2     30.01   1.45   .261
Error                   372.53   18     20.67

An examination of the information in Table 2 indicates a significant effect for time-of-day (F(2, 18) = 8.99, p = .002) and nonsignificant results for the gender (F(1, 18) = 0.22, p = .649) and interaction (F(2, 18) = 1.45, p = .261) effects.

Scheffé post hoc analyses were used to identify which of the three pairs of means for the time-of-day effect were statistically different. The results indicated morning was statistically different from both noon and afternoon (MD = 10.25, p = .001; MD = 7.75, p = .011, respectively).
There was no statistical difference between noon and afternoon (MD = -2.50, p = .557). Therefore, the null hypotheses for the comparisons of morning to noon and of morning to afternoon were rejected; the null hypothesis for the comparison of noon to afternoon was not rejected. An examination of the mean scores indicated students performed statistically better in the morning than at noon or in the afternoon. This is likely due to the physical and psychological factors discussed earlier in this report.

From a conceptual perspective, the analysis of the data began with the assumption that the null hypotheses were true. This permitted the researcher to generate three sampling distributions of F, one for each of the three hypotheses. Once the actual observed F-statistics were calculated, they were mapped into the respective sampling distributions. In the case of the time-of-day effect, the observed F-statistic was atypical of those in the sampling distribution. It is reasonable to suggest the null hypothesis of equal means across the three levels of time-of-day is false; there are differences. In the cases of the gender and interaction effects, the observed statistics were typical of those in the respective sampling distributions. It is reasonable to suggest the null hypotheses for the two levels of gender and the six cells of the interaction are true.
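
The marginal rows of Table 1 and the mean differences (MD) reported for the post hoc comparisons can be recomputed from the six cell entries. The following is a minimal sketch rather than part of the original analysis; it assumes time codes 1, 2, and 3 stand for morning, noon, and afternoon, and the names `cells` and `weighted_mean` are illustrative only.

```python
# (sex, time): (n, mean), copied from the six cell entries of Table 1.
# Assumption: time 1 = morning, 2 = noon, 3 = afternoon.
cells = {
    (1, 1): (3, 44.00), (1, 2): (5, 32.40), (1, 3): (3, 38.67),
    (2, 1): (5, 44.80), (2, 2): (3, 37.33), (2, 3): (5, 35.60),
}

def weighted_mean(keys):
    """Mean across the given cells, weighting each cell mean by its n."""
    total_n = sum(cells[k][0] for k in keys)
    total_score = sum(cells[k][0] * cells[k][1] for k in keys)
    return total_score / total_n

# Marginal means for each sex, each time, and the whole sample.
sex_means = {s: weighted_mean([k for k in cells if k[0] == s]) for s in (1, 2)}
time_means = {t: weighted_mean([k for k in cells if k[1] == t]) for t in (1, 2, 3)}
grand_mean = weighted_mean(list(cells))

# Pairwise mean differences for the time-of-day effect.
mean_diffs = {(a, b): time_means[a] - time_means[b]
              for a, b in [(1, 2), (1, 3), (2, 3)]}

for label, value in [("sex 1", sex_means[1]), ("sex 2", sex_means[2]),
                     ("grand", grand_mean)]:
    print(f"{label}: {value:.2f}")
for pair, md in mean_diffs.items():
    print(f"time {pair[0]} - time {pair[1]}: {md:.2f}")
```

The recomputed time means, grand mean, and mean differences agree with the reported values to two decimals. The recomputed total for Sex 2 comes out near 39.54 rather than the 38.54 printed in Table 1, so the printed value may be a transcription slip.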
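
The cell-size reasoning in the a priori power estimate can be written out as arithmetic. This sketch takes the two per-level sample sizes (52 and 64) as given in the text; it does not reproduce Cohen's table, and the constant and function names are hypothetical.

```python
import math

# Per-level sample sizes as reported in the text (moderate effect size,
# alpha = .05, power = .80). They are inputs here, not computed.
N_PER_TIME_LEVEL = 52     # needed at each of the 3 time-of-day levels
N_PER_GENDER_LEVEL = 64   # needed at each of the 2 gender levels
TIME_LEVELS = 3
GENDER_LEVELS = 2

def recommended_cell_size():
    """Return the larger of the two per-cell requirements."""
    # Time effect: each time level's n is split across the gender levels.
    cell_from_time = math.ceil(N_PER_TIME_LEVEL / GENDER_LEVELS)
    # Gender effect: each gender's n is split across the time levels;
    # rounding up is what raises 64 to 66 (3 cells of 22).
    cell_from_gender = math.ceil(N_PER_GENDER_LEVEL / TIME_LEVELS)
    return max(cell_from_time, cell_from_gender)

cell_n = recommended_cell_size()
total_n = cell_n * TIME_LEVELS * GENDER_LEVELS
print(cell_n, total_n)  # 26 156
```

Taking the maximum of the two cell sizes ensures that both main effects meet the target power; the interaction is not sized separately in this sketch, as in the text.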
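
The F-statistics in Table 2 follow from the sums of squares and degrees of freedom: each mean square is SS / df, and each F is the effect mean square divided by the error mean square. The sketch below recomputes them from the table entries. For the two effects with 2 numerator degrees of freedom it also uses the closed-form upper-tail probability of the F distribution, p = (1 + 2F/df_error)^(-df_error/2), which holds only when the numerator df equals 2; no p-value is computed for gender. Function names are illustrative.

```python
# (SS, df) pairs copied from Table 2.
effects = {
    "Time-of-day": (372.07, 2),
    "Gender": (4.44, 1),
    "Time-of-day*Gender": (60.02, 2),
}
SS_ERROR, DF_ERROR = 372.53, 18

def f_ratio(ss, df, ss_error=SS_ERROR, df_error=DF_ERROR):
    """F = MS_effect / MS_error, where each MS = SS / df."""
    return (ss / df) / (ss_error / df_error)

def p_value_df1_equals_2(f, df_error=DF_ERROR):
    """Upper-tail p for F(2, df_error); valid only for numerator df = 2."""
    return (1 + 2 * f / df_error) ** (-df_error / 2)

for name, (ss, df) in effects.items():
    f = f_ratio(ss, df)
    line = f"{name}: F({df}, {DF_ERROR}) = {f:.2f}"
    if df == 2:
        line += f", p = {p_value_df1_equals_2(f):.3f}"
    print(line)
```

Because the inputs are rounded table entries, the recomputed values can differ from the printed ones in the last decimal place.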