Chapter 9 Differences Among Groups
Research Methods in Physical Activity

Purpose and Protocol of the Statistical Test

The purpose of the statistical test is to evaluate the null hypothesis at a specific level of probability (e.g., p < .05). In other words, do the two levels of treatment differ significantly (p < .05), so that differences this large would be attributable to chance no more than 5 times in 100?

The statistical test is always a test of the null hypothesis. All that statistics can do is reject or fail to reject the null hypothesis. Statistics cannot accept the research hypothesis; only logical reasoning, good experimental design, and appropriate theorizing can do so. Statistics can determine only whether the groups are different, not why they are different.

The t and the F ratios are used to determine whether groups are significantly different. In ANOVA (analysis of variance) techniques, R² is also used to establish meaningfulness. R² is the percentage of variance in the dependent variable accounted for by the independent variable. The meaningfulness of the differences is estimated by effect size (ES). (All of these are discussed later in the presentation.)

Assumptions of t and F Ratios

The uses of the t and the F distributions have four assumptions (in addition to the assumptions for parametric statistics presented in chapter 6):
♦ Observations are drawn from normally distributed populations.
♦ Observations represent random samples from populations.
♦ The numerator and denominator are estimates of the same population variance.
♦ The numerator and denominator of F (or t) ratios are independent.

When to Use One- vs. Two-Tailed Tests for Calculation of Significance

When designing the research model, the researcher must decide whether to use a one- or two-tailed test in determining the significance of the findings.
Suppose you tested a research hypothesis that predicted not only that the sample mean would be different from the population mean but also that it would be different in a specific direction (for example, “it would be lower”). This test is called a directional or one-tailed test because the region of rejection is entirely within one tail of the distribution.

Some hypotheses predict only that one value will be different from another, without additionally predicting which will be higher. The test of such a hypothesis is nondirectional or two-tailed because an extreme test statistic in either tail of the distribution (positive or negative) will lead to the rejection of the null hypothesis of no difference.

Types of t Tests: t Test Between Sample and Population Mean (see formula, Eq. 9.1, p. 148)

The one-sample t test compares the mean score of a sample to a known value. The known value is a population mean.

Hypotheses:
Null: There is no significant difference between the sample mean and the population mean.
Alternate: There is a significant difference between the sample mean and the population mean.

(We will review the problem, calculation, and results in the text with example 9.1, p. 149.)

Types of t Tests: Independent t Tests (see formula, Eq. 9.3, p. 149)

The independent samples t test compares the mean scores of two groups on a given variable. These groups are separate and independent of each other but are tested on the same variable.

Hypotheses:
Null: The means of the two groups are not significantly different.
Alternate: The means of the two groups are significantly different.

(You may refer to equations 9.3 and 9.4 for the formulas, p. 149.)

Note the degrees of freedom for the independent t test: df = n1 + n2 - 2. There are two groups, and each group contributes df = n - 1.

Types of t Tests: Independent t Tests, Meaningfulness

As with other statistical tests, we would like to know the meaningfulness of the treatment effect. To estimate the degree to which the treatment influenced the outcome, use effect size (ES), the standardized difference between the means:

ES = (M1 - M2)/s

where M1 = the mean of one group or level of treatment, M2 = the mean of a second group or level of treatment, and s = the standard deviation (that of the control group when there is one).

* If there is no control group, then the pooled standard deviation (equation 9.7, p. 150) should be used.

Remember, effect size can be interpreted as follows: An ES of 0.8 or greater is large, an ES around 0.5 is moderate, and an ES of 0.2 or less is small.

Homogeneity of Variance (basic assumption of parametric statistics)

All techniques for comparisons between groups assume that the variances (standard deviations squared) of the groups are equivalent. Although mild violations of this assumption do not present major problems, serious violations are more likely if group sizes are not approximately equal. If this is the case, you will need to select formulas that account for unequal numbers of subjects per group. Typically these are listed as tests for unequal variances.

Dependent t Tests

The dependent t test is used when the two groups of scores are related in some manner. Usually, the relationship takes one of two forms:
♦ Two groups of participants are matched on one or more characteristics and thus are no longer independent.
♦ One group of participants is tested twice on the same variable, and the experimenter is interested in the change between the two tests.
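Before turning to the dependent t formulas, the independent t test, its degrees of freedom, and the effect size described above can be sketched in a few lines. This is a minimal illustration, not the text's worked example: the scores are hypothetical, the function name `independent_t` is an assumption, and the pooled-variance form is used, which may be laid out differently from equations 9.3 and 9.4.

```python
import math
import statistics

def independent_t(group1, group2):
    """Return (t, df, effect_size) for two independent groups.

    Assumption: pooled-variance t test, so roughly equal group variances.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1 = statistics.variance(group1)  # sample variance (divides by n - 1)
    v2 = statistics.variance(group2)
    df = n1 + n2 - 2                  # each group contributes n - 1
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se_diff
    es = (m1 - m2) / math.sqrt(pooled_var)  # ES with the pooled standard deviation
    return t, df, es

# Hypothetical scores, for illustration only.
treatment = [12, 15, 14, 16, 13]
control = [10, 11, 13, 12, 9]

t, df, es = independent_t(treatment, control)
print(f"t({df}) = {t:.2f}, ES = {es:.2f}")  # prints: t(8) = 3.00, ES = 1.90
```

The t value would then be compared with the critical value for df = 8 in a t table at the chosen alpha level; the ES of about 1.9 would count as large by the guidelines above.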
(See formula 9.8 for an example of the formula.)

Note (see formula 9.9, p. 151): The same participants are tested twice (pretest and posttest). Thus, we adjust the error term of the t test downward (make it smaller) by taking into account the relationship (r) between the pretest and posttest, adjusted by their standard deviations.

The degrees of freedom for the dependent t test are df = N - 1, where N = the number of paired observations.

See example 9.3, p. 152, for calculating the dependent t with the raw score formula.

t Tests and Power in Research (refer to formula 9.11, p. 154)

There are three ways to obtain power with the independent t test. (The dependent t test already has increased power because there is less within-group variance [the same subjects provide the repeated measures].)

1) The first level (M1 - M2) gives power if we can increase the difference between M1 and M2. This occurs if there is a greater treatment effect. If the value in the numerator becomes larger, then the t statistic becomes larger, which increases the likelihood of rejecting the null hypothesis.
2) The second level of the independent t formula is the variance (s²) for each group. If the variance is smaller, then the value in the denominator becomes smaller, which increases the t statistic and the likelihood of rejecting the null hypothesis.
3) Finally, the third level (n1, n2) is the number of participants in each group. If n1 and n2 are increased and the first and second levels remain the same, the denominator becomes smaller (note that s² is divided by n) and t becomes larger, thus increasing the odds of rejecting the null hypothesis and obtaining power.

How the Strength of the t Statistic Is Evaluated

After the null hypothesis is rejected, the strength (meaningfulness) of the effects must be evaluated. The t ratio has a numerator and a denominator.
From a theoretical point of view, the numerator is regarded as true variance, or the real difference between the means. The denominator is considered error variance, or variation about the mean. Thus,

t = true variance / error variance

If true variance and error variance are equal, then t = 1. Thus, when a significant t ratio is found, we are really saying that true variance exceeds error variance to a significant degree. The amount by which the t ratio must exceed 1.0 for significance depends on the number of participants (df) and the alpha level established.

Relationships of t Test and Correlation Statistics

There are two sources of variance: true variance and error variance (true variance + error variance = total variance).
• The t test is the ratio of true variance to error variance, whereas r is the square root of the proportion of total variance accounted for by true variance.
• To get t from r only means manipulating the variance components in a slightly different way. This is because all parametric correlational and differences-among-groups techniques are based on the general linear model.
(See text pp. 157-158 for the mathematical application of this concept.)

Classroom Examples on Using Excel

1) In MS Excel, you will need (if you have not done so already) to select the Office Button, then select “Excel Options,” then select “Add-Ins.”
2) Choose Analysis ToolPak, click “Go,” check it off in the list, and click “OK.”

During this class session we will create and process data for the following functions:
1) Descriptive Statistics
2) Percentiles
3) Correlation Data
4) Regression Data
5) Independent t Test
6) Dependent t Test

ANOVA (Analysis of Variance)

ANOVA is an extension of the independent t test. In fact, t is just a special case of simple ANOVA in which there are two groups.
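The claim that t is a special case of simple ANOVA can be checked numerically: with two groups, the F ratio equals t squared. The sketch below is illustrative only. The scores are hypothetical, the function names `f_ratio` and `pooled_t` are assumptions, and F is computed from raw-score sums (sum of squared scores, squared grand sum over N, and squared group sums over group size).

```python
import math

def f_ratio(groups):
    """Simple (one-way) ANOVA F ratio computed from raw-score sums."""
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    a = sum(x * x for x in scores)                 # sum of every squared score
    b = sum(scores) ** 2 / n_total                 # (sum of all scores)^2 / N
    c = sum(sum(g) ** 2 / len(g) for g in groups)  # sum over groups of (group sum)^2 / n
    ss_between = c - b
    ss_within = a - c
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

def pooled_t(g1, g2):
    """Independent t test with a pooled variance estimate."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss1 = sum((x - m1) ** 2 for x in g1)
    ss2 = sum((x - m2) ** 2 for x in g2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Hypothetical scores, for illustration only.
group1 = [12, 15, 14, 16, 13]
group2 = [10, 11, 13, 12, 9]

f_value = f_ratio([group1, group2])
t_value = pooled_t(group1, group2)
print(f"F = {f_value:.2f}, t squared = {t_value ** 2:.2f}")  # prints: F = 9.00, t squared = 9.00
```

The equality holds for the pooled-variance t test; a t test adjusted for unequal variances would not match F exactly.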
Simple ANOVA allows the evaluation of the null hypothesis among two or more group means with the restriction that the groups represent levels of the same independent variable. See Table 9.1, p. 159, for an example. Notice that the explained variance is the SS between, and the unexplained variance is the SS within.

Using ANOVA with more than two groups, rather than a series of t tests, guards against an inflated risk of a type I error. With multiple t tests among three groups, each group is used in two comparisons (e.g., 1 vs. 2 and 1 vs. 3) rather than only one, whereas the probability (alpha) has been established for a comparison between only two sets of scores. (FYI: Making this type of comparison, in which the same group’s mean is used more than once, is an example of increasing the experimentwise error rate.)

ANOVA (Analysis of Variance): Calculating Simple ANOVA (see Table 9.2, p. 160)

Table 9.2 provides the formulas for calculating simple ANOVA and the F ratio. This method, the so-called ABC method, is simple:
♦ A = ΣX²: Square each participant’s score, sum these squared scores (regardless of which group the participant is in), and set the total equal to A.
♦ B = (ΣX)²/N: Sum all participants’ scores (regardless of group), square the sum, divide by the total number of participants, and set the answer equal to B.
♦ C = (ΣX1)²/n1 + (ΣX2)²/n2 + . . . + (ΣXk)²/nk: Sum all scores in group 1, square the sum, and divide by the number of participants in group 1; do the same for the scores in group 2, and so on for however many groups (k) there are. Then add the results for all groups and set the answer equal to C.

Note that these formulas are just partitioning variance in different ways.

ANOVA (Analysis of Variance) (see Table 9.2, p. 160)

Degrees of freedom are used to determine the mean squares (MS) for the between-group and within-group variance. The F ratio is determined by dividing the MS between groups by the MS within groups (MSB / MSW, i.e., the ratio of true variance to error variance).

Note: The F ratio is increased by decreasing the within-group variance, increasing the between-group variance, or both.

To determine whether F is significant, you can refer to Table 6, beginning on page 431. The degrees of freedom in the numerator are k - 1 (number of groups minus one), and the degrees of freedom in the denominator are N - k (total number of participants minus number of groups).

ANOVA (Analysis of Variance): Follow-up Testing

With an ANOVA of three groups, if we find that significant differences exist among the three group means, we do not know whether all three groups differ. Thus, significant findings must be followed up with a multiple comparison test. Your text explains the Scheffé technique (you may review the process, but you will not be examined on the protocol). Computer software will calculate significant findings between groups.

You may determine meaningfulness by calculating omega squared (see example 9.13, p. 163).

Factorial ANOVA

Factorial ANOVA: analysis of variance in which there is more than one independent variable. See Table 9.3, p. 165.

Look at Table 9.3 and note that the first independent variable (IV1) has two levels, labeled A1 and A2. In our example this IV represents the intensity of training: high intensity and low intensity. The second independent variable (IV2) represents the level of fitness of the participants: low fitness (B1) and high fitness (B2).

There are two main effects: fitness and intensity. There may also be an interaction between the different levels of the main effects.

Factorial ANOVA (2 × 2 ANOVA; see Table 9.3, p. 165)

This particular factorial ANOVA is labeled a 2 (intensity of training) × 2 (level of fitness) ANOVA (read “2-by-2 ANOVA”).
The true variance can be divided into three parts:
♦ True variance because of A (intensity of training)
♦ True variance because of B (level of fitness)
♦ True variance because of the interaction of A and B

Each of these true variance components is tested against (divided by) error variance to form the three F ratios for this ANOVA. Each of these Fs has its own set of degrees of freedom so that it can be checked for significance in the F table. First determine whether there is a significant interaction, then look at the main effects.

Factorial ANOVA (review of possible outcome scenarios)

Figure 9.2 (2 × 2 ANOVA) reflects a significant interaction, because the mean attitude scores of the low-fitness group toward low-intensity training were higher than their attitude scores toward high-intensity training, whereas the opposite was shown for the high-fitness participants, who preferred the high-intensity program.

This example shows how power is increased by using a particular type of statistical test. If we had just used a t test or simple ANOVA, we would not have found any difference in attitude toward the two levels of intensity (both means were identical, M = 25). When we added another factor (fitness level), however, we were able to discern that there were differences in attitude dependent on the level of fitness of the participants.

Figure 9.3 (2 × 2 ANOVA) shows a nonsignificant interaction. In this case, both groups preferred the same type of program over the other; hence, the lines are parallel. Significant interactions show deviations from parallel (as was shown in figure 9.2). The lines do not have to cross to reflect a significant interaction.
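The partition of true variance into A, B, and A × B, each divided by error variance, can be sketched for a balanced 2 × 2 design. This is a minimal illustration under stated assumptions: the function name `two_by_two_anova` is hypothetical, equal cell sizes are assumed, and the scores are invented so that both intensity means equal 25 (as in the figure 9.2 scenario) while the cell means cross over. They are not the data behind the text's figures.

```python
def two_by_two_anova(cells):
    """Return F ratios for main effect A, main effect B, and the A x B interaction.

    cells maps (a_level, b_level) -> list of scores.
    Assumption: every cell has the same number of participants.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # participants per cell
    scores = [x for s in cells.values() for x in s]
    grand = sum(scores) / len(scores)

    cell_mean = {key: sum(s) / n for key, s in cells.items()}
    a_mean = {a: sum(cell_mean[(a, b)] for b in b_levels) / len(b_levels) for a in a_levels}
    b_mean = {b: sum(cell_mean[(a, b)] for a in a_levels) / len(a_levels) for b in b_levels}

    # True variance components (sums of squares).
    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b
    # Error variance: spread of scores around their own cell mean.
    ss_error = sum((x - cell_mean[key]) ** 2 for key, s in cells.items() for x in s)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_error = len(scores) - len(cells)
    ms_error = ss_error / df_error
    return {
        "A": (ss_a / df_a) / ms_error,
        "B": (ss_b / df_b) / ms_error,
        "AxB": (ss_ab / (df_a * df_b)) / ms_error,
    }

# Hypothetical attitude scores; keys are (intensity, fitness).
attitude = {
    ("high_intensity", "high_fitness"): [29, 30, 31],
    ("high_intensity", "low_fitness"): [19, 20, 21],
    ("low_intensity", "high_fitness"): [19, 20, 21],
    ("low_intensity", "low_fitness"): [29, 30, 31],
}

f_ratios = two_by_two_anova(attitude)
print(f_ratios)
```

With these invented scores both main-effect F ratios are 0 and all of the true variance sits in the interaction, which is the pattern a t test or simple ANOVA on intensity alone would miss.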
Figure 9.4 (2 × 2 ANOVA) shows a significant interaction in which the high-fitness group liked both forms of exercise equally, but there was a decided difference in preference in the low-fitness group, who preferred the low-intensity program over the high-intensity program.

Repeated Measures ANOVA

Repeated-measures ANOVA: analysis of scores for the same individuals on successive occasions, such as a series of test trials; also called split-plot ANOVA or subject × trials ANOVA.

The most frequent use of repeated measures involves a factorial ANOVA in which one or more of the factors (independent variables) are repeated measures.

Benefits of Repeated Measures ANOVA
1. It gives the experimenter the opportunity to control for individual differences among participants, probably the largest source of variation in most studies.
2. The variation from individual differences can be identified and separated from the error term, thereby reducing it and increasing power. Because individual differences are controlled, repeated-measures designs are more economical: fewer participants are required.
3. Repeated-measures designs allow the study of a phenomenon across time. This feature is particularly important in studies of change in, for example, learning, fatigue, forgetting, performance, and aging.

Problems with Repeated Measures ANOVA
1. Carryover effects. Treatments given earlier influence treatments given later.
2. Practice effects. Participants improve at the task (dependent variable) as a result of repeated trials in addition to the treatment (also called the testing effect).
3. Fatigue. Participants’ performance is adversely influenced by fatigue (or boredom).
4. Sensitization. Participants’ awareness of the treatment is heightened because of repeated exposure.
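The second benefit listed above, separating individual differences from the error term, can be illustrated with a one-factor repeated-measures calculation. The sketch below is an assumption-laden example rather than the text's procedure: the function name `repeated_measures_f` and the scores are hypothetical, and it compares the repeated-measures F with the F obtained if the same scores were wrongly treated as independent groups.

```python
def repeated_measures_f(scores):
    """Return (F repeated measures, F as if trials were independent groups).

    scores[i][j] is the score of participant i on trial j.
    """
    n, k = len(scores), len(scores[0])  # participants, trials
    grand = sum(sum(row) for row in scores) / (n * k)
    trial_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_trials = n * sum((m - grand) ** 2 for m in trial_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    # Individual differences are removed from the error term.
    ss_error = ss_total - ss_trials - ss_subjects

    ms_trials = ss_trials / (k - 1)
    f_repeated = ms_trials / (ss_error / ((k - 1) * (n - 1)))
    # Same scores with individual differences left in the error term.
    f_independent = ms_trials / ((ss_total - ss_trials) / (n * k - k))
    return f_repeated, f_independent

# Hypothetical scores: 4 participants measured on 3 trials.
trials = [
    [10, 12, 14],
    [20, 21, 24],
    [15, 18, 19],
    [30, 31, 34],
]

f_repeated, f_independent = repeated_measures_f(trials)
print(f"repeated measures F = {f_repeated:.2f}, independent-groups F = {f_independent:.2f}")
```

Because the participants differ widely from one another but change in a similar way across trials, the repeated-measures F is much larger than the independent-groups F for the same scores.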
An example of repeated-measures ANOVA is found on p. 170.