Psychological Bulletin, 1991, Vol. 110, No. 3, 571-573
Copyright 1991 by the American Psychological Association, Inc. 0033-2909/91/$3.00

Misinterpretation of Interaction Effects: A Reply to Rosnow and Rosenthal

Donald L. Meyer
University of Pittsburgh

The structural model of analysis of variance for multiway tables, with main effects and interaction parameters, is overspecified. Only the set of estimable functions of the cell means is useful. The individual estimates of the parameters are artificial as regards the underlying scientific process. Rosnow and Rosenthal (1989a) stated that it is "absolutely necessary" (p. 146) that interactions should be interpreted by examining the usual estimates of the interaction parameters. This is incorrect.

I thank the reviewers for their many helpful comments. Correspondence concerning this article should be addressed to Donald L. Meyer, Department of Psychology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260.

Rosnow and Rosenthal (1989a) stated that it is an error to report cell means to interpret the pattern of an observed interaction. Because I suspect that several people will be concerned with this statement, especially when Rosnow and Rosenthal (1989b) stated that interaction effects are "the universally most misinterpreted empirical results in psychology" (p. 1282), I felt this reply was in order. As an aside, I find it fascinating that a four-page article was published explaining how to interpret four numbers, more than 50 years after R. A. Fisher (1935) explained the analysis of variance (ANOVA).

Reviewers of a previous version of this article called my attention to a number of articles also discussing interaction effects in the ANOVA: Marascuilo and Levin (1970, 1976), Levin and Marascuilo (1972, 1973), Games (1973), and Boik (1979). The same lack of sensitivity to the principal point of the present article is evident in some of these articles.

Rosnow and Rosenthal (1989a) present an "algorithm" for obtaining two-way interaction effects "corrected" for main effects. They are seemingly unaware that their algorithm is nothing more than the solution for the estimates of the interaction parameters in the usual linear model for a two-way table of means:

$$\mu(ij) = \mu + \alpha(i) + \beta(j) + \gamma(ij), \tag{1}$$

in which $\mu(ij)$ is the mean in row $i$ and column $j$, $\mu$ is the grand mean, $\alpha(i)$ is the main effect for row $i$, $\beta(j)$ is the main effect for column $j$, and $\gamma(ij)$ is the interaction effect for the cell in row $i$ and column $j$. The estimates of the $\gamma(ij)$ are well-known to be $m(ij) - m(i) - m(j) + m$, in which all the $m$s are observed means.

The principal error of Rosnow and Rosenthal (1989a) is that they have confused the estimation of the statistical model with scientific reality. Stated another way, even if the cell means in a two-way table result from additive effects as in Model 1, the parameters of that model are unknowable. The model is overspecified, and the parameters are said to be nonestimable. Think of a 2 × 2 table: There is one grand mean, two row effects, two column effects, and four interaction effects. That gives a total of nine parameters (effects). It is at once obvious that four observed cell means cannot estimate nine effects! How then is progress made? The answer is that certain functions of the parameters can be estimated. In particular, the difference of the differences of the cell means in row 1 and the means in row 2 estimates the corresponding difference of the differences of the $\gamma(ij)$:

$$[m(1,2) - m(1,1)] - [m(2,2) - m(2,1)].$$

This is the familiar interaction contrast shown in statistics textbooks.

For concreteness, consider the example used by Rosnow and Rosenthal (1989a) of an investigator studying the despair of bereavement of family members when a child dies. Grief intensity is the dependent variable; health and sex of child are the two "factors." At the top of Table 1 is a hypothetical set of true population means consistent with Rosnow and Rosenthal's (1989a) ranking of the cells: Healthy male > healthy female > unhealthy female > unhealthy male. Next are presented the "scientific" effects, or what Levin and Marascuilo (1973) refer to as the "latent structure of the variable" (p. 308). This structure specifies that the true means arise from combinations of additive row, column, and interaction effects, as in Model 1. These "true" effects show that being healthy adds 3 points to bereavement, and being unhealthy adds 10 points. Girls receive 5 points; boys receive 0. The numbers in the cells are the interaction effects when sex is mixed with health status. Healthy boys have a true mean of 17 because they are in a cell receiving 3 for healthy, 0 for boys, 12 for the interaction, and 2 for the grand mean. Similarly, the other three true means result as shown. These effects were simply made up and are just one of the infinite number of sets of possible effects consistent with the true means.

Table 1
Cell Means, Effects, and Estimates

Health status          Girl      Boy      By row

Population M
  Healthy                14       17        15.5
  Unhealthy               6        5         5.5
  M                      10       11        10.5 (Grand M)

Scientific effect
  Healthy                 4       12         3.0
  Unhealthy             -11       -7        10.0
  Column effect           5        0         2.0

Estimate
  Healthy                -1        1         5
  Unhealthy               1       -1        -5
  Column estimate      -0.5      0.5        10.5

Samples from the four groups are taken, and suppose our observed cell means are the same as the true means: a successful research study. The bottom of Table 1 shows the usual estimates using Model 1. These are calculated using constraints on the estimates of the parameters. I emphasize this to show that the statistical analysis is arbitrary as regards the actual values of the parameters. The usual constraints specify that the row and column estimates both sum to zero and the interaction estimates sum to zero in any row or column. This results in one "free" row parameter, one free column parameter, and one free interaction parameter. With the grand mean, a 2 × 2 table with 4 data points has the usual four degrees of freedom. One should notice that the familiar "1, -1" pattern of interaction estimates results. For Cell 12, 17 - 15.5 - 11 + 10.5 = 1 or, as Rosnow and Rosenthal (1989a) show: 17 - (5 + 0.5 + 10.5). Note that the estimates of the interaction parameters have no resemblance to the "true" interaction effects! However, the difference (of the differences of column 2 and column 1) for row 1 and row 2 is 4 points for all three sections of Table 1: (17 - 14) - (5 - 6) = (12 - 4) - (-7 + 11) = (1 + 1) - (-1 - 1) = 4. In the technical jargon, this interaction contrast is an estimable function.

Rosnow and Rosenthal's (1989a) example concerned a researcher who predicted a certain ranking of group means and who collected data that didn't verify the prediction (in relation to the ranking of the means). Rosnow and Rosenthal obtained the 1, -1 pattern for both the researcher's prediction and for the observed data (Rosnow & Rosenthal, 1989a, Table 1). They stated, "Also seemingly unbeknown to the investigator, we see that he found exactly the pattern of interaction that was predicted" (Rosnow & Rosenthal, 1989a, p. 144). Rosnow and Rosenthal made the interaction contrast to be 4 in both cases. The 1, -1 pattern must result but does not reflect the true interaction parameters or the ranking of the true means.
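The arithmetic behind Table 1 can be checked with a short sketch. The Python below is illustrative only: the function names `sum_to_zero_estimates` and `interaction_contrast` are assumptions introduced here, and the inputs are the population means and scientific interaction effects of Table 1, with rows taken as healthy and unhealthy and columns as girl and boy. It applies the usual sum-to-zero estimates and then evaluates the difference of differences on all three panels.

```python
# Values from Table 1; rows = (healthy, unhealthy), columns = (girl, boy).
population_means = [[14.0, 17.0], [6.0, 5.0]]
scientific_interaction = [[4.0, 12.0], [-11.0, -7.0]]


def sum_to_zero_estimates(means):
    """Usual two-way ANOVA estimates from cell means under sum-to-zero constraints."""
    n_rows, n_cols = len(means), len(means[0])
    grand = sum(sum(row) for row in means) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in means]
    col_means = [sum(means[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    row_eff = [rm - grand for rm in row_means]
    col_eff = [cm - grand for cm in col_means]
    # Interaction estimate: m(ij) - m(i) - m(j) + m
    inter = [[means[i][j] - row_means[i] - col_means[j] + grand
              for j in range(n_cols)] for i in range(n_rows)]
    return grand, row_eff, col_eff, inter


def interaction_contrast(table):
    """[t(1,2) - t(1,1)] - [t(2,2) - t(2,1)] for a 2 x 2 table."""
    return (table[0][1] - table[0][0]) - (table[1][1] - table[1][0])


grand, row_eff, col_eff, inter = sum_to_zero_estimates(population_means)
print(grand, row_eff, col_eff, inter)
print(interaction_contrast(population_means),
      interaction_contrast(scientific_interaction),
      interaction_contrast(inter))
```

Under these assumptions the estimates reproduce the bottom panel of Table 1 (grand mean 10.5, row estimates 5 and -5, column estimates -0.5 and 0.5, interaction estimates of 1 and -1), and the contrast is 4 for the population means, the scientific effects, and the estimates alike.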
Rosnow and Rosenthal's (1989a) criticism was that the researcher reported the obtained cell means and said they didn't conform to his prediction. If a person predicts a certain ranking of means, there is no obligation on that person to specify implied main effects or interactions. The only obligation is honesty, for which this researcher should be applauded. If one wants to discuss interaction, then noting that the sex effect is 3 points for healthy and -1 for unhealthy is sufficient for point estimation. The difference of these "simple" effects is the interaction contrast of 4. For confidence intervals one would also calculate the variance of the interaction contrast and, if desired, the variance of these simple contrasts.

The example also shows that even the difference and order of row and column effects are misestimated! The true scientific effects show a 7-point effect favoring unhealthy, but the ANOVA model shows a difference of 10 points favoring healthy. When Rosnow and Rosenthal "peel" away main effects, they use unreal main effects. It is a well-known fact that when interaction occurs, main effects are not readily interpretable. Rosnow and Rosenthal give a second example in which they calculate meaningless F tests for main effects in the presence of a significant interaction. At least they are in good company, because this is seen repeatedly both in articles and textbooks!

This second example shows how Rosnow and Rosenthal's advice can confuse even themselves. The example concerned inexperienced (I) and experienced (E) ball players assigned to either a control treatment or a "Ralphing" treatment. Both groups scored 3 in the control condition, and I went up to 5 and E went up to 7 in the Ralphing condition. Both groups benefited from Ralphing, but the experienced group benefited even more. Rosnow and Rosenthal plotted the interaction estimates and obtained the +, - crossing pattern.
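The crossing pattern can be reproduced from the four cell means just described. The following Python is a minimal sketch, assuming rows are the inexperienced and experienced groups and columns are the control and Ralphing conditions; the names `interaction_residuals`, `simple_effects`, and `contrast` are introduced here for illustration and do not come from Rosnow and Rosenthal.

```python
# Cell means as described in the text:
# rows = (inexperienced, experienced), columns = (control, Ralphing).
cell_means = [[3.0, 5.0], [3.0, 7.0]]


def interaction_residuals(means):
    """Cell mean minus row mean minus column mean plus grand mean, per cell."""
    n_rows, n_cols = len(means), len(means[0])
    grand = sum(sum(row) for row in means) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in means]
    col_means = [sum(means[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    return [[means[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)] for i in range(n_rows)]


residuals = interaction_residuals(cell_means)

# Simple effect of Ralphing within each experience group (Ralphing - control).
simple_effects = [row[1] - row[0] for row in cell_means]

# Interaction contrast: difference of the two simple effects.
contrast = simple_effects[1] - simple_effects[0]

print(residuals)
print(simple_effects)
print(contrast)
```

In this sketch the residual for inexperienced players under Ralphing is -0.5 even though their cell mean rose by 2 points. The sign of a residual therefore says nothing about the direction of a simple effect.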
Rosnow and Rosenthal then said, "The experienced ball players benefited moderately from Ralphing to the same degree that inexperienced ball players were harmed by it" (Rosnow & Rosenthal, 1989a, p. 146).

The theoretical cell means are estimable by the observed cell means (or more generally, by linear functions of the means). In plain words, they reflect reality. When interaction occurs, they are the best guesses (non-Bayesian) of future behavior for the various factor combinations defining the cells. Furthermore, in the presence of interaction, the structural model with main effects is of no use.

Rosnow and Rosenthal (1989a) state, "The point of this article is to emphasize that if investigators are claiming to speak of an interaction, the exercise of looking at the corrected cell (or condition) means is absolutely necessary" (p. 146). This is misguided. The usual graph of cell means shows how the two factors are behaving together. The interaction is seen in the lack of parallelism of the two lines. Rosnow and Rosenthal would have us always plot the lines as crossing!

Marascuilo and Levin (1970) stated that many of the errors in analysis after rejection of a test for interaction come from "an incorrect understanding that researchers have concerning interaction in a factorial design" (p. 414). They called this understanding the "intuitive model" or "synergistic model." Their example concerns four groups: placebo, Drug A, Drug B, both Drug A and B. Their model specifies the following:

Group          Effect
Placebo        M
Drug A         M + A
Drug B         M + B
Drug A, B      M + A + B + (AB)

Marascuilo and Levin (1970) identify the (AB) parameter as the interaction resulting from the synergistic effect of the two drugs administered together.
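Because this parameterization has four parameters for four groups, it can be solved directly from four cell means. The Python below is a rough sketch under that reading: the function name `synergistic_parameters` is an assumption introduced here, and the four input means are placeholder numbers invented only to exercise the arithmetic, not data from any study.

```python
def synergistic_parameters(placebo, drug_a, drug_b, both):
    """Solve M, A, B, and (AB) from four cell means under the model
    placebo = M, Drug A = M + A, Drug B = M + B, both = M + A + B + (AB)."""
    m = placebo
    a = drug_a - m
    b = drug_b - m
    ab = both - m - a - b
    return m, a, b, ab


# Placeholder cell means (hypothetical, for illustration only).
placebo, drug_a, drug_b, both = 10.0, 13.0, 12.0, 20.0

m, a, b, ab = synergistic_parameters(placebo, drug_a, drug_b, both)

# Difference of differences of the same four cell means.
contrast = (both - drug_b) - (drug_a - placebo)

print(m, a, b, ab)
print(contrast)
```

Algebraically, (AB) = both - Drug A - Drug B + placebo, which is the difference of the differences of the four cell means, so the two printed quantities agree whatever placeholder values are used.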
Marascuilo and Levin go on to say that in this model it makes good sense to think of interaction in a single cell, but in the generally applied ANOVA model, an interaction is "a component that involves every cell of the design and not just the cell representing the joint administration of the two drugs" (2 × 2 designs; Marascuilo & Levin, 1970, p. 416). Table 2 shows the estimates using the same true means as in the top of Table 1. Lo and behold! The synergistic interaction parameter is estimated to be 4, which is the value of the interaction contrast of the cell means calculated previously. Again the main effects of drugs, A = -1 and B = 8, bear no resemblance to either the usual estimates or the "latent" effects of Table 1. This "intuitive" model is just another way of parameterizing four cell means. Whether the true science is synergistic or follows the latent structural model is unknowable from four data points (or more in a larger two-way table)!

Much of the discussion is valuable because Marascuilo and Levin (1970) emphasize interaction as the difference of differences and they address the problem of multiple-error rates. However, they seem not to have followed their own dictum that interaction involves every cell when they made a rather large issue of calculating confidence intervals for single interaction parameters. Games made almost the same criticism when he pointed out that the interaction parameters "have no generalizability to a modified replication" (Games, 1973, p. 305). As I have shown, they do not even have utility for a current data set, except as they enter estimable functions.

Most presentations of the structural model (with more parameters than data points) in statistics textbooks for the behavioral sciences are incomplete. This is probably because of the desire to avoid rather heavy mathematics.
However, it is unfortunate that the distinction between the original (scientific) model and the reparameterized model is blurred. The ANOVA is an analysis of estimable functions, not of individual parameters. Two important points are (a) any reparameterization leads to the same estimate of an estimable function, and (b) it is possible to test hypotheses only about estimable functions (Kempthorne, 1952). The failure to make the distinction has misled many, the most recent being Rosnow and Rosenthal (1989a).

Table 2
Estimates Using Synergistic Parameterization

                           Drug B
Drug A      Yes                                   No
Yes         M + A + B + (AB) = 6 - 1 + 8 + 4      M + A = 6 - 1
No          M + B = 6 + 8                         M = 6

Once one thinks in terms of cell means or the cell means model as Marascuilo and Levin did later (Marascuilo & Levin, 1976), most of the issues disappear, because contrasts of the cell means are estimable functions. In a 2 × 2 table, four means can be compared in 11 different ways using unit weights. Only one of those ways directly addresses interaction: the difference of the differences. If a researcher wants to report other comparisons like "simple effects" and if those comparisons are deemed useful in interpretation, then by all means they should be reported. I believe Games (1973) subscribed to that view. If one wants to analyze various interaction contrasts in larger tables, refer to textbooks such as Kirk (1982) and the article by Boik (1979). The only real issue is not how to interpret interactions, but how one controls significance levels in multiple-testing situations.

The structural model with only main effects is still overparameterized but seemingly causes no difficulties in interpretation. To all those who make graphs of cell means to show the interaction as nonparallel lines, ignore the Rosnow and Rosenthal articles and keep on plotting!

References

Boik, R. J. (1979). Interactions, partial interactions and interaction contrasts in the analysis of variance. Psychological Bulletin, 86, 1084-1089.

Fisher, R. A. (1935). The design of experiments. Edinburgh, Scotland: Oliver & Boyd.

Games, P. A. (1973). Type IV errors revisited. Psychological Bulletin, 80, 304-307.

Kempthorne, O. (1952). The design and analysis of experiments. New York: Wiley.

Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Monterey, CA: Brooks/Cole.

Levin, J. R., & Marascuilo, L. A. (1972). Type IV errors and interaction. Psychological Bulletin, 78, 368-374.

Levin, J. R., & Marascuilo, L. A. (1973). Type IV errors and games. Psychological Bulletin, 80, 308-309.

Marascuilo, L. A., & Levin, J. R. (1970). Appropriate post hoc comparisons for interaction and nested hypotheses in analysis of variance designs: The elimination of Type IV errors. American Educational Research Journal, 7, 397-421.

Marascuilo, L. A., & Levin, J. R. (1976). The simultaneous investigation of interaction and nested hypotheses in two-factor analysis of variance designs. American Educational Research Journal, 13, 61-65.

Rosnow, R. L., & Rosenthal, R. (1989a). Definition and interpretation of interaction effects. Psychological Bulletin, 105, 143-146.

Rosnow, R. L., & Rosenthal, R. (1989b). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276-1284.

Received November 29, 1989
Revision received October 4, 1990
Accepted December 24, 1990