Surveillance is not the answer, and replication is not a test: Comment on Kepes and McDaniel, 'How trustworthy is the scientific literature in I-O psychology?'

Maarten Derksen
Eric F. Rietzschel
University of Groningen, Department of Psychology

Although we share Kepes and McDaniel's concern about the state of affairs in I-O psychology, we think their emphasis on control and correction will, in the end, be counterproductive. Specifically, we argue that questionable research practices can best be remedied by encouraging an open academic culture, characterized by error management, rather than a culture of distrust, aimed at error prevention. An example is the call for more replication studies, which should not be framed as ‘effect size police’, but as normal research.

The mechanisms that Kepes and McDaniel recommend largely come down to mutual surveillance within the research community, enforced and facilitated by journals. While a lack of functional control mechanisms probably contributes to the occurrence of fraud, the spectrum of questionable research practices is very broad (e.g., Neuroskeptic, 2012), encompassing not only fraud but a whole array of flawed practices that differ in perniciousness. Although there are, obviously, things a researcher should never do, many other practices are questionable not because of what the researcher does (e.g., removing outliers, for which good reasons may exist), but because this information is not properly disclosed. Similarly, discussions about publication bias and the lack of available replication work revolve around a lack of disclosure, mostly of nonsignificant findings (e.g., Francis, 2012; Galak & Meyvis, 2012).

Error Management

As others have noted (e.g., Fanelli, 2013; Nosek, Spies, & Motyl, 2012; Simmons, Nelson, & Simonsohn, 2012; Wicherts & Bakker, 2012), what is required is a more open research culture in which researchers are willing to share data and to fully disclose their analytical strategies. Simply instating more control mechanisms may not be the best way to achieve this, because the sole focus then is on the prevention of fraud and errors, rather than on dealing with them in a constructive manner when they do occur. Of course we agree with Kepes and McDaniel that data sharing and replication research are essential for a reliable and practical science of psychology. What we question is the increasing tendency to see these practices as means of control and methodological detective work. We need a research culture where researchers actually want to engage in discussion about their methods and about the robustness of their results, and where researchers actually want to share their data, with all of its quirks and shortcomings. This requires an approach akin to error management (e.g., Frese et al., 1991). The essence of error management is the acceptance that errors are, to a certain extent, inevitable, and can serve an important learning function. Research has shown that error management training contributes to effective learning (Keith & Frese, 2008); more importantly, Van Dyck, Frese, Baer, and Sonnentag (2005) found that an error management culture actually contributed to organizational performance.
One reason for this, they argue, is that open communication about errors increases the probability of their being detected at a relatively early stage, when their potential negative consequences are still limited. We would hypothesize that the same holds for organizational psychology departments.

Replication Research

Recommendations for improvements in psychological research practices should, at least in part, be judged by their contribution to, or compatibility with, a work climate where researchers feel safe to discuss all of their ‘sloppy’ methods and their ‘messy’ data, rather than feeling compelled to produce unrealistically neat and seemingly flawless stories (Giner-Sorolla, 2012). We agree with Kepes and McDaniel that one such improvement should be to give exact replications a prominent role in psychological research. However, promoting exact replications as a mechanism to weed out bias and error will backfire. Replication studies will never become a popular avenue for research if they are framed as tests of whether an effect is ‘true’ or not, or, worse, if non-replications are taken to mean that "something is wrong" (Bartlett, 2013). We agree with Koole and Lakens (2012) that "[t]he negative perception of replications may carry over to researchers who are engaged in replication research, as they may be perceived as hostile toward the researchers who conducted the original research" (p. 610). Should we engage in replication research to track down other researchers' erroneous findings? Or are replication studies important in their own right, simply because, whatever the result, they teach us something about the conditions under which an effect is likely to occur?

Moreover, exact replications simply do not work as ‘effect size police’. Kepes and McDaniel adopt Karl Popper's view that observations are only scientific if they can be reproduced according to rules, and that we may only trust results if anyone who follows the exact same procedures as described in the method section of an experimental report can reproduce the observation that was reported. Thus, exact replications are the gold standard of science, "essential for the ability of a scientific field to self-correct" (Kepes & McDaniel, pp. 18–19). Collins (1985) has pointed out a weakness in the Popperian argument: Unless there is consensus on what the truth of the matter is (i.e., what the result of the experiment should be), there is no way to objectively gauge whether the experiment was a competent and sufficiently exact replication. After all, two experiments are never exactly identical¹, and researchers can always point to actual or possible differences between original and replication to explain away the failure to reproduce their results. The recent controversy over Doyen et al.'s non-replication of Bargh, Chen, and Burrows's ‘elderly walking study’ (Bargh et al., 1996; Doyen et al., 2012) is a case in point: Bargh could easily point out differences (including Doyen et al.'s alleged ‘incompetence’) that might explain the non-replication. In the absence of an independent criterion to judge whether one experiment is a ‘competent’ replication of another, such controversies can drag on for years, each side believing the other ‘must be doing something wrong’ to get such anomalous results.
Thus, an exact replication cannot "determine whether an observed effect is ‘true’" (Kepes & McDaniel, p. 18)², precisely because its status as an exact replication is disputable as long as there is no consensus about the correct result. This will only be made worse if replications are assigned the role of error detection mechanisms. Thus, although error correction may of course be a fortunate consequence of some replication studies, a non-replication should, in our view, not be framed as a falsification of an earlier result, but as an invitation to further explore the differences between the original experiment and its replication. A classic model for such exploration is the joint design of crucial experiments by Latham et al. (1988) (see also the ‘adversarial collaboration’ of Mellers et al., 2001, and, for a recent endorsement of this approach, Koole & Lakens, 2012).

The real value of exact replications (non-replications in particular) lies in the fact that they draw attention to what Kepes and McDaniel call (after LeBel & Peters, 2011) "method-relevant beliefs", or, to be precise, beliefs that are at once theory-relevant and method-relevant, because they concern the manipulations that should theoretically produce certain effects and the way these effects are measured. Rather than consigning a failed exact replication to the file drawer, as usually happens, or hailing it as a falsification, as Kepes and McDaniel propose, it should be treated as an interesting opportunity for the further development of a theory, particularly the operationalization of its key variables. This strikes us as particularly important in I-O psychology, in view of its proximity to the applied context. To be relevant for organizations, I-O psychology needs theories that do more than enable post-hoc explanations: they must also furnish effective interventions and precise measurement instruments. Replication studies are essential to the practical value of our theories, because they force us to attend to the reliability and precision of our experimental manipulations and measurements.

In sum, we agree with Kepes and McDaniel that we need more confidence in our effects and effect sizes, but the best way to raise this confidence is not to create a culture of surveillance and error avoidance, but to cultivate a work climate in which error management happens through the open discussion of research practices. In such a culture, ‘failed’ replications do not mean that ‘something is wrong’, but rather that ‘something interesting is going on’.

Notes

¹ Which is why some people prefer the term ‘direct replication’ (Schmidt, 2009) or ‘close replication’ (LeBel & Peters, 2011). We follow Kepes and McDaniel's usage here.
² See also LeBel and Peters (2011, p. 376).

References

Bargh, J., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71, 230–244.

Bartlett, T. (2013). Power of suggestion. The Chronicle of Higher Education. Retrieved March 4, 2013, from http://chronicle.com/article/Power-of-Suggestion/136907

Collins, H. M. (1985). Changing order: Replication and induction in scientific practice. London: SAGE Publications.

Doyen, S., Klein, O., Pichon, C.-L., & Cleeremans, A. (2012). Behavioral priming: It's all in the mind, but whose mind? PLoS ONE, 7(1), e29081. doi: 10.1371/journal.pone.0029081
Fanelli, D. (2013). Redefine misconduct as distorted reporting. Nature, 494(7436), 149. doi: 10.1038/494149a

Francis, G. (2012). The psychology of replication and replication in psychology. Perspectives on Psychological Science, 7, 585–594. doi: 10.1177/1745691612459520

Frese, M., Brodbeck, F. C., Heinbokel, T., Mooser, C., Schleiffenbaum, E., & Thiemann, P. (1991). Errors in training computer skills: On the positive function of errors. Human–Computer Interaction, 6, 77–93.

Galak, J., & Meyvis, T. (2012). You could have just asked: Reply to Francis (2012). Perspectives on Psychological Science, 7, 595–596. doi: 10.1177/1745691612463079

Giner-Sorolla, R. (2012). Science or art? How aesthetic standards grease the way through the publication bottleneck but undermine science. Perspectives on Psychological Science, 7, 562–571. doi: 10.1177/1745691612457576

Keith, N., & Frese, M. (2008). Effectiveness of error management training: A meta-analysis. Journal of Applied Psychology, 93, 59–69. doi: 10.1037/0021-9010.93.1.59

Kepes, S., & McDaniel, M. A. (2012). How trustworthy is the scientific literature in I-O psychology? Industrial and Organizational Psychology: Perspectives on Science and Practice.

Koole, S. L., & Lakens, D. (2012). Rewarding replications: A sure and simple way to improve psychological science. Perspectives on Psychological Science, 7, 608–614. doi: 10.1177/1745691612462586

Latham, G. P., Erez, M., & Locke, E. A. (1988). Resolving scientific disputes by the joint design of crucial experiments by the antagonists: Application to the Erez–Latham dispute regarding participation in goal setting. Journal of Applied Psychology, 73, 753–772. doi: 10.1037/0021-9010.73.4.753

LeBel, E. P., & Peters, K. R. (2011). Fearing the future of empirical psychology: Bem's (2011) evidence of psi as a case study of deficiencies in modal research practice. Review of General Psychology, 15, 371–379. doi: 10.1037/a0025172

Mellers, B., Hertwig, R., & Kahneman, D. (2001). Do frequency representations eliminate conjunction effects? An exercise in adversarial collaboration. Psychological Science, 12, 269–275. doi: 10.1111/1467-9280.00350

Neuroskeptic (2012). The nine circles of scientific hell. Perspectives on Psychological Science, 7, 643–644. doi: 10.1177/1745691612459519

Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7, 615–631. doi: 10.1177/1745691612459058

Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13, 90–100. doi: 10.1037/a0015108

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2012). A 21 word solution. SSRN eLibrary. Retrieved from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2160588

Simonsohn, U. (2012). It does not follow: Evaluating the one-off publication bias critiques by Francis (2012a, 2012b, 2012c, 2012d, 2012e, in press). Perspectives on Psychological Science, 7, 597–599. doi: 10.1177/1745691612463399

Van Dyck, C., Frese, M., Baer, M., & Sonnentag, S. (2005). Organizational error management culture and its impact on performance: A two-study replication. Journal of Applied Psychology, 90, 1228–1240.

Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish your data too? Intelligence, 40, 73–76.