Surveillance is not the answer, and replication is not a test: Comment on Kepes and McDaniel, 'How trustworthy is the scientific literature in I-O psychology?'

Maarten Derksen
Eric F. Rietzschel
University of Groningen, Department of Psychology
Although we share Kepes and McDaniel's concern about the state of affairs in I-O
psychology, we think their emphasis on control and correction will, in the end, be
counterproductive. Specifically, we argue that questionable research practices can best be
remedied by encouraging an open academic culture, characterized by error management,
rather than a culture of distrust, aimed at error prevention. One example is the call for more replication studies, which should be framed not as 'effect size police' but as normal research.
The mechanisms that Kepes and McDaniel recommend largely come down to mutual
surveillance within the research community, enforced and facilitated by journals. While a
lack of functional control mechanisms probably contributes to the occurrence of fraud, the
spectrum of questionable research practices is very broad (e.g., Neuroskeptic, 2012),
encompassing not only fraud, but a whole array of flawed practices that differ in
perniciousness. Although there are, obviously, things a researcher should never do, many other practices are questionable not because of what the researcher does (e.g., removing outliers, for which good reasons may exist), but because this information is not properly disclosed. Similarly, discussions about publication bias and the lack of available
replication work revolve around a lack of disclosure (mostly of nonsignificant findings) (e.g.,
Francis, 2012; Galak & Meyvis, 2012).
Error Management
As others have noted (e.g., Fanelli, 2013; Nosek, Spies, & Motyl, 2012; Simmons, Nelson, & Simonsohn, 2012; Wicherts & Bakker, 2012), what is required is a more open
research culture in which researchers are willing to share data and to fully disclose their
analytical strategies. Simply instating more control mechanisms may not be the best way to
achieve this, because the sole focus then is on the prevention of fraud and errors, rather than
on dealing with them in a constructive manner when they do occur. Of course we agree with
Kepes and McDaniel that data sharing and replication research are essential for a reliable and
practical science of psychology. What we question is the increasing tendency to see these
practices as means of control and methodological detective work.
We need a research culture where researchers actually want to engage in discussion
about their methods and about the robustness of their results, and where researchers actually
want to share their data, with all of its quirks and shortcomings. This requires an approach
that is akin to error management (e.g., Frese et al., 1991). The essence of error management is the acceptance that errors are, to a certain extent, inevitable and can serve an important learning function. Research has shown that error management training contributes to effective
learning (Keith & Frese, 2008); more importantly, Van Dyck, Frese, Baer, and Sonnentag
(2005) found that an error management culture actually contributed to organizational
performance. One reason for this, they argue, is that open communication about errors
increases the probability of their being detected at a relatively early stage, when their
potential negative consequences are relatively limited. We would hypothesize that the same
holds for organizational psychology departments.
Replication Research
Recommendations for improvements in psychological research practices should, at
least in part, be judged by their contribution to, or compatibility with, a work climate where
researchers feel safe to discuss all of their ‘sloppy’ methods and their ‘messy’ data, rather
than feeling compelled to produce unrealistically neat and seemingly flawless stories (Giner-Sorolla, 2012). We agree with Kepes and McDaniel that one such improvement should be to
give exact replications a prominent role in psychological research. However, promoting exact
replications as a mechanism to weed out bias and error will backfire. Replication studies will
never become a popular avenue for research if they are framed as tests of whether an effect is
‘true’ or not—or, worse, if non-replications are taken to mean that “something is wrong”
(Bartlett, 2013). We agree with Koole and Lakens (2012) that “[t]he negative perception of
replications may carry over to researchers who are engaged in replication research, as they
may be perceived as hostile toward the researchers who conducted the original research” (p.
610). Should we engage in replication research to track down other researchers’ erroneous
findings? Or are replication studies important in their own right, simply because—whatever the result—they teach us something about the conditions under which an effect is likely to occur?
Moreover, exact replications simply do not work as 'effect size police'. Kepes and
McDaniel adopt Karl Popper's view that observations are only scientific if they can be
reproduced according to rules, and that we may only trust results if anyone who follows the
exact same procedures as described in the method section of an experimental report can
reproduce the observation that was reported. On this view, exact replications are the gold standard of science, “essential for the ability of a scientific field to self-correct” (Kepes & McDaniel, pp. 18–19). Collins (1985) has pointed out a weakness in the Popperian argument: Unless
there is consensus on what the truth of the matter is (i.e., what the result of the experiment
should be), there is no way to objectively gauge whether the experiment was a competent and
sufficiently exact replication. After all, two experiments are never exactly identical,¹ and researchers can always point to actual or possible differences between the original and the replication
to explain away the failure to reproduce their results. The recent controversy over Doyen et
al.'s non-replication of Bargh, Chen, and Burrows's 'elderly walking study' (Bargh et al., 1996;
Doyen et al., 2012) is a case in point: Bargh could easily point out differences (including
Doyen et al.'s alleged 'incompetence') that might explain the non-replication. In the absence
of an independent criterion to judge whether one experiment is a ‘competent’ replication of
another, such controversies can drag on for years, each side believing the other 'must be
doing something wrong' to get such anomalous results. Thus, an exact replication cannot
“determine whether an observed effect is ‘true’” (Kepes & McDaniel, p. 18),² precisely
because its status of exact replication is disputable as long as there is no consensus about the
correct result. This will only be made worse if replications are assigned the role of error
detection mechanisms.
Thus, although error correction may of course be a fortunate consequence of some
replication studies, a non-replication should, in our view, not be framed as a falsification of
an earlier result, but as an invitation to further explore the differences between the original
experiment and its replication. A classic model for such exploration is the joint design of
crucial experiments by Latham et al. (1988) (also see the ‘adversarial collaboration’ of
Mellers et al., 2001, and Koole & Lakens, 2012, for a recent endorsement of this approach).
The real value of exact replications (non-replications in particular) lies in the fact that they
draw attention to what Kepes and McDaniel call (after LeBel & Peters, 2011) “method-relevant beliefs”, or to be precise: beliefs that are at once theory-relevant and method-relevant,
because they concern the manipulations that should theoretically produce certain effects and
the way these effects are measured. Rather than consigning a failed exact replication to the
file-drawer, as usually happens, or hailing it as a falsification, as Kepes and McDaniel
propose, it should be treated as an interesting opportunity for the further development of a
theory, particularly the operationalization of its key variables. This strikes us as particularly
important in I-O psychology, in view of its proximity to the applied context. To be relevant
for organizations, I-O psychology needs theories that do more than enable post-hoc explanations: theories that furnish effective interventions and precise measurement instruments.
Replication studies are essential to the practical value of our theories, because they force us to
attend to the reliability and precision of our experimental manipulations and measurements.
In sum, we agree with Kepes and McDaniel that we need more confidence in our
effects and effect sizes, but the best way to raise this confidence is not to create a culture of
surveillance and error avoidance, but to cultivate a work climate in which error management
happens through the open discussion of research practices. In such a culture, 'failed'
replications do not mean that ‘something is wrong,’ but rather that ‘something interesting is
going on.’
Notes

¹ Which is why some people prefer the term 'direct replication' (Schmidt, 2009) or 'close replication' (LeBel & Peters, 2011). We follow Kepes & McDaniel's usage here.

² See also LeBel & Peters (2011, p. 376).
References
Bargh, J., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: Direct effects of
trait construct and stereotype activation on action. Journal of Personality and Social
Psychology, 71, 230–244.
Bartlett, T. (2013). Power of suggestion. The Chronicle of Higher Education. Retrieved on
March 04, 2013, from http://chronicle.com/article/Power-of-Suggestion/136907
Collins, H. M. (1985). Changing order: replication and induction in scientific practice.
London etc.: SAGE Publications.
Doyen, S., Klein, O., Pichon, C.-L., & Cleeremans, A. (2012). Behavioral priming: It’s all in
the mind, but whose mind? PLoS ONE, 7(1), e29081. doi:
10.1371/journal.pone.0029081
Fanelli, D. (2013). Redefine misconduct as distorted reporting. Nature, 494(7436), 149.
doi: 10.1038/494149a
Francis, G. (2012). The psychology of replication and replication in psychology. Perspectives
on Psychological Science, 7, 585–594. doi: 10.1177/1745691612459520
Frese, M., Brodbeck, F. C., Heinbokel, T., Mooser, C., Schleiffenbaum, E., & Thiemann, P.
(1991). Errors in training computer skills: On the positive function of errors. Human–
Computer Interaction, 6, 77–93.
Galak, J., & Meyvis, T. (2012). You could have just asked: Reply to Francis (2012).
Perspectives on Psychological Science, 7, 595–596. doi: 10.1177/1745691612463079
Giner-Sorolla, R. (2012). Science or art? How aesthetic standards grease the way through the
publication bottleneck but undermine science. Perspectives on Psychological Science,
7, 562–571. doi: 10.1177/1745691612457576
Keith, N., & Frese, M. (2008). Effectiveness of error management training: A meta-analysis.
Journal of Applied Psychology, 93, 59–69. doi: 10.1037/0021-9010.93.1.59
Kepes, S., & McDaniel, M. A. (2012). How trustworthy is the scientific literature in I-O
psychology? Industrial and Organizational Psychology: Perspectives on Science and
Practice.
Koole, S. L., & Lakens, D. (2012). Rewarding replications: A sure and simple way to
improve psychological science. Perspectives on Psychological Science, 7, 608–614.
doi: 10.1177/1745691612462586
Latham, G. P., Erez, M., & Locke, E. A. (1988). Resolving scientific disputes by the joint
design of crucial experiments by the antagonists: Application to the Erez–Latham
dispute regarding participation in goal setting. Journal of Applied Psychology, 73,
753–772. doi: 10.1037/0021-9010.73.4.753
LeBel, E. P., & Peters, K. R. (2011). Fearing the future of empirical psychology: Bem’s
(2011) evidence of psi as a case study of deficiencies in modal research practice.
Review of General Psychology, 15, 371–379. doi: 10.1037/a0025172
Mellers, B., Hertwig, R., & Kahneman, D. (2001). Do frequency representations eliminate
conjunction effects? An exercise in adversarial collaboration. Psychological Science,
12, 269–275. doi:10.1111/1467-9280.00350
Neuroskeptic (2012). The nine circles of scientific hell. Perspectives on Psychological
Science, 7, 643–644. doi: 10.1177/1745691612459519
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring incentives
and practices to promote truth over publishability. Perspectives on Psychological
Science, 7, 615–631. doi: 10.1177/1745691612459058
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is
neglected in the social sciences. Review of General Psychology, 13, 90–100. doi:
10.1037/a0015108
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2012). A 21 Word Solution. SSRN eLibrary.
Retrieved from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2160588
Simonsohn, U. (2012). It does not follow: Evaluating the one-off publication bias critiques by
Francis (2012a, 2012b, 2012c, 2012d, 2012e, in press). Perspectives on Psychological
Science, 7, 597–599. doi: 10.1177/1745691612463399
Van Dyck, C., Frese, M., Baer, M., & Sonnentag, S. (2005). Organizational error
management culture and its impact on performance: A two-study replication. Journal
of Applied Psychology, 90, 1228–1240.
Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not
publish your data too? Intelligence, 40, 73–76.