On the common origins of psychology & statistics

Part One: The struggle against subjectivity
Katja de Vries | Supervisor: Dr. Sacha Bem | Leiden University 2006
Illustrations on the front cover:
•
How the human soul sees what the hands feel.
Reproduced from: R. Descartes, L'Homme, Paris, 1664. Retrieved May 23,
2006, from http://gallica.bnf.fr/ark:/12148/bpt6k574850/f153.item
•
Miniature of blindfolded Lady Fortune.
Reproduced from: Augustine, La Cité de Dieu. Manuscript made around 1400-1410, which belongs to the collection of the Dutch Royal Library. Retrieved
May 23, 2006, from: http://www.kb.nl/kb/manuscripts/
Looking at my own long life, I find that the main allurements which led me on and on […]
were preferences. The solutions were accidents.
Karl Popper, A world of propensities, p. 26
However much we may have the impression of an almost uninterrupted movement of European ratio from the Renaissance to our own day, […] – all this quasi-continuity at the level of ideas and themes is doubtless only a surface effect.
Michel Foucault, Les mots et les choses, p. 13-14.
SUMMARY
(a) Structure of this master’s thesis
This thesis consists of two parts. Although these two parts are highly interrelated, they can be read separately. The first part is my master’s thesis in psychology, whereas the second part is my master’s thesis in philosophy.
The subject of the first part is the role of objective (in particular the frequentist variant)
and subjective (in particular the Bayesian variant) statistical inference in, respectively, the
statistical methodology applied in psychology and in psychological theories of cognition.
The subject of the second part is the historical and philosophical intertwinement of the
notions underlying statistical methodology and cognitive psychological theory, namely
probability and rationality. The way in which these notions emerged in the seventeenth
century is part of a conceptual change that entailed the textualization of the world: ‘Nature’
became the ‘Book of Nature’. Considered in the light of this seventeenth-century conceptual change, a new understanding of the ‘subjective’ and ‘objective’ interpretations of probability – as discussed in part one – is gained.
(b) Some thoughts which play a major role throughout the whole thesis.
An attempt is made to think about the relation between psychology and its statistical methodology. With regard to this relationship the question is raised why it is so difficult to think about it without getting trapped in a discourse of either idolatry of the statistical method or a romantic longing for a hermeneutical, non-statistical psychology. The aim of this thesis is to find out how one could speak about statistics and psychology without being pushed into a pro or contra position.
It is hypothesized that the relationship between psychology and statistics can be
understood from a relationship in which it is grounded philosophically and historically: the
relationship between rationality and probability.
The words ‘probability’ and ‘rationality’ seemingly cannot live without each other, nor with each other. Accordingly, the mutual relationship between these two words has, since their very emergence, been rather opaque and subject to shifts in meaning. In order to show the major conceptual changes within the relation between
rationality and probability, two historical periods are examined: (a) the second half of the
seventeenth century (the emergence of probability and the beginning of the period called
classical probability) and (b) the second half of the nineteenth century / the first decades of
the twentieth century (the beginning of modern probability):
(a) Classical probability: In the first half of the seventeenth century Descartes tried to
conquer the prevailing scepticism of his time and gain certain knowledge, whose seat he
placed in the rationally thinking ‘subject’ – however, one could contend that Descartes failed
to conquer sceptical uncertainty and that he even deepened the gap between human
knowledge and the objective world.
The emergence of probability in the second half of the seventeenth century has to be
understood as an answer to this failure of Cartesian rationalism to gain certain knowledge.
Probability – ‘uncertain rationality’ or ‘rationalisme manqué’ – became the ‘calculus of
reason’: it endeavoured to be rational and gain almost certain knowledge by the incorporation
of uncertainty. Probability was the rational guarantee that subjective human knowledge
corresponded to the objective world. This idea was supported by associationist psychology.
However, is ‘uncertain rationality’ rationality at all? After the French Revolution of 1789 the rational aspect of probability became more and more discredited.
(b) Modern probability: From the 1840s on there is a major change in the meaning of
rationality, probability and their relationship: probability is no longer the calculus of reason.
Moreover, probability becomes divided into subjective and objective probability. Classical probability appears in retrospect to be an ambiguous amalgam of subjective as well as objective probability. Objective probability (in particular its frequentist variant), which sees probability as the measure of the relative frequency of occurrence of events, becomes dominant, and it remains so in statistical methodology. In the second half of the nineteenth century subjectivity is
detached from rationality and probability: subjectivity will be above all things the realm of
romantic irrationality. However, subjective probability (in particular its Bayesian variant),
which sees probability as an attribute of our subjective beliefs, will make some quite
unexpected comebacks: the first comeback is between 1901 and 1910 in Cambridge, and the
second comeback begins in the 1960s. This outbreak of the subjective interpretation in the
1960s has had a profound influence on psychological theories of cognition. Bayesianism has
become in psychological theory an important model against which the rationality of human
cognition is measured. There is a historical congruence between eighteenth century
associationist psychology and twentieth century Bayesian cognitive models.
Philosophically, subjective statistical inference – which is used as a model of cognition in psychological theory – and the psychological methodology that consists mainly of objective statistical inference are not easily reconciled, because they entail two completely different
epistemologies. However, this philosophical incommensurability is hardly noticed, because
even objective statistical inferential methodology is blurred by a veil of subjective semantics.
This veil of subjective semantics obscures the relation between psychology and statistics.
CONTENTS
SUMMARY
CONTENTS
CHAPTER 1: WHAT IS THIS THESIS ALL ABOUT?
[1.A] Three questions concerning the relationship between psychology and statistics
[1.B] Method
[1.C] Hypothesis: both psychology and statistics bear the marks of an outdated subjectivism
[1.D] Recapitulation & outlook on the next chapter
PART ONE: STATISTICS AND PSYCHOLOGY. THE STRUGGLE AGAINST SUBJECTIVITY.
CHAPTER 2: FROM AN EAGLE’S POINT OF VIEW - A HISTORICAL & PHILOSOPHICAL OVERVIEW OF THE WORDS ‘PROBABILITY’ AND ‘STATISTICS’
[2.A] Misconceptions about statistics & the necessity of a terminological and historical overview
[2.B] The omnipresence of probability and statistics
[2.C] Making sense of the pile of words: probability, statistics and statistical inference
[2.D] The remarkable lack of probabilistic thinking until the second half of the seventeenth century
[2.E] From seventeenth century ‘probability’ to eighteenth century ‘Statistik’ and nineteenth and twentieth century ‘statistical inference’
[2.F] Recapitulation & outlook on the next chapter
CHAPTER 3: PSYCHOLOGY AND STATISTICS: WITH OR WITHOUT A KNOWING SUBJECT?
[3.A] Probability entangled in subjective semantics and the problem of induction
[3.B] Probability freed from subjective beliefs: rationality without a knowing subject
[3.C] Statistics à la Popper: the natural selection of a falsifying rule for statistical hypotheses
[3.D] Statistical methodology as a re-enactment of evolution: tamed variation and accelerated selection
[3.E] The lure of ‘subjective’ semantics in statistical inference
[3.F] The lure of ‘subjective’ semantics in cognitive psychology
[3.G] Recapitulation & outlook on Part Two
INTERMEZZO: CONCLUSIONS OF PART ONE
PART TWO: THE INTERTWINEMENT OF THE NOTIONS UNDERLYING STATISTICS AND PSYCHOLOGY: PROBABILITY AND RATIONALITY
CHAPTER 4: THE LATENT PERIOD
[4.A] It all begins with Descartes – the rational, representational consciousness
[4.B] The book of Nature, epistemological uncertainty and the equivocation of the Cartesian ‘subject’
[4.C] Associationist psychology or the unproblematic ambivalence in the concept of probability
[4.D] Rationality – between the Principle of Sufficient Reason (Leibniz) and the Principle of Non-sufficient Reason (J. Bernoulli)
CHAPTER 5: THE CLASH WITH ‘NATURE’: THE LOCUS OF RATIONALITY RECONSIDERED
[5.A] The transition from classical to modern probability – the same probability, but in a different way
[5.B] The nineteenth century confrontation with ‘nature’ – an objective interpretation of chance
[5.C] Aversion of statistics and love of absolute chance – Nietzsche’s absolute subjectivism
[5.D] Just a name? The realism of C.S. Peirce and the nominalism of Pearson
CHAPTER 6: WHERE DO I STAND? THE SIGNIFICANCE OF STATISTICS
[6.A] Fechner and Peirce: the Kollektiv as an end in itself
[6.B] The semantics of statistics
[6.C] From “metaphysics” to “prophysics”
REFERENCES
CHAPTER 1: WHAT IS THIS THESIS ALL ABOUT?
What is this thesis all about? In this chapter (A) some rather remarkable features of the
relationship between psychology and statistics are introduced, (B) the method that is used in
this thesis is elucidated, and (C) a hypothesis about the relation between statistics and
psychology is formulated.
[1.A] Three questions concerning the relationship between psychology and
statistics
The relationship between psychology and its statistical methodology is a remarkable one. That may not seem immediately obvious – statistical methodology is the scientifically accepted method in almost every domain, and the psychological research field is no exception: so what’s the problem?
Question I: Is statistics a methodology that has been adopted from elsewhere or is it
intrinsically linked with psychology?
Psychology students usually see statistics as a hardship that must be endured in order to reach the alluring Cockaigne of becoming a scientific psychologist: “For seven years, you know well, he must wade in pig's dung all the way up to the chin, in order that he shall attain the land” (Lucas, 1995). Of course: nobody will deny that statistical methodology is a necessary tool in psychological research. Convincing statistical evidence is not only a sure way to win an academic dispute, but also the fuel that has made psychology produce useful knowledge. However, although a rational scientist will prefer a thesis that is well founded by statistical arguments to a thesis that lacks a statistical foundation, there often remain some uncanny feelings about the eagerness of psychologists to use statistics. At one time or another every psychometric researcher will probably have the gut feeling that his statistical quantification forces the psychological research object – that ‘wonderful complex mind’ – onto a Procrustean bed (Michell, 1999; Thompson, 2001).
Yet statistical methods and psychology seem to have belonged inextricably together from their very beginnings: statistical methods entered psychology in 1860 through the psychophysical research of Gustav Fechner, and by the 1880s the success and acceptance of this statistical approach were settled – while in other social sciences, like economics and sociology, statistical methods entered only thirty or forty years later (Stigler, 1999, p. 189). When asked why psychology has a relatively great amount of statistics in its curriculum in comparison with other social sciences, two answers are frequently heard.
The first answer is that the function of statistical methodology – i.e., in particular the
much used significance testing – in psychology is to ‘cover up’ the lack of theoretical
understanding:
Significance tests are for situations where we do not understand, in any theoretical sense, what is
happening. […] Experimental psychology has become the heartland of significance testing. This fits our
paradigm. We understand, in a deep theoretical way, almost nothing about human psychology. So we do
lots and lots of purely empirical experiments. We design experiments, obtain results, and quote
significance levels. (Hacking, 2001, p. 216)
The second frequently heard answer is that psychology is a relatively young science that wants to prove itself in the scientific arena (e.g. Bem, 2005). In this way the important role of statistics in psychology is explained in an almost Freudian manner as ‘physics envy’ (Rogers, 2002). Viewed in this way, psychology appears to be in fact even more Catholic than the pope: the null hypothesis significance testing that proliferates in psychological research is hardly ever used in physics (Gigerenzer, 1987a, p. 25 and p. 29; Gigerenzer & Murray, 1987, p. 179-180; Gigerenzer et al., 1990, p. 211). Or to put it differently: one can to a certain degree imagine physics without statistics. But can one imagine a psychology ‘uncontaminated’ by statistics? One wavers between the thought of a psychology that is contaminated by a statistical methodology that is originally foreign to it and the thought of a psychology that is intrinsically intertwined with its statistical methods (Rogers, 2002; Thompson, 2001). Well, psychology did not adopt statistical methodology from the exact sciences, but it did adopt something else: whereas the probabilistic revolution in physics led to the rejection of classical physics, the introduction of statistical methods in psychology led, on the contrary, to the adoption of the ideals of classical physics, viz., determinism and objectivity (Gigerenzer, 1987a, p. 22, p. 25 and p. 29; cf. Porter, 2003a; Porter, 2003b).
That probability seemed to imply uncertainty clearly discouraged its use by physicists. From the
standpoint of social science, on the other hand, statistical method was synonymous with quantification,
and while some were skeptical of the appropriateness of mathematics as a tool of sociology, many more
viewed it as the key to exactitude and scientific certainty. Most statistical enthusiasts simply ignored the
dependence of statistical reasoning on probability, and those who acknowledged it generally stressed
the ties between probability and that most ancient and dignified among the exact sciences, astronomy.
(Porter, 1986, p. 10)
So psychology and physics differ not only in their theoretical worldview in relation to probability, but also in their appraisal of statistical inference. The statistical inferences made on the basis of probabilistic axioms by psychologists – as “an indication of the precision of their results” (Heiser, 2003, p. 268) – are eyed warily in the exact sciences and are mostly thought of as a watered-down version of the calculus of probabilities.
Question II: Why is statistical methodology in psychology presented as a monolithic, timeless, unquestionable truth?
The way statistics is presented in psychology (and in other social sciences) raises one’s eyebrows: as an “abstract truth, the monolithic logic of inductive inference” (Gigerenzer et al., 1990, p. 106). I think that the methodological variability in twentieth-century psychology, which is emphasized by Dehue (1990), pales into insignificance in comparison with the overwhelming tendency towards a uniformized methodology:
By 1955, more than 80% of the articles in four leading journals from four different areas of psychology
used significance tests to justify conclusions from the data. […] Today, the figure is between 90 and
100%. (Gigerenzer et al., 1990, p. 206)
Of the continuing debates among statisticians barely any trace can be found in the textbooks
that teach statistics to the students in the social sciences. In their courses on statistics,
psychology students are often taught a seemingly unified theory which actually is a hybrid
variant of several theories, whose concepts are irreconcilable or at least do not blend very well
(Gigerenzer et al., 1990, p. 106).
Therefore it is not very surprising that the conceptual understanding of commonly used statistical procedures is quite weak, as is shown by the answers to a questionnaire that Gigerenzer, Krauss & Vitouch (2004) recently presented to students and teachers in psychology about the meaning of a significant result. Moreover, I do not think that a psychological researcher using SPSS or some other statistical package is very concerned with theoretical issues. Although most psychologists will have a certain idea of what e.g. Popper’s falsificationism (Bem & Jong, 1998) means, they would probably find it rather difficult to say what Popper’s position was in the controversies on statistics and probability, and to appraise the relevance of his propensity theory (Gillies, 2003) for the statistical analysis they are running.
Yet exciting things could be said, for instance, about the relation between Fisher’s Design of Experiments (Fisher, 1951) and Popper’s Logic of Scientific Discovery (Popper, 1972): respectively the practical and the theoretical book that were both published in 1935 and furnished the methodological ground still in use in psychology – both insisted “that a null hypothesis can only be shown implausible, and never be shown plausible” (Gigerenzer et al., 1990, p. 96). However, if psychologists attend any courses in the history and philosophy of
psychology at all, these courses tend to focus on the historical succession of theories of the
mind and practically never draw attention to theoretical and historical questions concerning the statistical method, such as: “What does the ‘Normality’ assumed in so many statistical procedures actually mean, am I a victim of the ‘Myth of Normality’, and what has Fisher to do with it?” (Gigerenzer et al., 1990, p. 114) or “Why does one of the main mechanisms in statistics – regression – have such a gloomy, Darwinistic name?” (Heiser, 1990). In this thesis I will answer these questions and show why it is important to show future psychologists that statistics is a historical phenomenon, laden with theoretical implications.
In the seventies and eighties of the twentieth century there was a real explosion of popular scientific literature (Gribbin, 1985; Monod, 1970; Prigogine & Stengers, 1985) linked with the indeterminism implied in some statistical interpretations in physics and biology from the end of the nineteenth and the beginning of the twentieth century, interpretations which today have become quite mainstream. The question of what probability is or how it has to be interpreted is a deeply philosophical question, related to the question ‘Do we live in an indeterministic world or not?’. However, the probabilistic axioms of the statistics in the methodological textbooks are presented simply as given facts, without much theoretical concern.
Question III: Who is in control: psychological theory or statistical method?
It is not customary in psychology to neglect controversial issues. When somebody asks me whether I can name a leading thinker or theory in some domain of psychology, I almost always have to come up with a nuanced answer like: “Well, X is a proponent of theory A and he is supported by studies K, L and M, but there are also several studies that say the opposite, and a meta-analysis of the main studies in the field has shown the theory of X to be more probable”. So you could say that psychology is a method-driven science: not because there is a lack of psychological theories – on the contrary, there is an abundance of them! – but because the arbiter that judges the value of each psychological theory is in the end the statistical method, which is itself presented as a unity without inner controversies or competing alternative theories. It was the adoption of the methodology of statistical inference that marked the emergence of a unified research practice and paradigm in psychology (Danziger, 1985; 1987, p. 46). One would expect a very deep level of understanding of methodology in a science wherein method takes such a central position. However, statistical methodology is, on a theoretical level, notoriously poorly understood by those who apply it – although the statistical algorithm is correctly applied, an often-heard complaint is that the user of statistics does not exactly grasp what he is doing.
[1.B] Method
Every scientific psychological publication has a methodological section. Because this is a thesis in the philosophy and theory of psychology that deals with the relationship between psychology and its methods, it would be circular reasoning to use the methods commonly used by scientific psychologists. It is therefore obvious that the approach in this thesis will not be statistical.
This methodological section is divided into the following subsections:
I. A philosophical attempt to think about statistical thinking.
II. Statistical discernment, psychological discernment & philosophical discernment.
III. Implicit assumptions and the historical-etymological approach
IV. A final demarcation: a philosophical exploration of the (shared) assumptions on
rationality in cognitive psychology and statistics – no more, no less.
V. Philosophical position
I. A philosophical attempt to think about statistical thinking.
After reading the previous section the reader might have the misapprehension that he will be reading a philosophical critique of psychological research or of philosophy of science, i.e. a critical opinion on what the philosophical understanding of the method of psychology should be. Let me be clear on this: this thesis is not an attempt to point out a theoretical shortcoming in psychological research. After all, from a pragmatic point of view one could say: “So psychologists use a methodology with hardly any historical or theoretical concern about it, and this methodology is artificially presented as a unity – but so what? Psychology students are trained to be psychological researchers, not historians or philosophers. As long as they use their methodology well – why bother?”. And I would agree with this practical point of view: the psychological scientific enterprise can run perfectly on its statistical fuel without any theoretical or historical thought. Of course, one could use historical or theoretical remarks to substantiate a proposal for a technical improvement of a certain statistical method, e.g. to urge that not only the significant results of a significance test be reported, but also its power (Gigerenzer et al., 2004).
However, this thesis is not a methodological critique either, and no ‘alternatives’ to the statistical methodology in use will be presented. In fact, I have tried to avoid technical issues as much as possible, and statistical formulas will hardly be found in this thesis.
Well, is it perhaps a historical or sociological work? No – although I probably feel more affiliated with those historians and sociologists of statistics who approach its history externally, with an emphasis on statistics as a social process (e.g., Theodore Porter), than with those who approach it internally, with an emphasis on an autonomous history of scientific ideas (e.g., Stephen Stigler), the aim of this thesis is not to give a complete historical or sociological account of the position of statistics in psychology (on the distinction between different historical approaches, see: Desrosières, 1993; Rosser Matthews, 2000; Swijtink, 2000).
The long historical detours serve only as a way to create space for a philosophical
attempt to think where no thought seems necessary: it is exactly the almost unquestionable
and unreflected superiority of the statistical methodology in psychology that made me wonder
and moved me to write this thesis. Why is the superiority of statistical methodology so
unquestionable and unreflected? When one tries to say something about statistical
methodology it is easy to get trapped in a discourse of either positivistic idolatry of statistics
or a romantic longing for a hermeneutical, non-statistical psychology. Therefore we have to ask ourselves what makes it so hard to think about statistics without being pushed into a pro or contra position.
I think the reason that makes it so hard to think in a sober, empirical way about
statistics is the fact that statistical methodology is a way of thinking itself: when one tries to
think about statistics there always is a certain ‘duplicity’, because one tries to think about
thinking. And what is more: one tries to think about a way of thought that is so powerful, that
it is hardly possible to withdraw one’s own (philosophical) way of thinking from it. A
philosopher who endeavours to think about statistical thought wavers between Scylla and
Charybdis. If a philosopher says: “Well, I personally find this statistical way of thinking a
rather limited way of thinking, overlooking a lot of qualitative aspects of life”, he himself
overlooks the enormous power of statistical thought that has become omnipresent in our
modern lives. The thought of the philosopher that he can simply ‘reject’ or ‘criticize’
statistical thought will be ‘harmless’ and ‘marginal’, because it did not grasp the impact and
amplitude of this way of thinking. However, if the philosopher does acknowledge the power
and superiority of the statistical way of thinking, he will not be able to resist to the idea that
the best way to think about statistics is therefore through statistical thinking: and he will feel
himself consequently obliged to do statistical research about the statistical way of thought. So
how should one think philosophically about statistical thought?
II. Statistical discernment, psychological discernment & philosophical discernment.
Notwithstanding that one can quite often make some comment on technical flaws in the statistical techniques used, it is anachronistic to deny the overwhelming power of discernment that statistical inferential techniques add to the naked eye. It is easy to forget how, e.g., the analysis of variance has made it quite easy to say whether or not it is highly improbable that a certain treatment has an effect. When the founding father of the analysis of variance, Ronald Aylmer Fisher, arrived in 1919 at the Rothamsted Agricultural Experimental Station, it seemed impossible to infer from the data gathered during 90 years of experiments whether the fertilizer used had some effect on the crop yields or not. The variation measured seemed to be inconsistent and equivocal, and it was unclear which effects had to be ascribed to the fertilizer and which to other factors such as rainfall, the presence of weeds and soil type (Fisher Box, 1978; MacKenzie, 1981; Salsburg, 2001). In the decade that followed Fisher’s arrival at Rothamsted he solved this problem, and the analysis of variance was born (Fisher, 1921, 1924). Thanks to the analysis of variance it is today quite easy to discern – even when to the naked eye the data seem an inextricable tangle of variation – whether or not it is improbable that the variation in a given sample is due to chance; in this way it is possible to partition the variation among various factors and to infer whether a certain treatment produces an effect. As such, modern statistical methodology is a new way to discern, reason and know.
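To make this concrete, here is a minimal sketch of a one-way analysis of variance in Python (using the scipy library; the crop-yield numbers below are invented purely for illustration and are not Rothamsted data):

from scipy.stats import f_oneway

# Invented crop yields for plots with and without fertilizer.
fertilized   = [29.1, 31.4, 30.2, 32.0, 28.8]
unfertilized = [26.5, 27.9, 25.8, 28.3, 27.1]

# f_oneway partitions the total variation into between-group and
# within-group components and returns the F statistic and the p-value.
f_stat, p_value = f_oneway(fertilized, unfertilized)

# A small p-value means that it is improbable that the variation
# between the two groups is due to chance alone.
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")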
Yet it stands in a long tradition of Western thought concerning the question of how to discern order amidst chaotic variation. Plato stands at the beginning of this long metaphysical tradition that distinguishes the visible, changeable world from the eternal intelligible world of ideas. What is it that makes us recognize a horse as a horse – despite the differences that exist among particular horses? “It is the idea of a horse, the ‘horseness’, that enables us to recognize a horse as a horse”, Plato answers.
But the ability of a modern scientist to discern horses goes a lot further than that of Plato: e.g. the biostatistician is perfectly able to perform an analysis of variance on the expression level of a certain gene (indicated by the intensity of the fluorescence signals on a DNA microarray) in Equus caballus (the domesticated horse) and Equus ferus przewalskii (the wild horse) and conclude that there is a significant difference between them; and a psychologist can perform an analysis of variance to discern whether laypeople are able to discern between an ordinary Equus caballus and an Equus ferus przewalskii, or whether their ability to discern these horses is affected by alcohol consumption.
Both the biostatistician and the psychologist use the same statistical methodology to discern
between structural and accidental variation; yet in psychological research the word ‘discernment’ relates not only to the research method, but also to the research object.
Moreover, it is evident that the statistical methodology in psychology has had repercussions
on the way the psychological research object – human, non-scientific cognition – is
understood: with the help of scientific statistical methods the psychologist may try to gain insight into the workings of the ‘intuitive statistician’ (Gigerenzer & Murray, 1987). Philosophically, the relation between ‘scientific’ and ‘intuitive’ statistical discernment in psychology raises the question of how these two relate, what criterion demarcates them, and how one should approach the circularity entailed by ‘discerning about discernment’ or ‘thinking about thinking’.
These philosophical questions will of course be of no great concern to the daily practice of a scientific cognitive psychologist. The only thing that matters to an experimental psychologist, as to every other scientist applying statistical methods, is the fact that their experimental data (i.e. on genetic differences, on crop yields, or on the ability to discern horses under different circumstances) are a lot more meaningful when subjected to statistical analysis.
However, from a philosophical standpoint it is of great interest that cognitive psychology as well as statistical methodology have implicit assumptions about ‘discerning’, ‘knowing’, ‘thinking’ or ‘cognition’: assumptions that are implied in our understanding of the words ‘rationality’ and ‘probability’. Since both statistical thought and cognitive psychology rely on certain assumptions concerning ‘probability’ and ‘rationality’, the comparison of these shared assumptions may provide a way to think about thinking. Whereas it is hard to find a starting point for thinking philosophically about statistical thinking, it might be possible to think (philosophically) about how the underlying assumptions of the scientifically accepted way of thinking (statistical thought) relate to the underlying assumptions of how psychology thinks that we think (the psychological view of human cognition): i.e., to ‘compare’ the role of the notions ‘probability’ and ‘rationality’ in statistical thought and psychological thought. Are these assumptions the same for cognitive psychology and for statistical methodology? Or do they differ?
III. Implicit assumptions and the historical-etymological approach
In every course on statistics students will be told that they may use certain statistical methods
only when certain assumptions – such as e.g. that the data were drawn from ‘a normally
distributed population’ or from two populations with the same spread of scores – are met.
These assumptions do not seem to concern the question of what the statistical method assumes about how human thought has access to, and knowledge of, its surrounding world – or do they?
Implicit assumptions are assumptions that seem so self-evident that I think the only way to bring them to light is by a historical or semantic-etymological approach: after all, when it can be shown that certain assumptions have a historical origin, this may shed a different light on their presumed self-evident nature and it may become possible to see their philosophical meaning.
I therefore endorse the philosophical approach of Hacking, when he says:
There is an anti-positivist model which, for all its obscurity, may […] have its appeal. We should
perhaps imagine that concepts are less subject to our decisions than a positivist would think, and that
they play out their lives in, as it were, a space of their own. […] In the past 300 years there have been
plenty of theories about probability, but anyone who stands back from the history sees the same cycle of
theories reasserting itself again and again. (Hacking, 1975, p. 15)
IV. A final demarcation: a philosophical exploration of the (shared) assumptions on
rationality in cognitive psychology and statistics – no more, no less.
Of course, one cannot expect to find in this thesis a complete account of the history of statistics, nor an exhaustive treatment of all philosophical aspects, nor an extensive linguistic study of the etymological development of words related to probability and statistical methodology.
Moreover, I will restrict myself to ‘orthodox’ cognitive psychology for the sake of clarity. I will not make any digressions into areas such as social psychology or clinical psychology.
I will also refrain from digressions on anti-statistical, hermeneutical movements in psychology, because, except for the Skinnerian movement, they have had scarcely any effect on scientific practice. Nor will I go into measurement-theoretical issues that question the possibility of measuring psychological concepts (see e.g. Michell, 1999): I think the question whether abstract concepts (such as “attention” or “commitment”) can be expressed in numbers is a question that every scientist always has to ask himself – but in the end this is a practical question that can be restated as: “Is my measurement level yielding useful and sound results?”
All the historical, etymological and philosophical deliberations to be found in this thesis have only one aim: to awaken us from our thoughtlessness about the seemingly self-evident assumptions that may be hidden in statistical methodology and cognitive psychological science concerning the relation of human thought to ‘the-world-as-it-is’, and to provoke a philosophical wondering about them instead.
The focus of attention will constantly shift between cognitive psychology and statistics. Each time the leading question will be on which points the assumptions on rationality and probability made in them are the same and on which points they diverge.
V. Philosophical position
Thoughts do not appear out of the blue. Nor do the thoughts in this thesis. There are several thinkers whom I would like to name in particular, because it will give the reader of this thesis an idea of what he might expect. I owe my vision of falsification, the impossibility of induction and the objectivity of probability to Karl Popper; from Ian Hacking I came to know the Janus-faced character of probability; I am indebted to Gerd Gigerenzer for the idea of how the statistical method has developed into a metaphor in psychological theories of the mind; Lorraine Daston made me see the link between classical probability and associationism; Martin Heidegger has been decisive in shaping my thoughts on the role of the ‘subject’ in Western thought.
[1.C] Hypothesis: both psychology and statistics bear the marks of an outdated
subjectivism
The statistical research practice in psychology is a success (cf. Cowles, 1989). If a scientist
wants to do psychological research, statistical methodology is practically a sine qua non. No
other methodology can compete with the success of statistics.
Yet, the level of conceptual understanding of statistics among psychologists seems to
be rather low. How can this be?
In part one of this thesis I will defend the hypothesis that the many conceptual misunderstandings about statistics are rooted in the tendency among psychologists to interpret probability (whose assumptions underlie the statistical inferential methodology) subjectively. In order to substantiate this hypothesis, the role of subjective (in particular Bayesian) and objective (in particular frequentist) statistical inference in statistical methodology and psychological cognitive theories will be studied in detail. Moreover, it will be shown how many contemporary cognitive theories are built upon the idea that subjective probability can be seen as a standard against which one can measure human rationality. I will argue that from a philosophical point of view the subjective and objective interpretations cannot be reconciled, because they endorse two different epistemological positions: respectively, classical epistemology and evolutionary epistemology.
Yet – although the frequentist statistical methodology and its underlying evolutionary epistemology show that rationality and probability have nothing to do with my personal, Cartesian subjective beliefs – it is my thought and my words that are affected by them. Statistics and probability affect how I discern and how I know. How can I say how probability and statistics concern me – without relapsing into Cartesian, subjective talk?
This will be the central question of part two. The hypothesis that will be proposed in order to answer this question is that there was a seventeenth-century conceptual turn which entailed the textualization of the world (cf. Hacking, 1975): ‘nature’ became the ‘book of nature’. As Galileo famously wrote in The Assayer in 1623:
Philosophy is written in this grand book - the universe - which stands continuously open to our gaze.
But the book cannot be understood unless one first learns to comprehend the language and interpret the
characters in which it is written. It is written in the language of mathematics, […]. (Galileo, 1957, p.
237)
Subsequently I will argue that within this seventeenth-century textualization of the world the notions of rationality and probability emerged in such a way that they entailed the latent beginnings of both psychology and statistics. This latent period will last until the nineteenth century, in which the notions of rationality and probability will change in a radical way. This change will lead to the nineteenth-century emergence of psychology and its statistical inferential methodology as we know them now.
To substantiate this hypothesis I will focus on:
(1) how the predecessors of psychology (viz., the seventeenth century Cartesian
rational subject and eighteenth century associationist psychology) were entangled
with the predecessors of modern statistical inferential methodology (viz., seventeenth
and eighteenth century probability theory).
(2) how the nineteenth-century change of the notions ‘probability’ and ‘rationality’ on the one hand made it possible for psychology and frequentist statistical methodology – both useful and fruitful disciplines – to emerge, but on the other hand entailed rather unpleasant epistemological consequences, which led to hermeneutical, i.e. anti-statistical and anti-psychological, reactions in nineteenth-century thought.
I will try to interpret these two periods, i.e. the classical probability of the seventeenth and eighteenth centuries and the modern probability of the nineteenth century up to our present day, in the light of their origin: the textualization of the world. This may put the question of how probability and statistics concern me in a completely different light.
[1.D] Recapitulation & outlook on the next chapter
Recapitulation of chapter 1:
The relation between psychology and its statistical methodology, i.e., statistical inference, has some remarkable features: (a) statistical methodology is intrinsically linked to psychology but at the same time seems to be a ‘contamination’ from the exact sciences; (b) statistical methodology is presented as a timeless, monolithic truth – which it is not! – and as such it unifies psychological research that itself lacks a coordinating global theory; (c) one would expect a very deep level of understanding of methodology in a science wherein method takes such a central position: however, statistical methodology is on a theoretical level notoriously poorly understood by those who apply it. It is assumed that these aspects characterizing the relation between statistics and psychology can be thought through on a philosophical level if their relation to the underlying notions ‘rationality’ and ‘probability’ can be clarified. The meaning of these notions and their relationship in two historical periods (seventeenth/eighteenth-century classical probability and nineteenth/twentieth-century modern probability) will be explored.
Outlook on chapter 2:
A historical and terminological overview of the words ‘probability’ and ‘statistics’ is presented. There is a remarkable lack of probabilistic thinking until the second half of the seventeenth century. It will take another two centuries before the probabilistic calculus leads to the development of inferential statistics: until the end of the nineteenth century practically no inferential statistics existed, only descriptive statistics. Although there is great consensus about the probabilistic calculus, on a theoretical level it is unclear whether probability has to be interpreted as a subjective or an objective phenomenon. This debate also had its repercussions on inferential statistical methodology, which is based on probabilistic assumptions.
PART ONE: Statistics and Psychology. The struggle against subjectivity.
CHAPTER 2: FROM AN EAGLE’S POINT OF VIEW - A HISTORICAL &
PHILOSOPHICAL OVERVIEW OF THE WORDS ‘PROBABILITY’ AND
‘STATISTICS’
In our exploration of the relation between psychology and statistics we will put ‘psychology’ aside for a moment (I will return to psychology in the next chapter, i.e., chapter 3) and concentrate first on the word ‘statistics’. Stop reading for a moment and ask yourself: “What is statistics?”.
To grasp the drift of the word ‘statistics’ is not an easy matter. In order to understand its philosophical impact it will be necessary to clarify its relation to the ‘underlying’ notions of ‘probability’ and ‘rationality’. In this chapter I will aim to show the relation between the word ‘statistics’ and the word ‘probability’.
The clarification of the relation of the words ‘statistics’ and ‘probability’ to the word ‘rationality’ will be largely postponed to chapters 4 and 5. I will only touch upon the relation between ‘probability’ and ‘rationality’ very superficially in sections (D) and (E) of this chapter.
The elucidation of the relation between the words ‘statistics’ and ‘probability’ will take some
long historical and etymological detours.
In the first two sections of this chapter I will explain why it is necessary to approach these words in such a roundabout way: in section (A) I will present some misconceptions about statistics and in section (B) I will show the omnipresence of probability and statistics. In section (C) a terminological overview of the words ‘probability’, ‘statistics’ and ‘statistical inference’ is presented.
The last two sections have a historical character. Section (D) goes back to the very beginnings
of probability and its emergence in the second half of the seventeenth century. The last
section, viz., section (E) shows how seventeenth century ‘probability’ developed into
eighteenth century ‘Statistik’ and nineteenth and twentieth century ‘statistical inference’. A
central theme in this whole chapter will be the difference between the subjective and objective
interpretation of probability.
[2.A] Misconceptions about statistics & the necessity of a terminological and
historical overview.
It is stunning how many misconceptions exist about statistics. One of the most popular misconceptions among laypeople is that statistical methodology is just quantification and counting. Another well-known fallacy is the idea that everything in the world is normally distributed – a distortion of the central limit theorem. And even when people have followed some courses in statistical methodology and are perfectly capable of performing certain statistical operations (calculating a standard deviation, performing a t-test, etc.), they will often have only a hazy idea of the meaning of these statistical operations: especially since the introduction of user-friendly statistical computer packages, it has become much easier to perform a regression analysis than to grasp its theoretical meaning.
Yes, statistical misconceptions abound – but in general they are misconceptions with obvious historical roots: after all, there was a time when statistics was mostly ‘just counting’, because there were no generally accepted rules for statistical inference yet; there was a time when scientists believed that the normal distribution was an almost divine law with which everything in this world had to comply; and there was a time when the majority of statisticians had a completely wrong understanding of a statistical phenomenon like ‘regression’. I believe that the abundance of statistical fallacies that have prevailed and still prevail is not a consequence of the stupidity of psychology students, but has rather to be seen as a sign that statistical thinking is in a sense ‘unnatural’ and counter-intuitive.
Therefore I think it is necessary – prior to all philosophical thinking about statistical methodology – to gain a proper understanding of some basic terms in statistics and of their mutual relationships, e.g.: “How do statistics and probability relate to each other?”, “What is inferred from what in statistical inference?”, “What is the status of the Normal Distribution?”, etc.
This chapter aims to clarify some of the often misunderstood issues in statistics and presents a guideline with regard to the terms and names to which I will often refer henceforth in this thesis. Because one can point out historical reasons for the majority of misconceptions concerning statistics, this orientating overview will have a historical emphasis.
[2.B] The omnipresence of probability and statistics.
I mentioned already in chapter 1 that statistical methodology thrives in almost every science, and in psychology maybe even more than in most other sciences. But it is good to realize that statistics is not confined to scientific methodology. Probability and statistics are everywhere. The prominent philosopher of probability Ian Hacking (2004, p. 4) states that there are “more explicit statements of probability presented on American prime time television than explicit acts of violence (I’m counting the ads)”. You just have to turn the television on to see this: weather forecasters tell how much chance there is that it will rain tomorrow, advertisers claim that their detergent is 83% more effective than other cleaning products, and stern-looking scientists estimate the probabilities of bird flu pandemics, global greenhouse effects and cancers. The ‘imperialism of probabilities’ (Hacking, 2004, p. 5) is not confined to the television screen. We all have insurance against all sorts of risk, and insurance companies are the employers of many a statistician. The making of reasonable decisions has become equivalent to probability-based decision-making: “No public decision, no risk analysis, no environmental impact, no military strategy can be conducted without decision theory couched in terms of probabilities” (Hacking, 2004, p. 4). Before stepping into our car after drinking three glasses of wine, we consider our odds of getting a fine or causing an accident. We weigh ourselves and hope that our weight does not exceed the norms of the Quetelet index. We take multiple-choice tests and our answers are statistically analysed. We take IQ tests when applying for a job. We observe anxiously whether our children’s abilities and characteristics deviate from the average. In short: our lives are drenched in probabilities and statistics.
[2.C] Making sense of the pile of words: probability, statistics and statistical inference.
Probability and statistics are words that are often mixed up. The situation becomes even more complex because the word ‘statistics’ actually refers to two different phenomena: descriptive statistics, which in principle are unrelated to probability, and inferential statistics – i.e., the scientific methodology that makes it possible to draw general conclusions from a limited number of observations and which is ‘built’ for a large part on probabilistic axioms.
In the following sections I will clarify:
(I) Probability: what ‘meaning’ or ‘interpretation’ can be given to probability, viz., subjective and objective interpretations;
(II) Statistics: descriptive statistics and inferential statistics: why the emergence of statistical inferential methods was a great scientific breakthrough and how the lack of reliable inferential statistical methods sometimes had disastrous consequences;
(III) Inferential statistics built on probabilistic axioms: how the different interpretations of probability lead to different forms of statistical inference, viz., Bayesian inference and frequentist inference.
I. Probability
When we throw a regular-looking die, what is the probability of getting a 5? The calculation to answer this question is very straightforward. We count the number of outcomes that constitute the event and divide this by the total number of possible outcomes, in order to obtain a number between 0 and 1: the less likely an event is to occur, the closer its probability will be to 0. So the probability of getting a 5 is quite obviously 1/6. But what does this number 1/6 mean; i.e. what reason makes it seem so obvious that the probability is 1/6?
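Stated as a worked equation, the counting rule just described reads:

\[
P(\text{a 5 turns up}) \;=\; \frac{\text{number of outcomes showing a 5}}{\text{total number of possible outcomes}} \;=\; \frac{1}{6} \approx 0.167
\]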
This seems to be an easy question. Yet it has exercised many of the great minds of the twentieth century, and the discussions on this still ‘unsolved’ issue rage on today. The literature on the interpretation of probability is overwhelmingly multitudinous and scattered. Still, roughly speaking, one can distinguish two camps, whose adherents are traditionally called the subjectivists and the objectivists (Gillies, 2003). Both of these camps are subdivided into several subcamps and conglomerates of different camps, which may even encompass certain combinations of subjective as well as objective interpretations: sometimes it seems as if everybody presents his own particular interpretation of probability.
However, the distinction between subjective and objective probabilities reflects a major distinction within the possible interpretations that can be given to probability, namely: is probability a ‘subjective’, epistemological phenomenon that has its roots in yourself (e.g., in your logic or your perception), or is it an ‘objective’ phenomenon that really exists – ontologically – in the ‘outer world’? If you say that a die has a probability of 1/6 of showing a certain face, is the word probability then referring to your knowledge or to some really existing ‘attribute’ of the die? The subdivisions within the subjective and objective interpretations are endless. To avoid going astray in the endless variants of interpretations of probability, I will elucidate here only one particular variant of the subjective interpretation – the so-called personalistic interpretation – and one particular variant of the objective interpretation – the so-called frequentist interpretation (Gillies, 2003); on the basis of these two variants the difference between the objectivist and subjectivist interpretations in general can be clarified on a very basic level, which will provide a sufficient understanding for the moment:
(a) The subjective-personalistic interpretation says that probability is only an
expression of ignorance of real causes – probability in itself does not exist. Due to your
ignorance you have to guess. This guess is a subjective belief: when you for instance look at a
die, you assign a certain probability (i.e., your subjective belief) to the occurrence of the five
turning up. What is the best guess you can make? There can be plenty of little causes that
influence the outcome where you have no knowledge of (e.g., the fairness of the die, the
sweatiness of your hand, the subtle north-easterly wind, the smoothness of the surface of the
table and the movements of the earth), but exactly because you have no knowledge of them
you do not have any reason to assume that any of the six possibility outcomes has a differing
probability. Thus you assume that the six faces of the die have an equal probability of turning
up. However, when you would notice after throwing several times with the die that the face
with the number 5 on it keeps turning up much more often than you expected at first, you
could adjust your subjectively assigned prior probability of 1/6 to a higher probability –
“Probably this die is biased”, you conclude. So, probability is in this interpretation a matter of
purely personal degree of belief – Popper (1972, p. 148) calls this subjective interpretation
therefore psychologistic – in a situation of epistemological uncertainty. You have a lack of
knowledge and therefore you have to rely on probabilistic beliefs; when you would have no
lack of knowledge there would be no probabilities either. Implicitly this view endorses
therefore a deterministic worldview, because if you would have no epistemic uncertainty and
knew exactly all the conditions influencing your die throw, you would be able to make an exact prediction instead of a probabilistic one. So, an adherent of the subjective interpretation of probability would say that when you throw a die, there is no chance involved, but just a lack of knowledge.
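To make the adjustment of such a subjectively assigned prior concrete, the following minimal sketch shows how a personal degree of belief in “this die is biased towards 5” could be updated with every throw by means of Bayes’ rule. The two hypotheses and the 50/50 prior between them are purely hypothetical, illustrative choices, not part of any of the cited authors’ accounts:

```python
# A minimal sketch with two hypothetical hypotheses about the die
# ('fair': P(5) = 1/6, 'biased': P(5) = 1/3) and an arbitrary 50/50 prior.

def update_belief(prior_biased, rolls, p5_biased=1/3, p5_fair=1/6):
    """Return the posterior degree of belief that the die is biased,
    after observing `rolls` (True = the face 5 turned up)."""
    belief = prior_biased
    for five_turned_up in rolls:
        # likelihood of this single throw under each hypothesis
        like_biased = p5_biased if five_turned_up else 1 - p5_biased
        like_fair = p5_fair if five_turned_up else 1 - p5_fair
        # Bayes' rule: posterior is proportional to prior times likelihood
        numerator = belief * like_biased
        belief = numerator / (numerator + (1 - belief) * like_fair)
    return belief

# the five keeps turning up much more often than a fair die would suggest
rolls = [True, False, True, True, False, True, True, False, True, True]
print(update_belief(0.5, rolls))  # roughly 0.98: "probably this die is biased"
```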
This creates a big problem for the subjective interpretation of probability: for if it is true that probability is just a personal belief, then it is to a certain extent (of course there may be some practical reasons for choosing a certain belief) an arbitrary belief, or at least a mere construction. And logically, too, it seems strange to create knowledge from a lack of knowledge. So, is there really any logic in assuming that probability is purely subjective? Is probability just a palliative for your lacking knowledge, or is there an objective situation (the shape of the die, the surface of the table on which it falls, the gravitational forces) that ‘produces’ a certain frequency of the number 5 turning up when a repetition of throws is made? Does it not make sense as well that the probability of 1/6 is something that exists independently of our beliefs? If we were throwing a die in a black box where it cannot be seen, there would probably still turn out to be a certain stable frequency with which the face with the number 5 on it turns up, would there not?
(b) The objective-frequentist interpretation sees probability as a phenomenon that can be scientifically measured and established: probability is “a statement about the relative frequency with which an event of a certain kind occurs within a sequence of occurrences” (Popper, 1972, p. 149). So within the objective-frequentist interpretation probability is not just a belief resulting from the lack of knowledge of a “specific individual” concerning “particular events” (Gillies, 2003, p. 89); probability is instead a stable frequency of occurrence that manifests itself in a long sequence of repetitions. This fact – that probability is a stable frequency which manifests itself in a long sequence of repetitions in a measurable way – makes it scientific. Scientifically there is no necessity to pinpoint exactly the ‘meaning’, ‘cause’ or ‘reason’ of this frequency of occurrence; the observation that this probability ‘manifests’ itself and has useful applications is, scientifically speaking, sufficient. Probability is seen as an ontological ‘reality’ – independent of our knowledge – because it can be measured objectively. Von Mises (1883-1953), one of the first great adherents of the frequency interpretation of probability, compares probability with ‘length’: although it is also difficult to say philosophically what ‘length’ is, nobody will reproach a surveyor measuring land – a very useful and practical job – with merely measuring a figment of his imagination (Mises, 1936, p. 36).
“Denn in der Anwendbarkeit einer Theorie auf die Wirklichkeit sehe ich den wesentlichsten, wenn nicht einzigen Prüfstein ihres Wertes”. (Mises, 1936, p. 34) [For in the applicability of a theory to reality I see the most essential, if not the only, touchstone of its value.]
And the usefulness of probability is something that can hardly be doubted. The life-insurance company that gathers a great amount of data about the mortality of 41-year-old German men can calculate the probability of mortality within this collective and do well out of it: it seems ridiculous to say that it is just measuring its ‘belief’ about the number of 41-year-old German men dying. As the French mathematician Poincaré (1854-1912) asked himself: “How could insurance companies make regular profits if there is no objective reality corresponding to their probability calculations?” (Gillies, 2003, p. 86). Insurance companies live on the so-called ‘law of large numbers’: the plain fact that if you make many independent observations, then the average of the sample is close to the average of the population and therefore stable and predictable (Moore & McCabe, 1999). So frequencies are real and it is science which can observe them: “In frequency theory, probabilities are associated with collections of events or other elements and are considered to be objective and independent of the individual who estimates them, just as the masses of bodies in mechanics are independent of the person who measures them” (Gillies, 2003, p. 89).
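The stabilization that the ‘law of large numbers’ describes can be shown in a minimal simulation sketch; the sample sizes, the fixed random seed, and the use of throws of a simulated die instead of mortality data are purely illustrative choices:

```python
# A purely illustrative simulation: the relative frequency of the face 5
# stabilizes around 1/6 as the number of throws grows.
import random

random.seed(0)  # fixed seed, so that the sketch is reproducible

for n in (10, 100, 10_000, 1_000_000):
    fives = sum(1 for _ in range(n) if random.randint(1, 6) == 5)
    print(f"{n:>9} throws: relative frequency of the five = {fives / n:.4f}")
# the printed frequencies drift towards 1/6 (about 0.1667) as n grows
```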
One of the objections to the frequency interpretation is that a finite empirical collection (e.g., a sample of 41-year-old German men) is represented in the mathematical theory by an infinite mathematical collective (Gillies, 2003, p. 90), because “probability values are defined as the limit of an infinite sequence” (Cowles, 1989, p. 58). One could answer this objection by pointing out that this is something that occurs everywhere in physics, and that the obtained frequency ratio can be used as a hypothesis for the true value of the infinite collective, a hypothesis which can then be tested (Cowles, 1989, p. 58). Some people also object to the frequency theory that probability as a long-run frequency seems quite a ‘mystical’ property. Objectivist probabilists have come up with several explanations of why there is nothing ‘mystical’ about the statement that probability is a ‘frequency’ or an ‘attribute’ that arises in certain ‘chance set-ups’. However, these objectivist explanations are quite technical and therefore I will not go further into these matters. For reasonably comprehensible and convincing accounts I refer to Hacking (1976) and Popper (see e.g. Popper, 1972; Popper, 1983).
Probability seems to be shifting between the objective and the subjective interpretation – it is ‘Janus-faced’ (Hacking, 1975). However, most scientists who use probability could not care
less what interpretation is given to it, or they might even see the philosophical debate as an irritating obstacle for practitioners of probability (Robert, 2001). It is especially since the Russian mathematician Kolmogorov published, in 1933, in his famous book Grundbegriffe der Wahrscheinlichkeitsrechnung, an arithmetical axiomatic model for probability theory, that most practising scientists lost all interest in the philosophical meaning of probability. The axiomatic model of Kolmogorov is a formal system – the axioms have no fundament in the ‘real world’ and do not answer the question how to interpret probability on a philosophical level (Hacking, 1976) – but on a formal or mathematical level this nearly universally accepted system has settled the majority of disputes:
Following [Kolmogorov’s Grundbegriffe der Wahrscheinlichkeitsrechnung], a mathematician would
answer the question of what is probability by saying: Anything that satisfies the axioms. Expressed in
technical jargon, probability is a normalized denumerably additive measure defined over a σ-algebra of
subsets of an abstract space. Something is lost in the answer, however. For if the space is finite, the
answer shrinks down to saying: Probabilities are numbers between 0 and 1 such that if two events
cannot occur simultaneously, the probability of either one of them occurring is the sum of the
probability of the first and the probability of the second. The mathematician’s formalistic approach to
the question does not address the meaning of probability […]. (von Plato, 1995, pp. 1-2)
Figure 1. A. N. Kolmogorov
So, although the formalistic system of Kolmogorov settles the majority of disputes on a mathematical level, it does not address the fundamental philosophical questions about the meaning of probability, and even less about the meaning of chance.
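The finite case that von Plato describes can be illustrated with a minimal sketch; representing events as Python sets over the sample space of a die is an illustrative choice of mine, not part of Kolmogorov’s formalism:

```python
# A minimal sketch of the finite case: probabilities are numbers in [0, 1],
# and for two events that cannot occur simultaneously the probability of
# 'either one' is the sum of their probabilities.
from fractions import Fraction

space = {1, 2, 3, 4, 5, 6}  # sample space of one throw of a fair die

def P(event):
    """Uniform probability measure on the finite sample space."""
    return Fraction(len(event), len(space))

A = {5}        # 'the five turns up'
B = {2, 4, 6}  # 'an even number turns up' -- cannot occur together with A
assert A & B == set()           # the two events are mutually exclusive
assert P(A | B) == P(A) + P(B)  # additivity for mutually exclusive events
assert P(space) == 1            # the whole space has probability 1
print(P(A), P(B), P(A | B))     # prints: 1/6 1/2 2/3
```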
II. Statistics: descriptive statistics and inferential statistics
Until this moment we have spoken about probability and how to interpret it, not about statistics. In day-to-day language the word ‘statistics’ is used in such a way that two different but interrelated meanings of the word are mixed up.
The first sense in which the word ‘statistics’ is used is for tabulated numerical data relating to aggregates of individuals; e.g., when the results of an examination are published, the results are often neatly categorized in a table, so that one can see how good or how bad an obtained mark is in comparison with the results of the other students. It would be quite normal if a student in such a situation said: “Although I did not get a first grade, but only an upper second, the statistics tell me that I still belong to the 15% highest scores”. This kind of statistics is only descriptive, for it does not draw any conclusions about, for instance, a whole population of students.
The second sense in which the word ‘statistics’ is used is when one actually means ‘inferential statistics’. Inferential statistics is a form of ‘inferential’ or ‘inductive’ reasoning, grounded on probabilistic axioms. In inductive reasoning general conclusions are drawn – or ‘inferred’ – from a limited number of observations; e.g., after observing ten black rabbits you conclude that all rabbits are black. It is not at all obvious that induction is a valid way of reasoning. Is it not quite illogical to draw from a particular observation (“the ten rabbits I observed were black”) a general conclusion (“all rabbits are black”)? This general conclusion was not implied in the particular observation and seems to appear out of nothing: like the proverbial rabbit out of the hat. This ‘problem of induction’ was first explicitly described by Hume (1711-1776):
Any degree, therefore, of regularity in our perception, can never be a foundation for us to infer a greater
degree of regularity in some objects which were not perceived, since this supposes a contradiction, viz.,
a habit acquired by what was never present to the mind. (Hume, 2002/1739, book 1, part 4, section 2, p.
131)
The problem of induction is still bothering philosophers; in particular analytic philosophers “still drive themselves up the wall (to put it mildly) when they think about it seriously” (Hacking, 2001, p. 190).
Figure 2. David Hume
Statistical inference is a form of inductive reasoning that “presents itself as a mathematical solution to the problem of induction” (Cowles, 1989, p. 27). In statistical inference the inductive problem is replaced – or at least evaded (Hacking, 2001, p. 252 and 261 ff.) – by the question whether certain probabilistic axioms that would validate the inference, e.g. normality and independence of the observations, are rightly assumed. These probabilistic axioms assume that in the long run there is constancy in time: the number 5 on a fair die will have a probability of 1/6 now, as well as in 2012 and in 2096, when you throw it a large number of times, e.g. 10,000 times; and given the same circumstances or ‘chance set-up’ the average probability of dying when you are a 41-year-old German will be the same in 2012 as it is now, provided that only a great number of 41-year-old Germans is taken into account.
Still, one cannot say that an answer has been found to Hume’s problem of induction: only the problem itself has become more or less irrelevant, because of a change in what we think ‘reason’ is (this is one of the major themes of this thesis, so I will discuss it more extensively later):
Anyone who tries to argue that the future will be like the past, on the ground that past futures have been
like the pasts, is arguing in a circle. […] It is not, therefore, reason which is the guide of life, but
custom. That alone determines the mind in all instances to suppose the future conformable to the past.
However easy this step may seem, reason would never, to all eternity, be able to make it. […]
We can do more with probability than Hume imagined. Probability theory was just beginning in
Hume’s day. (Hacking, 2001, p. 251)
So, statistical inference evades the problem of induction – but what is statistical inference? Statistical inference is a form of inductive reasoning that “may be defined as the use of the methods based on the rules of chance to draw conclusions from quantitative data”
(Cowles, 1989, p. 29). These rather abstract and dry formulations of what statistical inference means conceal the revolutionary amplification of research successes that the emergence of statistical inference has brought about. Statistical inference as a scientific methodology emerged only timidly in the second half of the nineteenth century, and it started efflorescing as late as the beginning of the twentieth century (see e.g. Gigerenzer et al., 1990).
Before statistical inference emerged there was a major problem: although lots of data were gathered, the tools to draw inferences from these data were almost completely lacking. A bitter example of this lack of inferential methodology is the outburst of blood-letting in France between 1815 and 1835, which led to many unnecessarily lost lives. Never in history was blood-letting so widely employed as then. And although the blood-letting mania was embedded in a discussion wherein both the major advocate of blood-letting, doctor Broussais, and his adversaries threw impressive-looking statistics at each other, the abundant statistics did not lead to any substantive conclusions. “There were, then, statistics galore, but few conclusive statistical inferences. They were tools of rhetoric, not science. For all the enthusiasm for numbers, they did not have the immediate effect that one would have expected” (Hacking, 2004, p. 85).
Figure 3. F.-J.-V. Broussais
III. Inferential statistics built on probabilistic axioms
Scientific statistical inference is able to infer from a limited number of observations something that is in principle unobservable – based on probabilistic axioms. Because these probabilistic axioms are needed to infer conclusions from otherwise merely descriptive statistics, the distinction between the subjective and the objective interpretation of probability can also be felt in inferential statistics: there is inference based on subjective probabilistic
axioms – called Bayesian statistical inference – and there is inference based on objective probabilistic axioms – called frequentist statistical inference. Although the formal probability rules of the axiomatic system of Kolmogorov hold for all types of probability, the system can be interpreted in a subjective way (then it is Bayes’ rule that forms the base of subjective or Bayesian statistical inference) or in an objective way (then it is Bernoulli’s theorem – also known as the law of large numbers or the first limit theorem – that forms the base of objective or frequentist statistical inference) (Maistrov, 1974, p. 264). In the following three subsections I will clarify some aspects of [a] subjective statistical inference (often called ‘Bayesian’), [b] objective (in particular: frequentist) statistical inference, and [c] why Bayesian statistical inference in my opinion cannot be used in scientific statistical methodology.
(a) Bayesian (i.e., ‘subjective’) statistical inference:
When a physician observes that a particular patient shows the symptoms fever, rash and red bumps, he may infer from these symptoms a subjective degree of belief, i.e., a hypothesis with a certain probability, that these symptoms are caused by the measles virus. When the physician uses statistical data (e.g. the prevalence of measles among patients of a certain age and the frequency with which these symptoms indeed turned out to be caused by measles) to assign a probability to his measles hypothesis, he is making a Bayesian statistical inference. Crucial in Bayesian statistical inference is that a hypothesis has a probability and that the probability of this hypothesis can be adjusted on the basis of new empirical evidence: every time the physician is confronted with the symptoms fever, rash and red bumps that turn out to be caused by measles, the probability of his hypothesis goes up.
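A minimal sketch of the physician’s inference, with Bayes’ rule written out; the prevalence and the symptom frequencies below are invented numbers, used only to show the mechanics of the updating:

```python
# A minimal sketch with invented, hypothetical numbers.
prior_measles = 0.02             # hypothetical prevalence in this age group
p_symptoms_if_measles = 0.90     # fever, rash and red bumps, given measles
p_symptoms_if_no_measles = 0.05  # the same symptoms from other causes

# Bayes' rule: P(measles | symptoms)
numerator = prior_measles * p_symptoms_if_measles
evidence = numerator + (1 - prior_measles) * p_symptoms_if_no_measles
posterior = numerator / evidence
print(f"P(measles | symptoms) = {posterior:.2f}")  # about 0.27
```

Every further patient with these symptoms who indeed turns out to have measles would raise this posterior, which in turn serves as the prior for the next inference.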
Bayesian statistical inference is a rarely used method among psychologists and other scientists, because there are not many situations wherein the a priori degree of belief (i.e., before it is adjusted by further experience) can be established in an unequivocal way. The ‘solutions’ proposed to avoid this ambivalence in the a priori degree of belief have been manifold. The statistician I.J. Good once counted, a bit tongue in cheek, up to 46,656 possible different Bayesian views (Gigerenzer, 2000, p. 16). I will present here just one of them, to give an impression of the problems that are at stake in Bayesianism. The Italian statistician Bruno de Finetti (1906-1985) noticed that most people are not very good at estimating the probability of their beliefs (Aczel, 2004).
Figure 4. Bruno de Finetti
Suppose you have a friend whose girlfriend went on holiday to Ibiza with a girlfriend. When you ask your friend whether he is sure that his girlfriend will not cheat on him, he answers that he is one hundred percent sure that she will be faithful. But is your friend ‘really’ in touch with his inner feelings? Is the probability of his a priori belief in the faithfulness of his girlfriend ‘really’ one hundred percent? De Finetti proposed an ‘objective’ way to measure subjective probability. Offer your friend a hypothetical choice: “If it turns out that your girlfriend has indeed been faithful to you, you will win one million dollars; but you can also choose to pick a ball out of a bag with 90 red and 10 blue balls, and if you pick a red ball then you will win the million”. If your friend chooses to draw from the bag instead of relying on his girlfriend’s faithfulness, you can conclude that he is not as confident as he claimed to be: apparently he is at most 90% confident. You can adjust the ratio of red to blue balls in the bag until your friend prefers relying on his girlfriend to a pick from the bag, to find out the real probability of his confidence. Although many a Bayesianist will rely on less ‘psychological’ or ‘personalistic’ methods to establish the probability of a belief, this ‘de Finetti game’ clarifies why Bayesian inference is viewed by quite some people – including myself – as suspiciously ‘soft’ and ‘vague’.
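The adjustment of the ratio of red to blue balls can be viewed as a bisection procedure. The following minimal sketch simulates the game; the ‘hidden’ confidence of the friend and the function prefers_bag are of course hypothetical stand-ins for his actual answers:

```python
# A minimal sketch of the 'de Finetti game' as a bisection over the
# fraction of red balls in the bag.

def elicit_probability(prefers_bag, steps=20):
    """Narrow down the red-ball fraction at which the friend is
    indifferent; that fraction measures his subjective probability."""
    low, high = 0.0, 1.0
    for _ in range(steps):
        red_fraction = (low + high) / 2
        if prefers_bag(red_fraction):
            high = red_fraction  # he prefers the bag: his confidence is lower
        else:
            low = red_fraction   # he prefers the bet on his girlfriend
    return (low + high) / 2

hidden_confidence = 0.83  # what the friend 'really' believes (simulated)
prefers_bag = lambda red_fraction: red_fraction > hidden_confidence
print(round(elicit_probability(prefers_bag), 3))  # about 0.83
```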
Bayesian statistical inference is mainly used in areas where such a ‘framework for thinking’ is a helpful heuristic, e.g. in the courtroom, where it helps to combine evidence in a structured manner, and in artificial intelligence applications. Although since the 1960s there has been a growing number of statisticians and philosophers of science who promote the use of Bayesian statistical inference as an appropriate methodology for scientific research (e.g. Gigerenzer et al., 2004; Romeijn, 2005; Winkler, 1974), most statistical textbooks do not even mention Bayesian techniques (Gigerenzer et al., 2004) and researchers using Bayesian techniques are rare phenomena:
Hays (1963) had a chapter on Bayesian statistics in the second edition of his widely read textbook
[Statistics for psychologists] but dropped it in subsequent editions. As he explained to one of us (GG),
he dropped the chapter upon pressure from his publisher to produce a statistical cookbook that did not
hint at the existence of alternative tools for statistical inference. Furthermore, he believed that many
researchers are not interested in statistical thinking in the first place but solely in getting their papers
published. (Gigerenzer et al., 2004, p. 395)
(b) Frequentist (i.e., ‘objective’) statistical inference:
When a scientist wants to draw inferences about a population on the basis of the observations obtained from a random sample, it is most likely that he will use frequentist statistical inference. Frequentist statistical inference relies on probability as it manifests itself in the long run. Statistical hypotheses are compared with data, obtained in carefully designed experiments. Contrary to the hypotheses in Bayesian inference, these hypotheses can only be true or false – in frequentist inference it is impossible to assign a probability to a hypothesis (Hacking, 2001, p. 211); the probability relates only to the data, given a certain hypothesis. Suppose for instance that you want to test the hypothesis that a die is fair and that the probability of the face with the number 5 on it turning up is 1/6. This hypothesis cannot be ‘a little bit’ true. There are only two possibilities: reject the hypothesis or stick to it – one cannot be sure a hypothesis is ‘true’ when it has stood up to a test, because it may be falsified in another test. When a hypothesis stands up to many tests, it is a strong hypothesis: Popper calls such hypotheses corroborated hypotheses (Popper, 1972). When does a hypothesis have to be rejected? A hypothesis has to be rejected when the probability of the observed data given the hypothesis is very low. The scientist can choose how low ‘very low’ is: often-used levels are 5% or 1%. So when in 1800 throws the number 5 turned face-up only 6 times, these data seem rather improbable given the hypothesis that the die is fair – yet they remain possible, although the chance of such a low occurrence of the number 5 when throwing a fair die is quite small: therefore we may conclude that either we have observed something quite unusual, or the hypothesis that the die is fair is false. In this case the observed frequency has such a low probability that a scientist will probably reject the hypothesis. To calculate the probability of the number 5 turning up only 6 times out of 1800, he uses frequency-type probability.
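A minimal sketch of the frequency-type calculation behind this example: the probability of at most 6 fives in 1800 throws of a fair die, i.e. the lower tail of a Binomial(1800, 1/6) distribution.

```python
# A minimal sketch: lower-tail probability of a Binomial(1800, 1/6).
from math import comb

n, p = 1800, 1 / 6
p_tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(7))
print(f"P(at most 6 fives in 1800 throws | fair die) = {p_tail:.3e}")
# prints a value on the order of 10**-130: astronomically far below any
# 5% or 1% level, so a frequentist would reject the fairness hypothesis
```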
(c) Why Bayesian (i.e., ‘subjective’) statistical inference in my opinion cannot be used in scientific statistical methodology:
Objective, frequentist inferential statistics assigns probabilities to data, whereas subjective, Bayesian inferential statistics assigns probabilities to hypotheses.
The reason why probabilities may be assigned to data is quite clear. Data – when gathered in an accurate scientific manner – may be assumed to be random. This assumption of randomness allows a scientist to apply probabilistic axioms to them and to assess their probability given a certain hypothesis.
However, hypotheses are not random events. Therefore the assignment of probabilities to hypotheses cannot be grounded in the assumption of randomness. This raises the question on what grounds subjective, Bayesian statisticians assign a probability to a hypothesis. Apparently it is a fallacy to think – as some subjective probabilists do – that probability grows with experience. It is untenable to hold that the more white swans one encounters, the more probable the hypothesis ‘All swans are white’ becomes; after all, one needs to encounter just one black swan to falsify this hypothesis completely (Popper, 1972). Also the idea that scientists should “discover their degrees of beliefs by introspection, perhaps by considering the odds they might give if presented with […] a series of bets” (Mayo, 1996, p. 75) sounds suspiciously subjectivistic and unsound. Moreover, why would one assume at all that beliefs are expressible as probabilities? Although contemporary Bayesianists have performed endless learned tours de force to make their subjectivism look unimpeachably objective (Mayo, 1996, p. 85), Bayesianist statistics remains in nuce a subjective, inductive method – struggling with the question why a certain probability should be assigned to a hypothesis – and is therefore, at least in my opinion, utterly unfit as a method for an experimental science that wants to put hypotheses to a test.
So, to summarize this section: up to this point we have seen that inferential statistical methodology is built on probability, which may be interpreted in either a subjective (‘Bayesian’) or an objective way. In my personal opinion Bayesian statistical inference is unfit to be used as a scientific method.
However, whereas the probability calculus itself stands on a stable arithmetical axiomatic model (mainly the axioms formulated by Kolmogorov), a strong axiomatic model for statistical inference – both objective and subjective! – is lacking. I once had a talk with a student of mathematics. When he heard that I studied psychology he asked me with a condescending smile: “So you work with statistics? In my first year in college I had to attend a course that
consisted for one half of classes on the calculus of probabilities and for the other half of classes on statistics: and, as to every mathematician, it became immediately clear to me even then that the former was a venerable calculus, firm as a rock, while the latter – although built upon the respectable calculus of probabilities – was a ramshackle discipline, juggling with assumptions in a highly suspicious way.”
Of course, this student put things in a very oversimplified way, but his remark conveys a nucleus of truth to which the majority of exact scientists would subscribe. After all, not a lot has changed in the calculus of probability since Laplace (1749-1827) and Gauss (1777-1855), whereas statistical inference as a scientific methodology was at that time still in an embryonic stage.
[2.D] The remarkable lack of probabilistic thinking until the second half of the
seventeenth century
After the terminological overview of the words ‘probability’, ‘descriptive statistics’ and ‘inferential statistics’ in the previous section, the remaining two sections of chapter two (§2.D and §2.E) will be devoted to some historical aspects of these notions.
In this section (§2.D) the remarkable lack of probabilistic thinking until the second half of the seventeenth century will be studied.
The first subsection of this section explains that – although it is difficult to imagine! – the emergence of probability really involved a deep conceptual change and that probability as it emerged in the seventeenth century is incommensurably different from its ‘precursors’.
The second subsection elucidates the circumstances that triggered the conceptual change that made the emergence of probability possible.
I. Why does the lack of probability until the second half of the seventeenth century amaze us so much?
Mathematical probability was almost completely lacking until the second half of the seventeenth century. Even the word ‘probability’ itself, in the probabilistic sense as we know it today (‘likely’, ‘apparently true’), was non-existent before the seventeenth century (Room, 1986). This fact, that probability is of quite recent origin and emerged abruptly in the seventeenth century, baffles scientists (Daston, 1988; Hacking, 1975).
However, why is it so startling that probability is such a recent ‘discovery’? Nobody seems, for instance, to be very amazed by the fact that psychology as a science did not exist before the nineteenth century. Or that quantum mechanics was not discovered earlier. Because probability is so omnipresent in our present time, it is hardly possible to think of something which is not subject to probability. Already in 1942 the prominent British statistician Maurice Kendall said that statisticians “have already overrun every branch of science with a rapidity of conquest rivalled only by Attila, Mohammed, and the Colorado beetle” (Kendall, 1942, p. 69). It is therefore very alluring to think that probability has always existed:
The probability expert I.J. Good claims that probability predates the human race. He argues that animals
have a sense of probability – a predator might instinctively assess the probabilities that the prey will
choose among various escape routes, and chase down the route that is most probable. (Aczel, 2004, p.
3)
It sounds probable, does it not? And what to think of the cognitive research that conceives of the human mind as an ‘intuitive statistician’ (Brunswik, 1943; Gigerenzer, 2000; Gigerenzer & Murray, 1987; see also chapter 3, p. 66, and §3.F “The lure of ‘subjective’ semantics in cognitive psychology”, p. 109 ff.)? It also sounds very probable, i.e. reasonable.
Of course, when you search for it – in particular with the benefit of hindsight – you can even find numerous foretokens and precursors of probability in the ancient world and in prehistory. But there is a great danger of anachronism here. The fact that the ancient Babylonians, Egyptians, Greeks, and pre-Christian Romans used heel- or knucklebones, ‘astragali’, as dice (David, 1962) makes their lack of a concept of probability even more remarkable from a modern point of view, but does not justify the conclusion that they already had a rudimentary probabilistic concept.
Figure 5. ‘Astragali’: heel- or knucklebones that were used as dice in ancient times.
There was the concept of ‘chance’ or ‘Fortuna’ in the ancient world and the Middle Ages – gambling and divination were widespread (Cioffari, 1973; Kendall, 1973) – but this concept of chance was not linked to rationality, at least certainly not in the way it was in the word ‘probability’ as it emerged in the second half of the seventeenth century.
The emergence of probability in the second half of the seventeenth century is therefore the emergence of the word ‘probability’ as a certain way of thinking chance and rationality together. I assume that this fact – that the changed meaning of the word ‘probability’ involves our rationality and therefore our thinking – makes it so difficult to remember that probability has a quite clear historical origin.
As I mentioned earlier, the frequentist Von Mises compared ‘probability’ with the word ‘length’. I think that his comparison touches upon a very empirical fact. Once you have seen the world ‘through’ the word ‘length’, it has become impossible to see the world without it – e.g., you perceive John as taller than Frank. Although in retrospect the ‘concept’ of
‘length’ seems to have always existed, it seems at the same time undeniable that there must have been somebody who uttered the word ‘length’ for the first time. The same holds for the word ‘probability’. Moreover, the word ‘probability’ probably has an even stronger hold on us than the word ‘length’, because it affects how we think – when thinking we think of our thoughts as having certain probabilities, and thinking of a time when thoughts were not linked with probabilities seems impossible.
There are adversaries of the idea that a conceptual change happened in the seventeenth century – such as Daniel Garber and Sandy Zabell – but it is important to bear in mind that their aversion may be partly dictated by the impossibility they encounter when they try to ‘imagine’ a world without probabilities:
[…], it is difficult to imagine a period of modern history in which concepts of probability, evidence and chance did not exist, when the epistemic and the aleatory were not intermixed. There should be an underlying suspicion that, like the legendary pre-logical peoples of the pre-Quinean anthropologist, these recent pre-probabilistic times may be a figment of the investigator’s imagination. (Garber & Zabell, 1979, p. 37)
It cannot be denied that writers like the above-mentioned Garber and Zabell, or David in her book Games, gods and gambling (1962) – which has in the meantime become a classic in the history of statistics – and many others (e.g. Sambursky, 1956; Sheynin, 1974) have done conscientious research which convincingly shows that many forebodings of the ‘concept’ of probability can be found; nevertheless, all the instances of this ‘prehistory of the theory of probability’ are rather anecdotal:
The origins [of probability] have been sought in astronomy, fine arts, gambling, medicine, alchemy, and
the insurance trade. The quest for antecedents has been a frustrating one, uncovering proto-probabilistic
thinking everywhere and nowhere. Certain passages of Aristotle, for example, could be construed as an
embryonic version of statistical correlation or a scale of subjective probabilities; with an even greater
effort of the imagination, Bayes’ theorem may be discovered in medieval Talmudic exegesis. However,
these philosophical discourses on the nature of chance and rules of thumb for dealing with situations
fraught with uncertainty […] not only fall short of a mathematical treatment of probability considered in
and of themselves, but they also manifestly failed to generate such a theory. (Daston, 1988, p. 8)
II. How the meaning of the word ‘probability’ changed in the seventeenth century
There is a general consensus that the official moment of birth of mathematical probability has to be assigned to the period from July to October 1654, when Pascal and Fermat sent each other five letters concerning the so-called ‘problem of points’: a probabilistic problem that was posed to Pascal by the gambler Chevalier de Méré and that asks how the stakes in a dice game should be divided when it is prematurely cut off, i.e., what a fair division of the stakes is, based on the probability each of the two players has of winning the total game given the results of the previous rounds. The letters have been translated into English (David, 1962) and practically every book on probability mentions them as the origin of mathematical probability (e.g. Daston, 1988; Gillies, 2003; Hacking, 1975; Maistrov, 1974; Oosterhuis, 1991; Schuh, 1964; Vlis & Heemstra, 1988).
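The modern solution to the ‘problem of points’ can be written down as a short recursion. The sketch below assumes, purely for illustration, that every remaining round is won by either player with probability 1/2 and that the game is cut off when the first player needs 2 more points and the second needs 3:

```python
# A minimal sketch of a fair division of the stakes, assuming 50/50 rounds.
from functools import lru_cache

@lru_cache(maxsize=None)
def p_first_wins(a, b):
    """Probability that the first player wins the interrupted game when
    he still needs `a` points and his opponent still needs `b`."""
    if a == 0:
        return 1.0  # the first player has already won
    if b == 0:
        return 0.0  # the opponent has already won
    # the next round is won by either player with probability 1/2
    return 0.5 * p_first_wins(a - 1, b) + 0.5 * p_first_wins(a, b - 1)

# an illustrative cut-off: the first player needs 2 more points, the
# second needs 3; the first player's fair share of the stakes is 11/16
print(p_first_wins(2, 3))  # 0.6875
```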
Figure 6. Fermat
Figure 7. Pascal
Simultaneously with or immediately following the Pascal-Fermat correspondence there was a real explosion of probabilistic ideas. Christiaan Huygens visited Paris in 1655 (Stigler, 1999, p. 239), heard of the Pascal-Fermat correspondence and wrote in 1656 the booklet Rekeningh in Spelen van Geluck, which was translated into Latin in 1657 as De ratiociniis in ludo aleae.
Figure 8. Christiaan Huygens
This text of Christiaan Huygens was the point of departure for the famous Ars conjectandi (The art of conjecturing) of Jacob Bernoulli (1654-1705), which was probably written in 1692 (Hacking, 1975) and posthumously published in 1713 by his nephew Nicolaus Bernoulli.
Figure 9. Jacob Bernoulli, Ars conjectandi (1713).
In Ars conjectandi Bernoulli formulated the ‘law of large numbers’ – otherwise known as ‘Bernoulli’s theorem’ or the ‘first limit theorem’. This monumental discovery is the fundament of all modern developments in probability calculus and statistical inference (Bockstaele, Cerulus, & Vanpaemel, 2004). Without the ‘law of large numbers’ frequentist statistical inference – on which all psychological research hinges – would not exist.
Jacques Bernoulli’s Ars conjectandi presents the most decisive conceptual innovations in the early history of probability. […] probability came before the public with a brilliant portent of all the things we know about it now: its mathematical profundity, its unbounded practical applications, its squirming duality and its constant invitation for philosophizing. Probability had fully emerged. (Hacking, 1975, p. 143)
Figure 10. J. Bernoulli
Figure 11. N. Bernoulli
So it is evident that between 1654 and 1713 there was a spurt of probability.
However, why did it happen only then?
Several, sometimes quite far-fetched, suggestions have been made. Some explanations are sociological or psychological (Garber & Zabell, 1979); some search for a “more fundamental factor” (Kendall, 1956, p. 10) and put the emphasis on the conceptual change. Probably one of the most cited explanations is that of David (1955), who assumes that probability calculus could not develop earlier because dice were often used for magical or religious purposes – calculation of probabilities would be a blasphemous interference with the deity expressing his wishes – and because in ancient times people did not gamble with uniform six-faced dice but with uneven knuckle-bones (see figure 5): whereas with modern dice it is clear that each side is equiprobable, with the unequally sized bones this was not the case, which could have made probability calculations less obvious (David, 1955). Kendall (1956) thinks that it was the Reformation that gave rise to the development of a concept of probability; Garber and Zabell (1979) point to the general rise of scientific activity in the seventeenth century; Maistrov seeks the reason for the rise of probability in economic circumstances – but when one takes into consideration that his book was written in the communist Soviet Union, it seems wise to take his references to Marxist economic theory with a pinch of salt (Maistrov, 1974, p. 5); Daston (1988, p. 14) argues in a quite convincing way that probability emerged from the context of the law and is connected with seventeenth-century legal reforms “concerning evidence both in and out of the courtroom”.
Daston (1988) furthermore does a wonderful job of clarifying why certain theories that try to explain the emergence of probability are likely to be false: there was no lack of mathematical knowledge that would have prohibited an earlier emergence (cf. Hacking, 1975); it was not the games of chance that provided the conceptual framework or catalyst (cf. Maistrov, 1974); it cannot be said that the rise of mathematical probability is identical with the rise of a uniform concept of probability – the seventeenth-century theoretical concept of probability is nowhere near a uniform concept; and probability did not arise because there was a need for it in business (cf. Hacking, 1975) – although seventeenth- and eighteenth-century probabilists such as Johan de Witt (1625-1672) and Edmund Halley (1656-1742) used very ‘practical’ examples concerning insurances and annuities (both booming business in the seventeenth and eighteenth centuries), the commercial implications of their probabilistic ideas were distrusted or at least not understood, and their contribution to practice was consequently nil until the end of the eighteenth century:
…the vogue for insurance seems to have been less prudential than reckless, fuelled more by the spirit of
gambling than foresight. (Daston, 1988, p. 165)
Figure 12. E. Halley
Figure 13. Johan de Witt
So it turns out that the question why probability arose in the second half of the seventeenth century is not very easy to answer.
There is, however, one answer to this question – formulated by Ian Hacking – that is much more sophisticated and stimulating (Daston, 1988, p. 11) than all the other explanations. Hacking (1975) argues that before the seventeenth century the word ‘probable’ was a predicate that could be ascribed to an opinion when it was approved by intelligent people; ‘probability’ therefore mainly meant the ‘approvability of an opinion’ (Hacking, 1975, p. 23), and one could speak of a ‘probable opinion’ when that opinion was supported by the authority of authoritative persons or ancient books, e.g., “My opinion is probable because Plato, Aristotle and Paracelsus subscribe to it”.
Until the seventeenth century there was a strict separation between scientia (knowledge), i.e., the demonstrative knowledge of the ‘higher’ sciences such as mechanics and astronomy, and opinio (opinion), i.e., the beliefs and doctrines of the ‘lower’ sciences, such as alchemy and medicine, which were not begotten by demonstration. This distinction is unfamiliar to the modern mind and therefore needs some further clarification.
By dropping balls of different masses from the Leaning Tower of Pisa, Galileo Galilei (1564-1642) demonstrated that their acceleration was independent of their mass, because the heavier balls reached the ground just as fast as the lighter balls – therefore this knowledge was worthy of the name scientia. When talking about scientia it would have been nonsensical to speak of something like ‘evidence’, because scientia involved an absolute demonstration without any room for doubts, interpretations, probabilities or evidence. After all, if you are able to demonstrate a fact, further support by authorities (in order to make it ‘probable’) or by supplementary evidence is superfluous.
It is obvious that this kind of demonstrative knowledge was unattainable for an astrologer or a physician: the position of the stars and the symptoms of a disease are only signs of an underlying reality that can never be demonstrated in an absolute way. The knowledge of an alchemist, astrologer, geologist or physician consequently was not scientia, but mere opinio. Due to the strict distinction between opinio and scientia, it would have been absurd to assign the attribute ‘probability’ to scientia. However, in retrospect one could say that when Renaissance thinkers started on a large scale to look at nature as “the Book of Nature”, the seeds for the dissolution of the strict separation between scientia and opinio were sown:
Nature is the written word, the writ of the Author of Nature. Signs have probability because they come
from this ultimate authority. (Hacking, 1975, p.30)
This ‘metaphor’ of Nature as a book whose signs have to be read in the right way also existed before the Renaissance (Garber & Zabell, 1979), but Renaissance thought was really completely imbued with this ‘metaphor’ (cf. Foucault, 1966). Although one can only guess at the reasons that made the idea of the “Book of Nature” so immensely popular, it seems plausible that the Reformation and the return to the text of the Scripture were helpful (Popkin, 1964). However, whatever the reason may have been for the popularization of the idea that nature is a book, it is a fact that it entailed a loosening of the strict distinction between scientia and opinio and consequently also brought along a slightly changed meaning of the word ‘probability’:
A new kind of testimony was accepted: the testimony of nature which, like any authority, was to be
read. Nature now could confer evidence, not, it seemed, in some new way but in the old way of reading
and authority. A proposition was now probable, as we should say, if there was evidence for it, but in
those days it was probable because it was testified to by the best authority. (Hacking, 1975, p. 44)
The mutation from ‘probable opinion’ as nondemonstrative knowledge supported by authorities to ‘probable opinion’ as the only possible knowledge supported by the authoritative signs of the Book of Nature led, at the turn of the seventeenth century, to a ‘Janus-faced’ concept of probability (Hacking, 1975), namely a concept that is grounded both in the frequencies of nature (aleatory probability) and in the insufficient knowledge of man (epistemic probability).
So, suppose for instance that you are a fourteenth-century monk reading Virgil’s pastoral poems, the so-called Eclogues, which he wrote around 37 BC, i.e. before the birth of Christ. You hold the ‘opinio’ that Virgil had a presentiment of the birth of Christ and that his fourth Eclogue has to be interpreted as a prophecy of the coming of Jesus Christ. You explain to your fellow monks that your ‘opinio’ is very probable, because it is supported by some very authoritative thinkers, such as the abbot of your monastery and a very erudite cardinal. The word ‘probable’ in this sense is not ‘Janus-faced’: it is clearly ‘epistemic’ (although it is a bit anachronistic to use this notion in the context of fourteenth-century thought), because it is only used to express the extent to which some non-demonstrative knowledge is supported by authoritative thinkers. However, as soon as the domain of ‘opinio’ no longer concerns your interpretation of Virgil but your ‘opinio’ on Nature (or: the Book of Nature), and the authority on which you rely to show that your ‘opinio’ is ‘probable’ is no longer an authoritative abbot or cardinal but Nature itself, probability becomes ‘Janus-faced’: it is both ‘epistemic’ or ‘subjective’ (for it still deals with ‘opinions’ and ‘supporting opinions’ which rely on the authority of Nature), and ‘objective’, because it is the ‘authority’ of Nature itself which gives us evidence, through the ‘frequencies’ of its ‘signs’, whether an ‘opinio’ on Nature is ‘probable’.
[2.E] From seventeenth century ‘probability’ to eighteenth century ‘Statistik’ and nineteenth and twentieth century ‘statistical inference’
In this section it is explored how probability evolved after its emergence. The section is divided into the following subsections:
I. Why the classical concept of probability seems equivocal from a modern point of view.
II. ‘Statistik’ – the counting of figures as an expression of the power of a state.
III. The combination of probabilistic thinking and statistics into statistical inference. Frequentist and Bayesian inference.
I. Why the classical concept of probability seems equivocal from a modern point of view
The seventeenth- and eighteenth-century concept of probability seems equivocal from a modern point of view (Gigerenzer et al., 1990). As I have shown earlier in this chapter, in section 2.C, the modern interpretations of probability are divided into objective and subjective interpretations. This clear duality is completely alien to classical probability. A seventeenth- or eighteenth-century probabilist would not have called his concept ‘Janus-faced’: it is the modern mind which has to label it like that in retrospect. A classical probabilist did not notice any ambiguity in his concept of ‘probability’; to a modern probabilist, however, this ambiguity in classical probability is so obvious that it is hard to understand how it could not have struck the attention of the classical probabilists.
When was the transition from classical to modern probability? Although the break was “neither sudden nor clear” (Daston, 1988, p. 371), it is in the 1830s and 1840s, in the work of Poisson (1781-1840) and Cournot (1801-1877) – in particular in the latter’s book Exposition de la théorie des chances et des probabilités (1843) – that the distinction between subjective probability (‘probabilité’) and objective probability (‘chance’) is explicitly problematized for the first time (Daston, 1988; Hacking, 2004).
Of course there are many modern probabilistic thinkers who try to merge the objective and the subjective interpretation in some way or another, but they are always aware of the fact that they are trying to bridge the two interpretations. It seems impossible for the modern probabilist to erase the duality of probability.
Figure 14. Poisson
Figure 15. Cournot
The momentous book Logical Foundations of Probability (1951) of Carnap is one of the best-known twentieth-century attempts to cope with this distinction. Carnap (1951) distinguishes ‘probability1’ (inductive probability) and ‘probability2’ (statistical probability), i.e. ‘probability1’ is subjective and ‘probability2’ is objective. But,
Carnap, and Cournot before him, notoriously failed to bring tranquillity out of controversy by their
judicious mixture of conceptual analysis and linguistic distinction. Philosophers seem singularly unable
to put asunder the aleatory and the epistemological side of probability. (Hacking, 1975, p. 15)
The fact that classical probability has a Janus-faced character, which only from a modern point of view seems ambiguous, has led to a lot of misinterpretations of classical probabilist thought. The misunderstandings surrounding the thought of Jacob Bernoulli and Thomas Bayes are good examples of the difficulties a modern probabilist encounters when he tries to grasp theories from the era of classical probability.
As I mentioned earlier, the ‘law of large numbers’ of Jacob Bernoulli (1654-1705) laid the fundament for the frequentist interpretation of probability; but at the same time Jacob Bernoulli has also been called a subjectivist, because he was the first to use the word ‘subjective’ in relation to probability theory (Hacking, 1975) and because he talks of probability as a “degree of certainty” (Hald, 1998); yet, “we do not know how Bernoulli would have used his theory in practice because he never analyzed a set of real data” (Hald, 1998, p. 157).
And although the term ‘Bayesian statistics’ is the stock phrase for statistics based on the subjective interpretation of probability, it is highly doubtful whether Reverend Thomas Bayes (1702(?)-1761) would have recognized his own thoughts in what is now ranged under the denominator ‘Bayesian statistics’. It was the ‘subjective’ interpretation of Bayes by Laplace (1749-1827) – notwithstanding that Laplace himself was still a classical thinker and
therefore did not make a clear distinction between subjective and objective probabilities either – which formed the probabilistic ideas that are known today as Bayesian.
It is this incommensurability between classical and modern probabilistic thought that makes classical probabilistic ideas look very ambiguous to the modern mind and that could probably help explain why ‘Stigler’s Law of Eponymy’, i.e. the law that “no scientific discovery is named after its original discoverer” (Stigler, 1999, p. 277), applies so extraordinarily well to probabilistic ideas.
II. ‘Statistik’ – the counting of figures as an expression of the power of a state
After the emergence of probability in the second half of the seventeenth century, it took another two centuries before the probabilistic calculus obtained its star role in statistical inference. Nonetheless, this historical observation does not imply that before the end of the nineteenth century there was no practice of gathering ‘statistical’ data – on the contrary: although “often incomplete and unreliable” (Daston, 1988, p. 127), demographic data existed from the early sixteenth century, and from the second half of the seventeenth century onwards gathering them became really booming business! – nor does it imply that probabilistic calculus and the gathering of ‘statistical’ data were unrelated areas in the seventeenth, the eighteenth and the beginning of the nineteenth century.
Why would one take great pains to gather data if one does not want to draw conclusions from them? The answer to this question is already hinted at by the etymology of the word ‘statistics’: it was an expression of the power of a state – e.g., the more fertile women, the more revenue or the more men capable of serving in the army, the more strength would be ascribed to a state (Desrosières, 1993; Hacking, 2004; Nikolow, 2001; Westergaard, 1969). The word ‘Statistik’ was first used by the German ‘statist’ Gottfried Achenwall in his book Staatsverfassung der heutigen vornehmsten europäischen Reiche und Völker im Grundrisse (1749). However, Achenwall’s ‘Statistik’ was not an isolated idea. Every self-respecting nation at that time was seized by the ‘statistical’ frenzy, although the name, the degree of quantification and institutionalization, the exact techniques of counting and the emphasis varied from country to country, depending partly on the form of government (Desrosières, 1993): e.g., the English ‘political arithmetic’ – mostly a quantitative hobby of well-to-do dilettantes – is renowned for its bills of mortality, whereas the German ‘Statistik’, ‘Staatswissenschaft’ or ‘Kameralwissenschaft’, as practised in the multitude of German-speaking states, became institutionalized reasonably fast and focused more on comparisons between cultural-geographical particularities and on the idea that a nation-state is
characterized by its descriptive statistics (Desrosières, 1999; Porter, 2003c). Still, all the ‘statisticians’ or ‘political arithmeticians’, such as John Graunt (1620-1674), William Petty (1623-1687), Hermann Conring (1606-1681), Daniel Bernoulli (1700-1782), Johann Peter Süßmilch (1707-1767), Gottfried Achenwall (1719-1772), Anton Friedrich Büsching (1724-1793), August Friedrich Wilhelm Crome (1753-1833) and Sir John Sinclair (1754-1835), share the idea that certain figures express the wealth, strength or power of a state.
So, were the data gathered in the seventeenth and eighteenth centuries solely an expression of the power of a state? Mainly, yes, although at the beginning of the nineteenth century one can see that the ‘rhetorical’ function was applied not only in relation to the strength of the state, but also, for example, in the already mentioned (pp. 31-32) blood-letting debate in France between doctor Broussais and his adversaries (Hacking, 2004).
However, did the practice of gathering data not cross the calculus of probabilities before the 1830s? On the face of it, it looks as if the calculus of probabilities was connected with statistical data from the moment of its emergence in the seventeenth century. Directly after the publication of Graunt’s mortality table in 1662, probabilistic thinkers all over Europe – including Christiaan and Lodewijk Huygens, Johan de Witt, Halley, Leibniz, and Jacob and Nicolaus Bernoulli – were eager to apply the young probability calculus to the data of Graunt’s tables.
Nevertheless, this ‘application’ of probability calculus during the period of ‘classical probability’ is radically different from the modern practice of statistical inference as we have known it since the beginning of the twentieth century. Although the seventeenth and eighteenth century probabilistic thinkers themselves would have described their approach to Graunt’s data as empirical, their treatment of the data seems unacceptable to a modern observer: the data were moulded and trimmed until they formed supportive evidence for a
probabilistic regularity (Daston, 1988; Gigerenzer et al., 1990). The probabilistic thinkers
were often ‘natural theologians’ (Gigerenzer et al., 1990), who believed that their calculus revealed a divine design in nature: and if the data did not fit the assumed probabilistic regularities, the data had to be adjusted until they did. A significant example of this belief in the omnipresence of the regularities of the “divine handiwork” (Gigerenzer et al., 1990, p. 13) is the influential book Die Göttliche Ordnung in den Verhältnissen des menschlichen Geschlechts, aus der Geburt, dem Tode und der Fortpflanzung desselben erwiesen (1741) by Johann Peter Süßmilch (Pearson, 1978). The general tendency was to neglect deviations in the data, and therefore the complaints made by the Dutch probabilist Nicholas Struyck in 1740 that “mortality doesn’t listen to our suppositions” and that many of the tables allegedly based
on observation were in fact “pure hypotheses” (Daston, 1988, p. 130) were quite exceptional in the eighteenth century.
Figure 16. Arbuthnot
Although it is possible to look at the ‘discovery’ of John Arbuthnot (1667-1735) that
year after year there is a slightly higher probability for male births than for female births
(communicated in 1710 to the Royal Society under the title ‘An Argument for Divine
Providence taken from the Constant Regularity of the Births of Both Sexes’) as a very
primitive form of an inferential statistical test (Gigerenzer et al., 1990; Hacking, 1976), his
interest in the ratio of male-to-female births was far from purely methodological and clearly
guided by his belief in a divine order that seemed to be expressed in the male/female ratio:
Arbuthnot noted that male births consistently exceeded female births in the ratio 18 to 17. He argued
that if this regularity were due to what he called “mere chance” – that is, assuming that the probability
of either male or female equals ½ – the probability of the observed ratio over a long period was
astronomically small. Arbuthnot concluded that this was palpable evidence of design, namely, the
divine provision for an equal number of men and women of marriageable age to ensure the propagation
of the race via monogamy. Due to the greater “wastage” of young men, who led more hazardous lives, it
was prudent to begin with a small surplus. (Daston, 1987, p. 302)
Though the approach of Arbuthnot might seem to be extraordinarily modern (Pearson, 1978)
and is likely to remind the contemporary statistician of modern hypothesis testing (H0: the
male/female ratio is due to mere chance; H1: the male/female ratio is an expression of God’s
design), it would take another two centuries before statistical testing became a clearly defined,
autonomous method:
Probabilistic tests, however, never became routine operations in any discipline until the beginning of the
twentieth century, and hence there was no sustained effort to develop and improve a methodology of
significance testing. (Gigerenzer et al., 1990, p. 79)
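To make the modern reading concrete, the following is a minimal sketch (in Python, which is of course anachronistic here) of the calculation behind Arbuthnot’s argument. The 82 consecutive male-surplus years are the figure usually cited for his London christening records (1629-1710); that number is supplied here for illustration, as it is not given above.

from fractions import Fraction

# Arbuthnot's reasoning, restated: suppose that in each of 82 successive
# years of London christening records more boys than girls were christened.
# Under H0 ('mere chance': P(male surplus in a year) = 1/2), the probability
# of a male surplus in all 82 years is (1/2)^82.
years = 82
p_under_null = Fraction(1, 2) ** years
print(f"P(male surplus in all {years} years | mere chance) = {float(p_under_null):.1e}")
# about 2.1e-25 - the 'astronomically small' probability that Arbuthnot
# took as palpable evidence of design, not as a modern p-value.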
III. The combination of probabilistic thinking and statistics into statistical inference.
Frequentist and Bayesian inference.
When did the practice of gathering aggregates of descriptive statistical data cross the path of
calculus of probability in such a way that statistical inference became possible? When was the
moment that one could fruitfully make inferences concerning some unknown parameter (e.g., the mean) of a population from a limited number of observations? When did it become possible to infer the unknown composition of an urn filled with red and blue balls from a random sample drawn from it? When did chance become so ‘domesticated’ that ineradicable variability – such as, e.g., the variable reactions of patients to a drug or of crops to a fertilizer – stopped being a barrier to making reliable inferences? When were the techniques
to test hypotheses developed?
To these questions – all variations on the same theme, viz., ‘when did statistical
inference emerge?’ – two kinds of answers are possible: (a) a simple ‘historical-anecdotic’
answer and (b) a more complex ‘historical-philosophical’ answer.
(a.) The simple historical-anecdotic answer to the question: ‘When did statistical
inference emerge?’.
The ‘simple’ historical-anecdotic answer, i.e. the answer you are most likely going to get, is
that the scientific methodology of statistical inference emerged only in the beginning of the
twentieth century. The statistical inference that emerged at the beginning of the twentieth century was frequentist statistical inference – the scientific methodology that has become the methodological monopolist in psychology and in almost every other modern science. Thus, when searching for the historical origins of the statistical methodology used in modern science, the natural manoeuvre (see e.g. the popular-scientific book by Salsburg, 2001) is
consequently to turn to the great names of the beginning of the twentieth century – Ronald
Fisher (1890-1962), Egon Pearson (1895-1980), Jerzy Neyman (1894-1981) and William
Gosset (1876-1937).
Yet, it has to be mentioned that during the ‘flower power’ sixties of the twentieth century an attempt was made to dethrone frequentist statistical inference and replace it with Bayesian inferential statistics. Bayesian inferential statistics2 was introduced by people like
Savage (1917-1971) and de Finetti (1906-1985). This Bayesian vogue left some traces in
artificial intelligence, game theory and the weighting of evidence in the courtroom, but within
2 See also above § 2.C “Making sense in the pile of words: probability, statistics and statistical inference” (p. 34 f.f.).
scientific methodology it left virtually no traces. Bayesian statistics smells too much of ‘soft subjectivity’ and ‘dubious a priori probabilities’, I guess. It does not seem likely that the Bayesian ‘heresy’ will ever dethrone frequentist statistics, although among social science methodologists and probabilist philosophers it sometimes seems ‘fashionable’ again to sympathize with Bayesian ideas, or at least with some combination of Bayesian and frequentist statistics (Romeijn, 2005, p. 12). In this sense Bayesian statistics is now booming.
So, to summarize, the historical-anecdotic way of looking at the emergence of
statistical inference as a scientific method presents the history of statistical inference as
follows: First – approximately between 1920 and 1940 – a frequentist inferential
methodology was developed, whose status as an indispensable instrument to psychology
would be established in the period 1940-1955 (Gigerenzer & Murray, 1987), and later – in the
1960s and 1970s – the Bayesian approach gained some ground, albeit hardly as a scientific
methodology. The antagonisms between the principal characters who developed frequentist
methodology in the period 1920-1940 are explained in a rather psychological manner.
After this schematic depiction it is time to add some couleur locale to it – for otherwise this account would only be historical, instead of historical-anecdotic. The relations between the principal statisticians at the dawn of the statistical inferential era – Francis Galton (1822-1911), Karl Pearson (1857-1936), Ronald Fisher (1890-1962), Egon Pearson (1895-1980), Jerzy Neyman (1894-1981), William Gosset (1876-1937), etc. – are a source of juicy anecdotes and it would be a shame not to mention them.
The early history of frequentist statistical inference is located in Britain – in particular in and around London and Cambridge – and is embedded in Darwinism and eugenics (MacKenzie, 1981): it is there that Francis Galton (1822-1911), the half-cousin of Charles Darwin (1809-1882), studied human variability and heredity and invented ‘regression’ and the use of the regression-line3; it is there that his statistical heir Karl Pearson (1857-1936) defines mathematically the statistical concept of the ‘correlation coefficient’4 in 1896 (Porter, 1986), becomes the first holder of the ‘Galton Professorship of Eugenics’ at University College London, leads the biometric laboratory and the eugenics laboratory that had been established according to Galton’s will after his death in 1911 (Porter, 1986), and writes a four-volume biography about the life and labours of his teacher (Pearson, 1914, 1924, 1930).
3 See also §3.D “Statistical methodology as a re-enactment of evolution: tamed variation and accelerated selection” (p. 91 f.f.).
4 See also §3.D “Statistical methodology as a re-enactment of evolution: tamed variation and accelerated selection” (p. 97 f.f.).
Figure 17. Karl Pearson (left) and Galton (right)
Galton and Karl Pearson are thus the founding fathers of the basic language of statistics and they laid the foundations for the institutionalization of biometrics and statistics (e.g. Desrosières, 1993); it is on those foundations that the generation after Karl Pearson would produce, in the first decades of the twentieth century, a new statistical inferential methodology that would lead in the 1940s to the massive adoption of these methods in psychology, i.e., the so-called ‘statistical inferential revolution’ (see e.g. Gigerenzer & Murray, 1987; Gigerenzer et al., 1990).
Between Karl Pearson and the founding fathers of frequentist statistical inference lies
a deep generation gap. Had one walked into Galton’s biometrical laboratory in London at the turn of the century, one would have seen legions of young women, so-called ‘calculators’, busy with the laborious arithmetical operations that Karl Pearson had ordered them to perform so that the distribution parameters (e.g., the mean, the standard deviation and symmetry) could be extracted from enormous amounts of biological data (Salsburg, 2001).
Everything that could be measured was measured, e.g., lengths of human arms and legs, beak
lengths of exotic tropical birds, leaves of mulberry trees and the cranial capacity of human
skulls from ancient cemeteries (Salsburg, 2001).
The inductive or inferential problem – ‘Why would one assume that one can draw conclusions about a whole population from a limited number of observations?’ – was not very salient yet in this approach, because the collectors of data tried to gather as much data as they could, and the observations were not so ‘limited’ at all: “as copious observational and experimental data as possible” (Pearson, Weldon, & Davenport, 1901, p. 5). Another factor that made the problem of induction less pressing was the fact that there was
no direct ‘practical’ purpose for the calculated distributions besides the rather abstract hope
that one would be able to signal changes in the distributional parameters (e.g., the mean size
of an elephant) that could indicate evolutionary shifts which one could not see with the naked
eye:
The primary object of Biometry is to afford material that shall be exact enough for the discovery of
incipient changes in evolution which are too small to be otherwise apparent. The distribution of any
given attribute, within any given species, at any given time, has to be determined, together with its
relations to external influences. (Galton, 1901, p. 9)
The concerns of the next generation – the founding fathers of frequentist statistical inference – were much more concrete, involving inferences from small samples, experimental designs, randomization, significance and hypothesis testing. The statistical inferences had to provide answers to practical questions like ‘Is the concentration of yeast cells in a sample of Guinness beer representative of the amount of yeast cells in the whole jar?’ (W.S. Gosset) and ‘Are the variations in crop yield at the Rothamsted Agricultural Experimental Station the result of the use of a certain fertilizer or of other uncontrollable factors?’ (R.A. Fisher).
Figure 18. Example of Fisher's experimental design: 5 x 5 Latin square of different trees laid out at Beddgelert Forest in 1929. (Fisher Box, 1978)
Gosset invented the t-statistic, which made it possible to draw inferences from small samples. Fisher integrated the use of significance testing with experimental design (Gigerenzer et al., 1990, p. 79): his book Statistical Methods for Research Workers, published in 1925 (Fisher, 1970), and his book The Design of Experiments, published in 1935 (Fisher, 1951), are landmarks in the history of statistics (Kendall, 1963). You just have to look in a random
scientific journal to see the effect of Fisher’s ideas: you will see that words concerning the design of experiments like ‘randomization’, ‘blocking’ and ‘replication’ abound (Gigerenzer et al., 1990, p. 75), and the same holds for the significance levels “p < 0.05” and “p < 0.01”.
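As a minimal sketch of what Gosset’s t-statistic makes possible, the following Python fragment runs a one-sample t-test on a hypothetical handful of yeast-cell counts; the numbers and the target value are invented for illustration.

import numpy as np
from scipy import stats

# Hypothetical small sample of yeast-cell concentrations (millions of
# cells per ml) from one batch of brew; the brewery's target is 10.0.
sample = np.array([9.2, 10.1, 9.6, 8.8, 9.4, 9.9])

# Gosset's one-sample t-test: can such a tiny n support an inference
# about the whole batch? H0: the batch mean equals the target value.
t_stat, p_value = stats.ttest_1samp(sample, popmean=10.0)
print(f"t = {t_stat:.2f}, p = {p_value:.3f} (n = {len(sample)})")

It is precisely this kind of inference from an n of six – unthinkable in the Pearsonian regime of “as copious data as possible” – that the t-distribution legitimated.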
For a proper understanding of what happened during the statistical inference revolution it is necessary to enlarge a bit upon ‘significance testing’, because not every reader will be familiar with the concept. In significance testing, data are compared with the distribution one would expect if the null hypothesis – i.e., the hypothesis that there is no effect besides the variation one would expect from ‘mere chance’ – were true. “The smaller the level of significance, the more discordant the data are with the null hypothesis” (Gigerenzer et al., 1990, p. 78). So what does it mean when a researcher states that he rejected his null hypothesis (e.g., treatment X has no effect on disease A) because he obtained “significant results (p=0.01)”? Only this:
The probability of the data, according to the null hypothesis, is 0.01. […] Either the null hypothesis is
true, in which case something unusual happened by chance (probability 1%), or the null hypothesis is
false. (Hacking, 2001, p. 215)
A world of difference lies between, on the one hand, Fisher’s inferential techniques
and those of the other statisticians of his generation – although of course indebted to their
statistical predecessors – and, on the other hand, the biometrical statistics of Galton and Karl
Pearson. The differences in approach led to frictions between the old and the new generation
of statisticians.
Small sample statistics were effectively invented by a professional employee of the Guinness brewery, W.S. Gosset, for whom the repetition of trials hundreds of times would have been far more trouble than it was worth. Gosset spent the year 1906-1907 at University College with Pearson, but with the practical needs of the brewery always in mind. […] He [Pearson] remarked in a letter of 1912 to Gosset that only “naughty brewers” used [an n] so small in their work. (Porter, 1986, p. 317)
However, the misunderstandings between Karl Pearson and the younger generation of statisticians sink into nothingness in comparison with the frictions within this new generation itself. It was torn into two camps: Fisher on one side and Egon Pearson (Karl Pearson’s son) and his friend Jerzy Neyman on the other – both camps claiming that W.S. Gosset was on their side (Gigerenzer et al., 1990, p. 105). The feud raged from the 1930s until Fisher’s death in 1962 (Kendall, 1963).
Figure 19. W.S. Gosset
It has been suggested that the antagonisms started in the second decade of the twentieth century, when Karl Pearson declined to publish an article by the then still young and unknown Fisher in his pre-eminent statistical journal Biometrika. Fisher, endowed with great mathematical ability, had solved in this article a problem – concerning the statistical distribution of Galton’s correlation coefficient – which Karl Pearson had tried to solve for some time without success.
Figure 20. R.A. Fisher
Pearson rejected Fisher’s article because he had difficulty understanding the complex mathematics Fisher had used – and so had Gosset, whom Pearson asked for advice in this matter – and possibly also because he was offended by the fact that Fisher had pointed out errors in his work (Gigerenzer et al., 1990; Salsburg, 2001). Fisher, who has sometimes been described as ‘cantankerous’ (Dawkins, 2000), never forgave Karl Pearson, and even “when Karl had been dead for twenty years, Fisher wrote as if wounds were still smarting” (cf. Kendall, 1963, p. 3). In 1956 Fisher writes about Karl Pearson:
…the terrible weakness of his mathematical and scientific work flowed from his incapacity in self-criticism, and unwillingness to admit the possibility that he had anything to learn from others, […]. (Gigerenzer et al., 1990, p. 98)
Although Karl Pearson’s son Egon Pearson, “like his father a distinguished historian of statistics as well as an eminent statistician” (Porter, 1986, p. 305), strongly disagreed with his father on some of the most fundamental statistical issues and was impressed by Fisher’s new ideas, Fisher’s opinion of Karl Pearson extended to his son as well. The scientific debates between Fisher and Egon Pearson were always characterized by “a bitter personal tone” (Gigerenzer et al., 1990, p. 98).
Figure 21. E. Pearson
Figure 22. J. Neyman
Egon Pearson would later write to Jerzy Neyman about his feelings towards R.A. Fisher and Fisher’s attacks on his father:
I was torn apart by conflicting emotions: (a) finding it difficult to understand R.A.F., (b) hating him for
his attacks on my paternal ‘god’, (c) realising that in some things at least he was right. (Yates, 1984, p.
116)
However, notwithstanding the collisions between Fisher’s statistical ideas and those of
Pearson and Neyman, they all promulgated frequentist statistical inference: after all, it is only
in the 1960s and 1970s that Savage and de Finetti started opposing the frequentist approach
with Bayesian statistical inference.
Although the historical-anecdotic answer is not incorrect, it does not clarify the substance of the conflict between Fisher and Pearson/Neyman. Moreover, it obscures the fact that even in the first decades of the twentieth century – the heyday of the frequentist interpretation, when it was seen as a mortal insult to accuse someone of ‘Bayesianism’ or ‘subjectivism’ – it still
seems to be the subjective interpretation of probability that is pulling the strings of the fierce frequentist statistical debates from behind the scenes; the subjective interpretation is like the imaginary blood that Lady Macbeth tries in vain to wash off her hands – even when it is not there, it guides the direction of the proceedings.
(b.) The more complex ‘historical-philosophical’ answer to the question: ‘When did
statistical inference emerge?’.
Neyman and Egon Pearson – and later several writers would support them therein (e.g.
Hogben, 1957) – accused Fisher of a “quasi-Bayesian view” (Gigerenzer et al., 1990, p. 103).
What brought them to make this accusation? Fisher would never have described himself as a Bayesian, and there are also good scientific reasons to draw a clear distinction between Fisher’s thought and Bayesianism (e.g. Barnard, 1987). Fisher actually said quite harsh words about Bayesianism and its presumed intellectual father, Reverend Thomas Bayes (1702(?)-1761).
Fisher once congratulated the Reverend Thomas Bayes for his insight to withhold his treatise from
publication (it was published posthumously in 1763). (Gigerenzer et al., 2004, p. 405; see also
Hacking, 1976, p. 201)
As I explained earlier5, Bayesian statistical inference relies on a subjective interpretation of probability, which interprets probability as a degree of belief that we must reasonably attach to a hypothesis and that can be revised in the light of new data: e.g. the ‘Bayesian probability’ or ‘degree of belief’ that the sun will rise tomorrow becomes higher with every morning we see the sun rise. Fisher, Neyman and E.S. Pearson all agreed that the Bayesian position was untenable: for it juggles with ‘probabilities’ that have to be attached to ‘beliefs’, which tend to get lost in subjective arbitrariness, and it always gets entangled in the induction problem, which undermines the rational status of Bayesian inference.
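The sunrise example can be made concrete with Laplace’s rule of succession – one classical Bayesian formalization, chosen here for illustration: starting from a uniform prior on the unknown probability of a sunrise, the degree of belief after n observed sunrises is (n + 1)/(n + 2). A minimal sketch in Python:

# Laplace's rule of succession: with a uniform prior on the unknown
# probability that the sun rises, the posterior degree of belief that it
# will rise tomorrow, after n observed sunrises, is (n + 1) / (n + 2).
def degree_of_belief(n_sunrises: int) -> float:
    return (n_sunrises + 1) / (n_sunrises + 2)

for n in (1, 10, 1000, 365 * 80):  # up to roughly a lifetime of mornings
    print(f"after {n:5d} sunrises: P(sunrise tomorrow) = {degree_of_belief(n):.6f}")
# The 'degree of belief' creeps towards 1 with every observed sunrise -
# exactly the ever-rising Bayesian probability described above.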
Yet, Fisher could not get rid of the idea that probability in some way or another had to
be related to the degree of belief one has. Thus he coined the notion ‘fiducial probability’.
There has always been a great deal of confusion about the meaning of this notion. Fisher
claimed that he was an absolute frequentist and that the notion ‘fiducial probability’ had
nothing to do with subjective probability. As Gigerenzer puts it: “Fisher wanted to both reject
the Bayesian cake and eat it, too” (Gigerenzer, 2000, p. 271).
5 See above, in § 2.C “Making sense in the pile of words: probability, statistics and statistical inference” (p. 34 f.f.).
Figure 23. Reverend Thomas Bayes
When the probabilist Savage – who adhered to a subjective interpretation of
probability – once asked Fisher to tell him the exact meaning of ‘fiducial probability’ Fisher
gave him a very candid answer:
I don’t understand yet what fiducial probability does. We shall have to live with it a long time before we
know what it’s doing for us. (quoted by Gigerenzer & Murray, 1987, p. 8)
This is an inner tension that runs through a large part of Fisher’s work: on the one
hand he completely rejects Bayesian probabilities – “as befits a good frequentist” (Gigerenzer
& Murray, 1987, p. 10) – but on the other hand he believes that inductive inference is possible
and that the acceptance or rejection of a null hypothesis does affect the degree of belief one
has to attach to a hypothesis (Gigerenzer & Murray, 1987).
So, viewed from the perspective of Neyman and E.S. Pearson, who were radical frequentists, it is quite comprehensible that they accused Fisher of Bayesianism.
The space that Fisher seemingly left for ‘subjectivity’ in rather opaque notions such as
‘fiducial probability’ and ‘likelihood’ (Hacking, 2001, p. 245; Kendall, 1963) and the fact that
Fisher did not exclude the possibility of inductive thinking was absolutely indigestible to
Neyman and E.S. Pearson.
Neyman said that there is no such thing as inductive inference; we only engage in inductive behavior (Hacking, 2001, pp. 242-243; Hubbard, 2004). Neyman and E.S. Pearson, pragmatic and anti-metaphysical, felt that Fisher’s ideas on ‘fiducial inference’ were metaphysical balderdash. According to Neyman, Hume was right when he concluded that he did not have any reasons for believing any one conclusion (Hacking, 2001, p. 262): “To act as if a hypothesis were true does not mean to know or even to believe” (Gigerenzer & Murray, 1987, p. 14). Neyman and E.S. Pearson argued that the only reason for using statistical
inferential methods is purely pragmatic, namely that these statistical methods should provide a “rule of behavior” such that “in the long run of experience, we shall not be too often wrong” (Neyman & Pearson, 1933, p. 291).
So, if we say: “The hypothesis H0 (for instance: ‘treatment X has no effect’) is rejected at the one percent level (p < 0.01)”, this means that the data were very improbable given the hypothesis H0: the probability of getting these data if hypothesis H0 is actually true is no more than one percent. So, it could be that hypothesis H0 is true and that we have falsely rejected it. Yet, if hypothesis H0 were actually true and we repeated our experiment many times, such results should occur no more than one percent of the time. The rejection of the hypothesis H0 at the one percent level only expresses the frequency of occurrence of certain data – given a certain hypothesis – in the long run: it does not imply that it has become more reasonable to believe that hypothesis H0 is false!
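This long-run reading can be simulated directly. The following sketch (with an invented set-up: normally distributed data, a true H0 and a one-sample t-test) repeats an experiment many times and counts how often H0 is falsely rejected at the one percent level:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1933)  # seeded for reproducibility

# Simulate the Neyman-Pearson 'long run': H0 is true (the treatment has
# no effect, the mean difference is 0), and the experiment is repeated
# many times, H0 being rejected whenever p < 0.01.
n_experiments, n_subjects, rejections = 20_000, 30, 0
for _ in range(n_experiments):
    data = rng.normal(loc=0.0, scale=1.0, size=n_subjects)  # H0 is true
    _, p = stats.ttest_1samp(data, popmean=0.0)
    rejections += p < 0.01

print(f"false rejection rate: {rejections / n_experiments:.4f}")
# close to 0.01: in the long run a true H0 is wrongly rejected about one
# percent of the time - and nothing is said about the probability that
# H0 itself is true or false.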
The collision between Fisher’s views and those of Neyman and E.S. Pearson shows how difficult it is to formulate a purely frequentist inferential theory. It turns out to be very hard to think of inference – ‘the drawing of conclusions about an unknown population, based on a limited number of observations of which one does have knowledge’ – as a procedure that has nothing to do with one’s knowledge, beliefs and representations, but that is solely a form of decision-making behavior that turns out to be right most of the time.
Neyman and Pearson believed that they made Fisher’s theory of ‘significance testing’ “more complete and consistent” (Gigerenzer et al., 1990, p. 102) by introducing, next to the null hypothesis, an alternative hypothesis. Neyman and Pearson named their theory ‘hypothesis testing’. Because they did not believe in the existence of inductive inference resulting in ‘mere beliefs’ or ‘ideas’, but solely in ‘inductive behavior’, every statistical test had to entail a ‘decision’ or a ‘choice’ between hypotheses (see e.g. Hubbard, 2004). An example may clarify this position.
Assume that a medical researcher has done an experiment to test whether a certain drug has any positive effects. The subjects that took the drug in this experiment varied in their reactions to it. Fisher would probably just look at how well the data fit statistically with the null hypothesis (H0: the drug has no effect; the variability is not greater than one would expect to see due to mere chance) and discuss with some colleagues how to interpret the degree of fit.
However, Neyman and Pearson would object that the researcher is not only confronted
with the question how well the results fit with the distribution of the null hypothesis, but that
he has to make an ‘economic’ decision between two hypotheses (H0, but also HA: the drug has
positive effects) because he has to decide whether to use the drug or not.
Therefore the researcher has to know what the power of his test is, i.e., the probability that, when the alternative hypothesis is true (the drug has positive effects), the data analysis will indeed lead to a significant effect and the rejection of the null hypothesis. The ‘power’ of a test is embedded in the “cost-benefit calculations” (Gigerenzer et al., 2004, p. 399) the researcher has to make, i.e., he has to balance, for instance, the relative severity of the error of giving patients a drug that does not work against the error of clinging to too strict a criterion and wrongly withholding a drug that actually does work.
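A sketch of such a power calculation, with all numbers hypothetical; a one-sided z-test with known standard deviation is assumed for simplicity:

import numpy as np
from scipy import stats

# Hypothetical Neyman-Pearson set-up for the drug trial: a one-sided
# z-test on the mean improvement of n patients, with known sd = 1.0.
n, alpha = 25, 0.05
effect = 0.5              # assumed true mean improvement under HA
se = 1.0 / np.sqrt(n)     # standard error of the sample mean

# Reject H0 ('no effect') whenever the sample mean exceeds this criterion:
criterion = stats.norm.ppf(1 - alpha, loc=0.0, scale=se)

# Power: the probability of exceeding the criterion when HA is true.
power = 1 - stats.norm.cdf(criterion, loc=effect, scale=se)
print(f"criterion = {criterion:.3f}, power = {power:.3f}")  # roughly 0.80 here
# Raising n, relaxing alpha or assuming a larger effect all raise the
# power: the 'economic' balancing of the two kinds of error.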
Figure 24. The null hypothesis and the alternative hypothesis
Fisher thought that the statistical approach of Neyman and Pearson was a mechanical,
thoughtless reduction of statistical inference (Gigerenzer, 2000, p. 277; Gigerenzer et al.,
1990, p. 78):
“We have the duty of formulating, of summarizing, and of communicating our conclusions, in
intelligible form, in recognition of the right of other free minds to utilize them in making their own
decisions.” (Fisher, 1955, p. 77)
Fisher seized the fact that Jerzy Neyman was of Polish origin to present his debate with him
as a confrontation between the free western world and the technocratic, dictatorial world
behind the iron curtain:
I shall hope to bring out some of the logical differences more distinctly, but there is also, I fancy, in the
background an ideological difference. Russians are made familiar with the ideal that research in pure
science can and should be geared to technological performance, in the comprehensive organized effort
of a five-year plan for the nation. […] In the US also the great importance of organized technology has I
think made it easy to confuse the process appropriate for drawing correct conclusions, with those aimed
rather at, let us say, speeding production, or saving money. There is therefore something to be gained by
at least being able to think of our scientific problems in a language distinct from that of technological
efficiency. (Fisher, 1955, p. 70)
Fisher makes very clear what is at stake here: the ‘free mind’ and ‘a language distinct from that of technological efficiency’. What makes Fisher – notwithstanding his frequentist approach – stick so stubbornly to the last shreds of ‘subjectivity’ or ‘Bayesianism’ with his ‘fiducial inference’? Why was it that “Fisher wanted to both reject the Bayesian cake and eat it, too” (Gigerenzer, 2000, p. 271)? Is it just that he is “determined by his emotional make-up, not by reason or mathematics” (Kendall, 1963)? Gigerenzer’s explanation of this matter has the same ‘psychological’ orientation as Kendall’s explanation. Gigerenzer (2000)
clarifies the relation between the Neyman-Pearson theory, the Fisherian theory and the
Bayesian theory by using a Freudian analogy: the Neyman-Pearson theory functions as the
Superego (it “forbids epistemic statements about particular outcomes or intervals”); the
Fisherian theory functions as the Ego (it “makes abundant epistemic statements about
particular results”, but “is left with feelings of guilt and shame for having violated the rules”)
and the Bayesian theory functions as the Id (it makes “statements about probabilities of
hypotheses”, although it is censored “by both the frequentist Superego and the pragmatic
Ego”) (Gigerenzer, 2000, p. 280). But is such a ‘psychological’ explanation sufficient? Or is there nonetheless a philosophical or logical necessity for the ‘fiducial argument’, as Hacking (1976) claims in his endeavour to remould Fisher’s ideas into a “Fisher-Hacking theory of fiducial inference” (Bartlett, 1966, p. 632)?
[2.F] Recapitulation & outlook on the next chapter
Recapitulation of chapter 2:
A historical, philosophical and terminological overview of the words ‘probability’ and ‘statistics’ has been presented. From its beginnings in the second half of the seventeenth century until the 1840s, probability is called classical probability. After the 1840s modern probability emerges. In retrospect classical probability appears to be an ambiguous amalgam of subjective and objective probability. Since Kolmogorov formulated his axiomatic system in the beginning of the twentieth century, there has been great consensus about the probabilistic calculus; however, on a theoretical level it is unclear whether probability has to be interpreted as a subjective or an objective phenomenon. This debate also had its repercussions on inferential statistical methodology, which is based on probabilistic assumptions. Since the beginnings of the statistical inferential revolution in the first decades of the twentieth century, scientific statistical methodology has been almost completely based on the ‘objective’, i.e. ‘frequentist’, interpretation of probability. Although in the 1960s and 1970s Bayesian inference (whose underlying probabilistic assumptions are given a subjective interpretation) gained some ground – for instance in Artificial Intelligence research and in the courtroom in the evaluation of the weight of evidence – it remained of marginal significance to scientific methodology. Nevertheless, even within frequentist inference it appears to be very hard to get rid completely of the subjective (or: ‘Bayesian’) interpretation, and shreds of subjectivity abound.
Outlook on chapter 3:
Whereas chapter 2 presented the contrast between frequentist (objective) and Bayesian (subjective) statistics from a statistical and a historical, or even anecdotic, point of view, and the ‘contamination’ of frequentist statistics by subjective semantics as a psychological curiosity, chapter 3 will:
(a) show how the frequentist and Bayesian positions form a philosophically incommensurable and hard opposition by linking Bayesian statistics to ‘classical epistemology’ and frequentist statistics to ‘evolutionary epistemology’,
(b) show how the contrast between Bayesianism and frequentism also underlies a contrast between, respectively, the cognitive theories and the statistical methodology in psychology.
I will now concretise this a little bit. From the perspective of evolutionary epistemology – which was formulated in the first place by the philosopher Karl Popper – rationality has no need for a ‘knowing subject’, ‘beliefs’ or ‘representations’. Popper fought his whole life against ‘subjectivity’ and tried to introduce ‘non-subjective’ words such as ‘corroboration’ and ‘falsification’ to describe the growth of knowledge. In this evolutionary approach statistical methodology has to be understood as a re-enactment of the evolutionary process of natural selection amongst psychological theories and ideas – not much different from the evolution of plants and animals – whereby both the researchers and the researched subjects are just ‘cogs’ in the wheel of science. The insupportable ‘bleakness’ of this idea may explain why the frequentist ‘meaning’ of statistics is often so poorly grasped by its users: both the semantics of cognitive psychological theories and that of frequentist statistical inferential methodology are still drenched in ‘subjectivity’. However, whereas in a lot of contemporary cognitive theories Bayesianism is the ‘official’ standard against which rationality is measured, in statistical methodology the use of subjective semantics is ‘just a slip of the tongue’ which keeps creeping into frequentist semantics.
CHAPTER 3: PSYCHOLOGY AND STATISTICS: WITH OR WITHOUT A
KNOWING SUBJECT?
The development of statistical inferential thought in the beginning of the twentieth century concerned frequentist (i.e. ‘objective’) statistical methodology. Bayesian (i.e. ‘subjective’) statistical inference is mostly not even mentioned in the methodological textbooks in psychology.
I. The lure of subjective semantics: outlook on sections (A), (E) and (F) of this chapter.
So, Bayesian statistics is hardly ever mentioned in the methodological textbooks in psychology. However, as became apparent from the description of Fisher’s ideas in the previous chapter, and as is shown for instance by the research of Gigerenzer (2004) that will be described in this chapter, the (incorrect) application of ‘subjective’ semantics to frequentist statistical inference appears to be widespread and ineradicable. The lure of ‘subjective’ semantics in statistical inference will be the subject of section (E) of this chapter.
The lure of ‘subjective’ semantics is also present in psychological theories of mind. The normative idea that human cognition should obey the rules of Bayesian inference – i.e., that the nonconformity of human cognition to the rules of Bayesian inference is a sign of its limitations and weaknesses – is a theory that has gained a lot of support in cognitive psychology since the 1970s. Kahneman and Tversky – the former of whom won the Nobel Prize in Economics in 2002 – have promulgated this normative idea that human cognition is fundamentally deficient: in order to be rational it should have been a flawless Bayesian ‘intuitive statistician’ – but it is not (Gigerenzer, 2000; Gigerenzer & Murray, 1987; Gigerenzer et al., 1990).
Following the research of Kahneman and Tversky – wherein they argued that “in his
evaluation of evidence, man is apparently not a conservative Bayesian: he is not a Bayesian at
all” (Kahneman & Tversky, 1972, p. 450) – cognitive illusions, heuristics and biases have
become “the fodder for classroom demonstrations and textbooks” (Gigerenzer, 2000, p. 237).
The success of this “heuristics-and-biases” movement can be partly explained by the fact that it is in a way great fun to gloat over how dumb an intuitive statistician human cognition is (Gigerenzer, 2000, p. 237).
However, underlying the “heuristics-and-biases” program (whose origin can be found in: Kahneman, Slovic, & Tversky, 1982) is nonetheless the idea of a subjective mind that makes inferences according to the rules of probability – sometimes successfully, sometimes
not. The idea that human rationality should be measured by the standards of Bayesian ‘intuitive statistics’ fits into a general tendency in cognitive psychology to formulate theories of human cognition in very ‘subjective’ semantics, encompassing words such as ‘beliefs’ and ‘representations’. This tendency is enhanced by the popularity of neural network modelling in psychology (Wheeler, 2005), which is closely related to artificial intelligence, where Bayesian inference has found many applications. The lure of ‘subjective’ semantics in cognitive psychology will be the subject of section (F) of this chapter.
II. Frequentist statistics as a ‘re-enactment’ of evolution that has no place for ‘subjectivism’: outlook on sections (B), (C) and (D) of this chapter.
However, before expanding on how statistical methodology and cognitive psychology are ensnared in ‘subjective’ semantics, I will answer in sections (B), (C) and (D) of this chapter the question of what is ‘wrong’ with ‘subjective’ semantics. Is statistics not an expression of how probable our belief in a hypothesis is? No, certainly not! And is psychology not a science about our representations and beliefs? No, certainly not! To substantiate this I will turn to the philosophy of Karl Popper (1902-1994), who has convincingly shown the shortcomings of a subjective approach to probability. He approaches knowledge from an evolutionary perspective, which excludes Bayesian statistics. In the same vein, I will show how the use of ‘subjective’ semantics does not correspond to the practice of psychological research.
III. Overview of the sections in this chapter
So, I summarize the subjects that will be treated in the six sections of this chapter. The
sections marked with an asterisk (*) deal with the allurement of subjective semantics. The
sections without asterisk deal with frequentist statistical inference as an evolutionary process.
[A] * Probability entangled in subjective semantics and the problem of induction.
[B] Probability freed from subjective beliefs: rationality without a knowing subject.
[C] Statistics à la Popper: natural selection of a falsifying rule for statistical hypotheses.
[D] The statistical research methodology as a re-enactment of evolution: stability in the long run.
[E] * The lure of ‘subjective’ semantics in statistical inference.
[F] * The lure of ‘subjective’ semantics in cognitive psychology.
[3.A] Probability entangled in subjective semantics and the problem of
induction.
In retrospect it appears that for the first two centuries after its emergence probability was ‘Janus-faced’: from a modern point of view it was a curious amalgam of ‘subjective’ (i.e., ‘epistemological’) and ‘objective’ (i.e., ‘frequentist’) probability. Classical probabilists thought that probability was a phenomenon related not only to frequencies in reality, but also to human beliefs. Moreover, the classical probabilist believed that there was an exact correspondence between the tendency of a relative frequency to stabilize in the long run (the longer the series of throws with a die, the closer the relative frequency with which a certain number turns face up will approach the true probability), and the increasing probability that can be attached to our beliefs: i.e., as the number of throws increases, we become more ‘experienced’ and subsequently our beliefs will be less and less subject to uncertainty. However, as Hume pointed out already in the eighteenth century6, there is no rational ground why we should assume that an increasing amount of experience would justify an increase in the probability which we may attach to our beliefs. This is the so-called ‘problem of induction’: there is no rational ground for assuming that the future will conform to the past. The fact that the sun has risen every day for millions of years does not entail that it is rational to think that the hypothesis that the sun will rise tomorrow has a very high probability: we have just grown accustomed to the fact that the sun rises every day. There are probably not many people who worry every evening about the possibility that the sun will not rise the following morning: however, this widespread unconcern is based solely on habit and has no rational ground whatsoever.
In the 1840s classical probability came to an end and probabilists began to distinguish between subjective and objective probability. Objective, i.e., ‘frequentist’, probability is in principle unrelated to our beliefs and knowledge and therefore should not be affected by the problem of induction: the tendency of the relative frequency to stabilize in the long run is independent of observation. However, our language is so imbued with ‘subjectivity’ (“I observe how the relative frequency stabilizes in the long run and this brings me to engage in inductive reasoning and make statements about the probability of future throws”) that it took quite some time before a theory of frequentist probability could be formulated that was not subject to the
6 See above, in § 2.C “Making sense in the pile of words: probability, statistics and statistical inference” (p. 31-32).
problem of induction. After all, we saw in the previous chapter how even radical frequentists such as Jerzy Neyman and Egon Pearson spoke of ‘inductive behavior’: apparently even they could not get rid of the word ‘induction’.
In “1927 or thereabouts” (Popper, 1979, p. 1) the philosopher Karl Popper (1902-1994) found a way to formulate objective probability without getting trapped in ‘subjective’ semantics and the induction problem that is entailed by this semantics:
Of course, I may be mistaken; but I think that I have solved a major philosophical problem: the problem
of induction. (Popper, 1979, p. 1)
Figure 25. Karl Popper
The ‘de-subjectivization’ of probability by Popper was made in two steps:
(i) showing why the subjective interpretation of probability and the induction problem
are irrelevancies, following from a mistaken epistemology.
(ii) formulating an objective interpretation of probability, uncontaminated by
subjectivist semantics.
These steps will be the subject of the next section.
[3.B] Probability freed from subjective beliefs: rationality without a knowing
subject.
This section shows why the subjective interpretation of probability and the induction problem
are irrelevancies following from a mistaken epistemology, viz., from pre-Darwinian classical
epistemology. The first part of this section will be devoted to contrasting classical
epistemology with the ‘Darwinian’ or ‘evolutionary’ theory of knowledge. The second part of
this section will show how there is no place for subjective probability in an evolutionary
approach. The third part formulates Popper’s objective propensity interpretation of probability.
I. The process of tentative solutions (conjectures) and falsifications (error elimination).
Darwinism changed the way we view the world. Probably the most fundamental change was
that Darwinism made the idea possible that there could be rationality without previous design,
viz., that for instance such a complex organ as the human eye could emerge on the base of
mere variation and selection. Adaptations can arise by natural selection, without need of
intelligence: all the ‘designs’ in the biosphere emerged from ‘mindless’ algorithmic process
(Dennett, 1996).
It was Darwinism that made it possible for Popper to escape from ‘subjective’
semantics: Popper solved the problem of induction because he rejected classical epistemology
and adopted a ‘biological’, ‘Darwinist’ or ‘evolutionary’ outlook on knowledge instead. This
‘evolutionary approach’ (Popper, 1979) – Popper considered the expression ‘evolutionary
epistemology’ too pretentious and preferred to speak of an ‘evolutionary theory of knowledge’
(Popper, 1990) – allowed him to formulate an objective interpretation of probability that has
no need for the problematic subjective concepts from classical epistemology, such as ‘beliefs’
or ‘representations’.
In ordinary language knowledge is most of the time tied to a knower: we are used to utterances such as “I know”, “I believe” or “I am thinking”. Classical epistemology, i.e., practically every epistemology before Popper, assumed that there is no other knowledge than this subjective knowledge – knowledge tied to a knower. In classical epistemology knowledge was consequently seen as “a certain kind of belief – justifiable belief, such as belief based on perception” (Popper, 1979, p. 122).
Popper argued that this thought – ‘all knowledge is subjective knowledge’ – is a
fallacy. The growth of scientific knowledge is a growth of objective knowledge – knowledge
without a knowing subject. Objective knowledge ‘grows’ or ‘evolves’ in a way that is very
similar to “biological growth; that is, the evolution of plants and animals” (Popper, 1979, p.
112). In the same way as the evolution of plants and animals is a rational process that takes
place without some underlying design, knowledge also evolves in a rational way independent
of particular ‘knowing subjects’: therefore one can call it objective. The growth of knowledge
is not a result of an increase in ‘subjective’ knowledge located in the mind of some knower,
but is instead an ‘objective’ process of error elimination, viz., the elimination of unsuccessful
ideas, resulting from the exposure of knowledge to the surrounding world.
An example may elucidate this process of error elimination. There have been times, for instance, in which the idea that the world is a flat floating disk was a very successful replicator – Dawkins (1989) would call such a cultural replicator a ‘meme’. Of course it originally must have been uttered by somebody – however, immediately after its ‘conception’ it began to lead its own life as a replicator: it nestled itself in the human mind and was transmitted from generation to generation. Yet, as the idea of a flat earth became more and more exposed to evidence contradicting it, it was eventually conquered by the idea that the surface of the earth is spherical. The belief in a flat earth was eliminated in a Darwinist struggle for life between competing theories.
So, according to Popper the growth of knowledge does not follow from inferences or
inductions, but solely from error elimination. This is quite counter-intuitive – growth of
knowledge is in fact reduction of errors: we just err less and less. However, the theory that the
surface of the earth is spherical is – like all our theories – still a conjecture, i.e., a tentative
theory. We can never be sure that this theory is true. The only ‘progress’ in our knowledge is
due to error elimination: viz., that we effectively eliminated the theory that the earth is flat.
Popper argues consequently that the problem of induction now becomes irrelevant,
because it wrongly assumes our knowledge grows due to inductive reasoning, whereas it only
‘grows’ due to error elimination. The evolution of knowledge is described by the following
simple schema: “P1 → TT → EE → P2”, that is, “problem P1 → tentative theoretical solution TT → evaluative error elimination EE → new problem P2” (Popper, 1979, p. 119 f.f. and p. 144).
The evolution of objective knowledge does in principle not differ from the evolution
of other “non-living structures which animals produce, such as spiders’ webs, or nests built by
wasps or ants, the burrows of badgers, dams constructed by beavers, or paths made by
animals in forests” (Popper, 1979, p. 112).
Figure 26. Karl Popper lecturing. On the blackboard behind him he has written: P1 → TT → EE → P2
For instance, beavers – Popper used these animals very frequently as an example (cf. Popper, 1990, p. 50-51) – are in principle not too particular in their choice of the material that they use to build their dams. Nevertheless, their choice of material is subject to feedback from the environment. If dams built of tin-cans do not succeed in slowing the river water down, this apparently unsuccessful solution will be eliminated through environmental negative feedback: either the beavers will adjust their behaviour and their attempts to build dams out of tin-cans will die out, or – if some erring beavers persist in their preference for tin-cans – they will be ‘eliminated’ themselves through natural selection and the dams built of tin-cans will perish accordingly. However, every choice of material for building a dam will always remain a tentative solution to the problem of how to slow down the water in the river.
Figure 27. A beaver. This animal is the mascot of the London School of Economics, where Popper was a
Professor for 23 years. The dam-repairing beaver was one of Popper’s favourite examples of how his ‘critical
approach’ was also present in the animal kingdom (e.g. Popper, 1990, p. 50-51).
The past experience of a beaver that twigs, gnawed branches and rocks form solid building
blocks for a dam, gives him no guarantee whatsoever that his future dams of twigs, gnawed
branches and rocks will be successful too.
So, according to Popper human knowledge is not fundamentally different from other “objective products of life, such as spiders’ webs, birds’ nests, or beaver dams”, for they are all “products that can be repaired or improved” (Popper, 1990, p. 50). Moreover, both beaver dams and human objective knowledge are always conjectures, i.e. tentative solutions to a problem, that can be ‘eliminated’ in the struggle for life. Both beaver-dams and scientific knowledge evolve according to the ‘mindless’ algorithmic process of P1 → TT → EE → P2, etc. Scientists let “their false theories die in their stead” (Popper, 1979, p. 122), whereas
etc. Scientists let “their false theories die in their stead” (Popper, 1979, p. 122), whereas
beavers – at least those who have the capacity to learn from their faults – let their ‘false’
beaver-dams die in their stead. Thus, the scientist and the beaver are united by the fact that
their conjectural solutions can die in their stead – in this sense they are both “Popperian
creatures” (Dennett, 1996, p. 375 f.f.).
Yet, there are of course differences between beavers and scientists. For instance, if a beaver meets another beaver in the woods by chance, they will not be able to have a little chit-chat about their experiences with building dams out of tin-cans: after all, the beaver is not able to transmit the information that tin-cans are poor material for building dams.
No wonder their comprehension is limited. Ours would be, too, if we had to generate it all on our own.
(Dennett, 1996, p. 380)
The transmission of information in non-genetic ways – through ‘memes’ subject to ‘cultural evolution’ – has enabled human knowledge to evolve at an incomparably faster pace than that of genetic evolution (Dennett, 1996): “…we today – every one of us – can easily understand many ideas that were simply unthinkable by the geniuses in our grandparents’ generation!” (Dennett, 1996, p. 377). At a very quick rate the grains of knowledge are sifted from the chaff. Nevertheless, it is evident that the pace of the growth of knowledge in, for instance, Antiquity was much lower than it is now: especially since the first decades of the twentieth century the growth of scientific knowledge has accelerated immensely, viz., the elimination of probably erroneous information has sped up tremendously.
I believe that frequentist statistical inference has played a major role in the ‘speeding
up’ of science, because in a sense it re-enacts the evolutionary processes in such a way that
variation is ‘tamed’ in it and falsification elicited.
An example may clarify this. A beaver does not build his tin-can dam for the sake of
hypothesis testing: if his dam is swept away by the water, it is conceivable that he will rise ‘a
sadder and a wiser’ beaver on the ‘morrow morn’ – for the beaver may have the ability to
learn from his faults – but the beaver did not elicit this ‘falsification’ of his tin-can
‘hypothesis’, nor did he accelerate the ‘mindless algorithm of natural selection’ through an
analysis of the events on the basis of probability-based modelling of his hypotheses and
statistical inference that would permit him to ‘tame’ the ‘chance factor’ in this manner.
Now, assume that, for instance, Aristotle passes by the river and wonders why the
beaver-dam was swept away. He discusses this question with a friend, who tells him that
every year around this time there are such heavy flash floods, that the beaver-dams are swept
away – no matter what they are made of. Aristotle has a considerable advantage in
comparison to the beaver: for he has access to a world of information, which the beaver has
not.
However, if a modern scientist were to pass the river, he could set up an experiment with a neatly randomized design of beaver-dams made of different materials to see if there is any statistically significant difference between the stability of these dams. The personal beliefs of the modern scientist about the stability of a particular beaver-dam would not matter: what matters is that he created an experimental ‘set-up’ which results in data that may be so improbable – given the null hypothesis (‘there is no more difference between the stability of beaver-dams made of twigs and beaver-dams made of tin-cans than one would expect due to mere chance variation’) – that this null hypothesis has to be rejected: a probably erroneous theory has been eliminated. I will elaborate on the relationship between statistical methodology and evolution in section (D) of this chapter.
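To make this concrete: a minimal sketch of such a test, with invented stability scores and hypothetical variable names (twig_dams, tincan_dams) – an illustration of the logic of significance testing, not a reconstruction of any actual experiment:

```python
import numpy as np
from scipy import stats

# Invented stability scores (e.g. hours until collapse) for two
# randomized groups of beaver-dams; all numbers are illustrative.
rng = np.random.default_rng(0)
twig_dams = rng.normal(loc=30.0, scale=5.0, size=20)
tincan_dams = rng.normal(loc=22.0, scale=5.0, size=20)

# Two-sample t-test of the null hypothesis that both kinds of dams
# are equally stable (any observed difference is mere chance variation).
t_stat, p_value = stats.ttest_ind(twig_dams, tincan_dams)

# If the data are sufficiently improbable given the null hypothesis,
# that hypothesis is rejected: a probably erroneous theory is eliminated.
if p_value < 0.05:
    print(f"p = {p_value:.4f}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f}: the null hypothesis survives this test")
```

The personal beliefs of the experimenter appear nowhere in this procedure: only the improbability of the data given the null hypothesis decides.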
II. There is no place for subjective probability in an evolutionary approach.
Popper argued – as I mentioned already briefly above7 – that it is absolutely impossible to attach a ‘probability’ to a hypothesis: a probability can only refer to the data. It is a fallacy to think that the more white swans one encounters, the more probable the hypothesis ‘All swans are white’ becomes; after all, one just needs to encounter one black swan to falsify this hypothesis completely (Popper, 1972). Popper therefore strongly opposed every form of Bayesian statistics, which does attach a probability to hypotheses. Even Fisherian statistical inference, i.e. frequentist inference with a subjectivist twist, is absolutely irreconcilable with Popper’s thought (Gigerenzer & Murray, 1987; Mayo, 1996). An example may clarify this.
7 See above, in § 2.C “Making sense in the pile of words: probability, statistics and statistical inference” (p. 36-37).
Imagine that it is the 1st of September 1994 and that Karl Popper has a nurse who is a strong Bayesian believer. Although Popper tries to explain to her that Bayesianism is nonsensical and that a hypothesis cannot have a probability, the nurse wants to calculate the probability of her hypothesis that Popper will be alive on the next day. She calculates the Bayesian probability and the next day Popper indeed is alive. So she decides to calculate the probability that he will also be alive on the day after that. With every day that Popper lives, the Bayesian probability that he will live the next day rises, because every day the belief of the nurse is strengthened by the evidence of a living Karl Popper.
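One simple way to formalize the nurse’s reasoning – an illustrative assumption of mine, not anything found in Popper or in the Bayesian literature on survival – is Beta-Bernoulli updating, in which each survived day counts as evidence for the hypothesis:

```python
# Beta-Bernoulli updating: 'Popper survives a day' is treated as a
# Bernoulli trial with unknown probability theta, starting from a
# uniform Beta(1, 1) prior. Each survived day counts as a 'success'.
# (This framing is purely illustrative.)
alpha, beta = 1.0, 1.0

for day in range(1, 17):          # 1-16 September 1994
    alpha += 1.0                  # another day survived
    # Posterior mean of theta = predictive probability for tomorrow
    p_alive_tomorrow = alpha / (alpha + beta)
    print(f"day {day:2d}: P(alive tomorrow) = {p_alive_tomorrow:.3f}")

# The predictive probability rises day after day - right up to the
# day on which the datum of Popper's death 'falsifies' the hypothesis.
```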
There is a strange paradox here, and I myself find it highly moving to read what Popper says in 1988, when he is already 86 years old:
How probable is it that you will live another 20 years? This has its own little mathematical problems.
Thus, the probability that you will live another 20 years from today – that is that you will be still alive
in 2008 – increases for most of you every day and every week as long as you survive, until it reaches
the probability 1 on the 24th of August 2008. Nevertheless, the probability that you will survive for
another 20 years from any of the days following today goes down and down with every sneeze and with
every cough; unless you die by some accident, it is not unlikely that this probability will become close
to 0 years before your actual death. (Popper, 1990, p. 8)
On the 17th of September 1994 Karl Popper dies: on this date the datum of Popper’s death has ‘falsified’ the hypothesis of the nurse. Data have probabilities, hypotheses have not8. A frequentist statistician may gather the dates of birth and death of Austrian philosophers and study whether these data – expressing the lengths of their lives – are probable, given the hypothesis that they do not differ from the life expectancies of other West-European males.
Keynes – who adhered to a subjective interpretation of probability (Gillies, 2003; Hacking, 2001) – jeered at the frequentist position by paraphrasing it as: “In the long run we are all dead” (von Plato, 1987, p. 381). Nevertheless, this ironic paraphrase of the frequentist interpretation shows something of the abyss that lies between the subjective probability of the belief of the nurse concerning the particular human being Karl Popper and the objective-frequentist probability of the ‘long run’ and the ‘we…all’.
8 See also above, in § 2.C “Making sense in the pile of words: probability, statistics and statistical inference” (p. 37).
As long as hypotheses are not falsified, one has to stick to them. However, when one has to stick to a hypothesis because it has stood up to a test, this does not imply that the hypothesis is ‘true’ or ‘very probable’: after all, all our knowledge is only tentative, i.e., ‘conjectural’, and may be falsified in a later test. Nonetheless, one can contend that a hypothesis that stands up to many tests is a strong hypothesis: Popper calls such hypotheses corroborated hypotheses (Popper, 1972). Yet, the degree of corroboration has nothing to do with the calculus of probabilities.
Popper’s fight against the subjective interpretation of probability was not an easy one. Popper saw around him how philosophers of probability constantly fell back on the ‘old’ subjectivist beliefs concerning probability. An exemplary history is that of the philosopher Carnap. To conclude this section I will cite Popper himself:
Carnap was then [in 1934], and for some years afterwards, entirely on my side, especially concerning
induction […]. Carnap and I had come, in those days, to something like an agreement on a common
research programme on probability, based on my Logik der Forschung. […] and we agreed not to
assume […] that the degree of confirmation or corroboration of a hypothesis satisfies the calculus of
probabilities […].
This was the state of the discussion reached in 1934 and 1935. But 15 years later Carnap sent me his
new big book, Logical Foundations of Probability, and, opening it, I found that his explicit starting
point in this book was the precise opposite – the bare, unargued assumption that the degree of
confirmation is a probability in the sense of probability calculus. I felt as a father must feel whose son
has joined the Moonies; though, of course, they did not yet exist in those days. (Popper, 1990, p. 4-5)
III. The objective propensity interpretation of probability.
So, if probability is not an expression of the degree of certainty we can attach to our beliefs and hypotheses – what is it? Until Popper, all objectivist interpretations of probability had been frequentist interpretations, viz., they stated that in the long run relative frequencies tend to stabilize. The question why a relative frequency (for example of ‘heads’ turning up in 50% of all tosses with a coin) tends to stabilize in the long run was mostly evaded: the only matter of importance was that this frequentist probability could be measured objectively. Popper, however, was more explicit about objectivist probability and argued that probability is a ‘tendency’, ‘disposition’ or ‘propensity’ of certain conditions to generate the observed relative frequency. He called this the ‘propensity’ interpretation of probability.
According to Popper probability is a purely objective propensity (Popper, 1972, 1983, 1990), i.e., a ‘tendency’ of a system to behave in a certain way: a coin may have the objective propensity to turn up heads in approximately ½ of all tosses, independently of our subjective beliefs.
But this means that we have to visualise the conditions as endowed with a tendency or disposition, or
propensity, to produce sequences whose frequencies are equal to the probabilities; which is precisely
what the propensity interpretation asserts. (Popper, 1959, p. 35)
[3.C] Statistics à la Popper: the natural selection of a falsifying rule for
statistical hypotheses.
What kind of statistical inference would Popper’s evolutionary theory of knowledge and his
objective propensity interpretation of probability have endorsed?
Though it is clear that Popper’s ideas concerning statistical inference are much closer to those of Jerzy Neyman and Egon Pearson than to those of Fisher, I do not think that Popper would ever have used, for instance, an expression such as ‘inductive behavior’: both the notion of ‘induction’ and that of ‘behavior’ seem to me rather inappropriate from the Popperian perspective – for Popper despised both inductivism and behaviourism (see e.g. Popper, 1974). After all, it is the job of philosophers to be extremely particular about words. So, in which words should statistical inference be thought of, according to Popper?
In the first place it would have to be clear that in experimental research9 the name ‘statistical inference’ does not imply ‘statistical induction’. After all, we do not aim to induce theories but only to eliminate erroneous theories. I think that from Popper’s point of view it would perhaps be more appropriate to speak of, for instance, ‘statistical falsification methodology’ or ‘statistical error elimination methodology’. Nevertheless, science never adopted an expression that combined the words ‘falsification’ and ‘statistics’.
The reason for this is contained in the second remark I have to make, viz., that probabilistic statements are in principle not falsifiable (Popper, 1972)! For instance, the hypothesis that all swans are white can easily be falsified by one black swan; however, there is no analogous method to falsify the hypothesis that a coin is unbiased, i.e., has a probability of ½ of turning up tails. Only if we were able to produce an infinite sequence of tosses with this coin – which is of course impossible! – and the relative frequency of tails turned out to be, for instance, ⅓, could we falsify a probabilistic hypothesis: only “an infinite sequence of events […] could contradict a probability estimate” (Popper, 1972, p. 190). One might think that this would form a major problem for the empirical sciences – such as psychology – because practically all hypotheses they formulate are statistical, i.e., probabilistic, hypotheses: for instance, when a scientist wants to know whether a certain treatment has any statistically significant effect, his situation can be compared to that of a scientist tossing a coin and hoping to find out that the coin is biased, viz. that there is a difference between the treatment group and the control group. However, strictly falsifying the null hypothesis that the treatment has no effect would require an infinite sequence of trials.
9 However, it must of course be clear that in inferential non-experimental research – e.g. in a survey study – the estimations of population parameters from sample statistics are in fact inductions.
For although probability statements play such a vitally important role in empirical science, they turn out
to be impervious to strict falsification. (Popper, 1972, p. 146)
Nevertheless, the empirical sciences are very successful in deciding when to accept and when to reject a hypothesis. Assume for instance that a scientist, whose hypothesis is that a coin is unbiased, has made 10,000 tosses of which only 5 turned up tails. Given the hypothesis, this result, i.e. these data, is highly improbable (although not impossible!) and therefore the scientist may decide to consider his hypothesis as “practically falsified” (Popper, 1972, p. 191):
It is fairly clear that this ‘practical falsification’ can be obtained only through a methodological decision
to regard highly improbable events as ruled out – as prohibited. But with what right can they be so
regarded? Where are we to draw the line? Where does this ‘high improbability’ begin? (Popper, 1972,
p. 191)
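How improbable Popper’s example result is can be computed directly from the binomial distribution; a minimal sketch (the raw probability is so small that it is best handled on a log scale):

```python
from scipy.stats import binom

# Null hypothesis: the coin is unbiased, i.e. P(tails) = 0.5.
n, p = 10_000, 0.5

# Log-probability of observing 5 or fewer tails in 10,000 tosses given
# the null hypothesis (the raw probability underflows double precision).
log_p = binom.logcdf(5, n, p)

print(f"log P(at most 5 tails | fair coin) = {log_p:.1f}")
# Roughly e^-6890: astronomically small, so the hypothesis that the
# coin is unbiased may be regarded as 'practically falsified'.
```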
To summarize: statistical inference in experimental research turned out (a) not to ‘infer’ in the sense in which ‘inference’ is mostly understood, viz., as ‘induction’, but only to falsify; (b) however, frequency evidence cannot falsify statistical hypotheses – in any case not in the strict sense of the word ‘falsification’; therefore (c) statistical methodology has to rely on ‘practical falsification’, i.e., make a pragmatic decision about how low the probability of an observed result given the hypothesis should be in order to lead to the rejection of the hypothesis. This pragmatic solution ‘solves’ the conflict between two conclusions following from Popper’s evolutionary approach: on the one hand that induction is unscientific and that the growth of scientific knowledge can only follow from ‘error elimination’ or ‘falsification’, but on the other hand that statistical hypotheses are in principle unfalsifiable.
This leads to the question of how one should derive such a ‘pragmatic’ criterion for rejecting statistical hypotheses – in Popperian terminology one would say “a falsifying rule for probability statements” (Gillies, 1971). The only direction for the formulation of such a pragmatic criterion is that, given the hypothesis, the probability of an observed result should be ‘very low’. Although from a mathematical point of view the definition ‘very low probability’ is too simplistic (see for an extensive elaboration on this problem e.g. Gillies, 1971), I will for simplicity’s sake refrain from getting into mathematical delicacies and concentrate on the general theoretical aspects of how a falsification-criterion should be established.
I will try to interpret the choice of such a criterion in the light of the evolutionary
theory of knowledge that was expounded in the previous section (B).
Viewed from the standpoint of an evolutionary theory of knowledge it becomes immediately clear that the ‘choice’ of a falsification-criterion is itself also the result of an evolutionary process of variation and selection, or, as Popper says, of accidents and preferences. It would perhaps be better to speak of the ‘emergence’ (instead of ‘choice’) of a criterion out of a struggle for existence among a range of possible criteria:
It is obvious that in the evolution of life there were almost infinite possibilities. But they were largely
exclusive possibilities; so most steps were exclusive choices, destroying many possibilities. As a
consequence, only comparatively few propensities could realize themselves. Still, the variety of those
that have realized themselves is staggering. I believe that this was a process in which both accidents and
preferences, preferences of the organisms for certain possibilities, were mixed: the organisms were in
search of a better world. Here the preferred possibilities were, indeed, allurements. (Popper, 1990, p. 26)
In the ‘struggle for existence’ amongst various statistical methods and criteria for the rejection of hypotheses there has turned out to be one winning team, namely significance testing combined with rejection of the null hypothesis at p < 0.05 or p < 0.01. The expressions p < 0.05 and p < 0.01 indicate that the data have a rather low probability given the null hypothesis:
Either the null hypothesis is true, in which case something unusual happened by chance (probability
[5% or] 1%), or the null hypothesis is false. (Hacking, 2001, p. 217)
However, why would one not use, for instance, p < 0.03 or p < 0.005? In fact one could contend that the most widely applied rejection criteria in statistical methodology – the rejection of the null hypothesis in significance testing at p < 0.05 or p < 0.01 – actually were “a sort of mathematical accident (italics mine)” (Hacking, 2001, p. 217).
Long before pocket calculators made some calculations trivial, these figures became easy standards for
comparison, simply because you could compute them without weeks of back-breaking labor. Today
many investigators use a statistical software package without really understanding what it does. You can
just enter data, and press a button to select a program. (Hacking, 2001, p. 217)
In retrospect one can see that this ‘mathematical accident’ was a very successful accident. In a sense it is the success of this methodology that justifies why one should assume that a probability of 5% or 1% is such a ‘very low probability’ that the hypothesis can be considered ‘practically falsified’ and therefore rejected. The success of significance testing at the 5% or 1% level legitimizes its practice – and from a Darwinistic point of view this is perfectly reasonable. Michael Cowles concludes his book on the role of statistics in psychology in the same vein:
If it is to be admitted that the logical foundations of psychology’s most widespread method of data assessment are shaky, what are we to make of the “findings” of experimental psychology? Is the whole edifice of data and theory to be compared with the buildings in the towns of the “wild west” – a gaudy false front, and little of substance behind? This is an unreasonable conclusion and it is not a conclusion that is borne out by even a cursory examination of the successful predictions of behaviour and the confident applications of psychology, in areas stretching from market research to clinical practice, that have a utility that is indisputable. The plain fact of the matter is that psychology is using a set of tools that leaves much to be desired. […] But, they seem to have been doing a job. Psychology is a success.
From a Darwinist point of view the statistical-psychological research practice is rational because it has turned out to be a success – though this can be seen only in retrospect. Of course, it could happen that the statistical-psychological research practice will be ‘falsified’ one day, but at this moment it is indisputably a success and consequently rational too.
[3.D] Statistical methodology as a re-enactment of evolution: tamed variation
and accelerated selection
Darwinism made rationality equivalent to ‘that-which-does-not-perish-in-the-long-run’ in the evolutionary process of variation and selection: for if something survives, it is apparently well enough adapted to its environment and consequently it can be called rational. Thus rationality is no longer tied to the rationality of a ‘rational subject’, i.e. the thoughts of person X or person Y (although the thoughts of person X or Y may turn out to be rational – if they survive in the Darwinist struggle for existence among thoughts).
Theories are constantly conquered by other theories: in that respect the fact that frequentist statistical methodology is the scientific methodology that has prevailed in the struggle for existence among different statistical approaches – at least for the time being – does not differ fundamentally from the fact that the belief in a flat earth has been conquered by the theory that the surface of the earth is spherical.
Yet, frequentist statistical methodology has a ‘feature’ which makes it stand out from other theories that have been successful in the struggle for existence – after all, frequentist statistical methodology is not only a ‘result’ of evolutionary processes, but it also ‘re-enacts’ these evolutionary processes to a certain extent.
Just like ‘natural’ processes of evolution, frequentist statistical methodology is an algorithm (Dennett, 1996) consisting of competition and selection, which depends on (chance) variation and generates rational results in the long run. However, compared to ‘natural’ processes of evolution, statistical inferential methods enhance and accelerate the processes of selection because they enable scientists to distinguish much better between ‘chance’ variation and ‘structural’ variation. In this section I will discuss in more detail the role of (chance) variation and selection in both evolution and frequentist statistics:
(§1) (Chance) variation in evolution and statistics: Quetelet, Darwin, Galton and K.
Pearson.
(§1a) Quetelet and Variation: the constant cause of real variation.
(§1b) Darwin and Variation: in search for the causes of the ‘details’ of
variation.
(§1c) Galton and Pearson: distinguishing structural variation from chance
variation.
(§2) Selection in evolution and statistics.
(§1) (Chance) variation in evolution and statistics: Quetelet, Darwin, Galton and K. Pearson.
Evolution depends on variation: if you have to select something, you have to be presented in the first place with a choice among different variants. Fortunately, there is no lack of variation. It is an empirical fact that variation is everywhere around us, for no replicative process is perfect: not only is biological replication subject to, for instance, genetic mutations, but processes that are considered to be perfect ‘artificial’ copying processes are subject to deviations as well: although modern photocopying methods are of course more reliable than the handwritten copies made in monasteries by copyists in the days before printing, no copying system is perfect. “Mistakes will happen” (Dawkins, 1989).
However, not all variation is ‘just’ meaningless and insignificant deviation. Some variation can be qualified as meaningful, while some variation has to be called ‘dumb’ chance. How can one distinguish between these kinds of variation? The ‘magic word’ which answered this question was the bell-shaped curve – the so-called ‘normal distribution’.
Because this normal curve apparently governs many phenomena in this world – for instance biological features such as people’s heights or IQ scores, or a randomly (non-systematically) varying phenomenon such as unbiased measurement error – it became possible to estimate the probability of the variability of certain data, given a hypothesis.
Figure 27. ‘Normal’ or ‘bell-shaped’ curve.
The normal curve makes it possible, for instance, to say things like: “If the assumption is true that the height of men in the Netherlands is normally distributed with an average of seventy inches and a standard deviation (the measure of the average distance of the data values from their mean) of four inches, then the probability that a randomly chosen man from the same population has a height between sixty-six and seventy-four inches is about sixty-eight percent” (cf. Aczel, 2004, p. 109 f.f.). Variation is in this way ‘domesticated’: one can now see whether certain variability is probable or not. Moreover, because chance variation – ‘error’ – is assumed to be distributed normally, improbable deviations from normality are a very useful indicator of systematic variation which has not been taken into account: deviations from normality have to be explained.
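The height example can be checked directly; a minimal sketch, assuming the stated mean of seventy inches and standard deviation of four inches:

```python
from scipy.stats import norm

# Assumed distribution of men's heights: normal with mean 70 inches
# and standard deviation 4 inches (the figures from the example above).
mean, sd = 70.0, 4.0

# Probability that a randomly chosen man is between 66 and 74 inches
# tall, i.e. within one standard deviation of the mean.
p = norm.cdf(74, loc=mean, scale=sd) - norm.cdf(66, loc=mean, scale=sd)

print(f"P(66 < height < 74) = {p:.3f}")  # about 0.683
```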
The nineteenth century history and philosophical foundations of the normal curve can
be found in the thought of Quetelet (1a), Darwin (1b) and Galton and Karl Pearson (1c).
(§1a) Quetelet and Variation: the constant cause of real variation.
To my knowledge the Belgian statistician Adolphe Quetelet was the first to formulate the idea that deviations are not only ‘errors’ arising from repeated measurement (i.e., from our limited knowledge), but that there is also real variation in every replicative process (see e.g. Desrosières, 2002; Porter, 1986), and that this ‘variability’ is subject to the laws of probability.
Figure 28. Quetelet
In the twentieth letter of his Lettres (1846) Quetelet invites his readers to imagine that the king of Prussia decrees that a thousand copies are to be made of the famous ancient statue known as the ‘Borghese Gladiator’. If these thousand copies were subsequently scrupulously measured, the copies would beyond doubt show inaccuracies and deviations from the original.
These variations follow both from real deviations from the original and from inevitable measurement inaccuracy – after all, successive measurements of one particular copy will probably generate slightly differing results (Desrosières, 2002).
Hence Quetelet argues that analogous variation will also be found in the measurement of human beings, referring implicitly to the probabilistic calculations he had applied in 1844 to the height and chest measurements of 5,738 Scottish soldiers (Hacking, 2004; Porter, 1986; Stigler, 1986).
Figure 29. Borghese Gladiator (from the collection of the Louvre)
Yet, from what does this ‘variation’ deviate? In Quetelet’s example of the ‘Borghese Gladiator’ it was clear: the varying copies deviate from the original statue. However, in most replicative processes such an ‘original’ does not exist. When we, for instance, meet an extremely talented pianist, we probably would not say that he is a ‘deviation’ from or ‘variety’ of the original man. Still, we would say that he is an ‘extraordinary’ or ‘non-average’ person – so actually we deem him to be deviating, but deviating from what? Our contemporary speech is so imbued with ‘means’, ‘normality’ and ‘averages’ that it is tempting to think that these notions are timeless – but they actually only arose in their modern sense in the nineteenth century. Although in the 1840s the words ‘mean’, ‘normality’ and ‘average’ seemed to be in the air and one can easily quarrel about the question who was historically the first person to formulate these notions (see e.g. Stigler, 1999), I think that the ‘conceptual’ change is most clear in the thought of Quetelet. After all, it was he who formulated in the 1840s the groundbreaking analogy between the original statue of the ‘Borghese Gladiator’ and ‘l’homme moyen’ – the true average man as the golden mean of man (Desrosières, 1993, 1999, 2002; Porter, 1986; Stigler, 1986).
Thus Quetelet replaced Platonist thought – that we live in a world of mere appearances which reflect, in a rather fluctuating and imperfect way, the ‘other-worldly’, fundamental and eternal ‘ideas’ or ‘essences’ that lie behind them – with the idea that the ‘essence’ of man does not lie in some metaphysical realm, but in the mean of a population, whose traits are the most probable to occur. Though there is of course no reason to assume that this perfectly ‘average’ man really exists, Quetelet claimed that this homme moyen was the “constant cause” (Desrosières, 2002, p. 4; Stigler, 1986, p. 173 f.f.) behind all variation.
Figure 30. Although Quetelet (1796-1874) lived before the eugenics movement (which arose only towards the end of the nineteenth century), his ideas formed to a certain extent an ‘inspiration’ for it, as is expressed by this statue of the ‘average American male’, which was an exhibit at the Second International Exhibition of Eugenics in 1921 in Cold Spring Harbor (Exhibits Book, p. 69).
When Quetelet studied the height and chest measurements of 5,738 Scottish soldiers in 1844 (Stigler, 1986), he was struck by the fact that these measurements scattered neatly around the mean in a distribution that looked like a “chapeau de gendarme” (Desrosières, 2002, p. 4).
Figure 31. A data plot by Quetelet (1846, p. 396). Quetelet studied the chest circumferences of 5,738 Scottish soldiers. Quetelet’s distribution comes close to what we today call a normal distribution curve.
Quetelet had been acquainted with this bell-shaped curve decades before, but in a totally different context, namely astronomy. After all, before Quetelet’s scientific interests turned to statistics and sociology in the 1830s, he had been an active astronomer – through his efforts the Observatory in Brussels was founded in 1828 – and he would always remain an astronomer: until his death in 1874 he would be the director of the Observatory that he founded.
Figure 32. Royal Observatory in Brussels
Around 1800 the so-called ‘law of errors’ was discovered in astronomy – for at that time it had become clear that when several astronomers try to chart the position of a star, their observations will vary. However, the variations in their observations tended to conform to a bell-shaped distribution – and the mean of this distribution was seen as the ‘real’ position of the star, i.e. the ‘constant cause’ behind all observations (see for a popular account Menand, 2002).
As an astronomer Quetelet knew this bell-shaped ‘astronomical law of errors’ very well – so, when he discovered that human measurements conformed to the same distribution, he reasoned analogously that this implied the existence of a ‘constant cause’, namely the ‘homme moyen’. Yet, whereas an observed star probably has a ‘true’ position, the ‘homme moyen’ is not a living individual, but solely an ‘abstraction’. However, this ‘abstraction’ became in a sense more real – after all, it is the ‘constant cause’ – than particular living individuals who, due to ‘accidental causes’, deviate from the ‘homme moyen’. The ‘more-than-reality’ or ‘essentiality’ of statistical parameters such as the ‘mean’ is an issue which even today triggers quite some contemplation (e.g. Desrosières, 1993; Desrosières, 2001; Salsburg, 2001). The idea that an ‘abstract’ mean is the underlying cause behind all worldly phenomena sounds, of course, like a rather Platonist idea.
There is nevertheless a major difference between the Platonic timeless ‘idea’ or ‘essence’ and the average man of Quetelet: for Quetelet argued that the ‘average man’ is not a timeless ‘constant cause’, but – as he formulated it himself already in 1835 – always “conformable to and necessitated by time and place” (Quetelet, 1835, vol. 2, p. 274; Stigler, 1986, p. 171). So, this is of course quite remarkable: the constant cause is just temporary – it is stability for the time being (see for a nice popular account Menand, 2002, p. 189 f.f.). This applied even to social phenomena such as suicides and murders:
…when the “milieu” does not vary appreciably it will give rise to the same mean number of annual
suicides, murders, and so forth. (Schweber, 1982, p. 346)
Therefore, Quetelet deemed it the task of his ‘social physics’ to change the ‘milieu’, i.e. the physical, social, and institutional causes that are responsible for these ‘fearful regularities’ (Schweber, 1982, p. 347).
(§1b) Darwin and Variation: in search for the causes of the ‘details’ of variation.
Quetelet had shown that reproduction inevitably leads to variation and that this variation is not just some opaque mishmash of deviations, but that it seemed to have the tendency to spread evenly, in a bell-shaped curve, around its mean.
It is therefore not surprising that Darwin – whose theory consisted of variation and selection – turned to Quetelet in order “to obtain quantitative statements regarding variations and populations” (Schweber, 1977, p. 286). In 1838 there are several entries in Darwin’s notebooks that indicate his interest in Quetelet – whose work was well known in England at that time and who corresponded with several British scientists with whom Darwin was closely acquainted. In September 1838 Darwin read the extensive review of Quetelet’s Sur l’homme. Darwin got his own copy of Quetelet’s Sur l’homme, but because he had a rather poor knowledge of French it is not clear how carefully he looked at it (Schweber, 1977, p. 289).
So, replication apparently leads to variation – but why? According to Schweber (1977; 1982; 1983) this question bothered Charles Darwin so much that it formed one of the reasons why his Origin of Species was only published in 1859, whereas Darwin had already accepted in 1838 “the ‘randomness’ of variations as a phenomenological fact” (Schweber, 1983, p. 43) and had developed by July 1839 “…a unitary evolutionary view of everything around him” (Schweber, 1977, p. 233). Darwin assumed that variation follows from a myriad of small, uncontrollable influences such as, for instance, fluctuations in temperature, exposure to light, nurture, etc.
Figure 33. Darwin
Thus Darwin considered ‘chance’ to be nothing in itself but just a provisional notion to express ignorance of the real causes of variation, thereby following the ideas of classical probability theory – which was the dominant theory of probability until the 1840s. In the Origin of Species (1859) he writes:
I HAVE hitherto sometimes spoken as if the variations so common and multiform in organic beings
under domestication, and in a lesser degree in those in a state of nature had been due to chance. This, of
course, is a wholly incorrect expression, but it serves to acknowledge plainly our ignorance of the cause
of each particular variation. (Darwin, 1998/1859, p. 131)
and:
Our ignorance of the laws of variation is profound. Not in one case out of a hundred can we pretend to
assign any reason why this or that part differs, more or less, from the same part in the parents. But […],
the same laws appear to have acted in producing the lesser differences between varieties of the same
species, and the greater differences between species of the same genus. (Darwin, 1998/1859, p. 167)
Darwin thought that this ignorance of causes was something that had to be conquered (Schweber, 1977): however, his search in the following decades for the causes of what he himself called “the tendency of small change” (Schweber, 1983, p. 43) was rather unsuccessful, and after the publication of the Origin of Species one can notice a slight shift in his opinions on variation – for he begins to assign to ‘chance’ a more ‘autonomous’ status.
For instance, in a letter of 14 February 1861 to Leonard Horner he speaks of variations as “accidental or spontaneous” (Schweber, 1983, p. 79), and in 1860 he confesses – though apparently reluctantly – that the ‘details’ of variation are due to ‘what we may call chance’:
I am inclined to look at everything as resulting from designed laws, with details, whether good or bad,
left to the working out of what we may call chance. Not that this notion at all satisfies me. (Schweber,
1983, p. 80)
Thus the idea that a growing amount of knowledge about the causes of biological variation would eventually lead to the elimination of chance started to fade, for it became apparent that the ‘unpredictable’ or (to use an anachronistic term) ‘stochastic’ individual variation cannot be explained completely (Schweber, 1982). Thus Schweber (1983, p. 79) argues that after the publication of the Origin of Species Darwin began to see variations as ‘chance phenomena in the “ontic” sense’.
Yet, the question remained how to deal with ‘chance variation’, now that the search for the causes of variation had turned out to be unable to explain or predict the “details” of variation.
(§1c) Galton and Pearson: distinguishing structural variation from chance variation.
The answer to the question ‘how to deal with chance variation?’ would be provided by
frequentist statistics with its emphasis on the long run and populations.
Darwin’s own half-cousin Francis Galton (1822-1911) stands at the beginning of the statistical ‘solution’ of this problem. One of Galton’s many research interests was the possibility of applying probabilistic models to heredity. It is absolutely clear that Galton was herein enormously influenced by Darwin’s Origin of Species (1859) as well as by the work of Quetelet (Stigler, 1986).
However, Galton gave some completely new twists to the directions set out by Darwin and Quetelet. Whereas Darwin had claimed to be “unstatistical by disposition” (Porter, 1986, p. 134), Galton seemed to be statistical by disposition. And whereas Quetelet’s attention had been focused on the average, Galton was more interested in the deviations from the average – such as genius.
It is difficult to understand why statisticians commonly limit their inquiries to Averages, and do not
revel in more comprehensive views. Their souls seem as dull to the charm of variety as that of the
native of one of our flat English counties, whose retrospect of Switzerland was that, if mountains could
be thrown into its lakes, two nuisances would be got rid of at once. (Galton, 1889, p. 62)
In 1869 Galton wrote the book Hereditary Genius. This book was groundbreaking in the sense that it was “the first time quantitative data and statistical analysis had been brought to bear on the problem of mental ability” (Wozniak, 1999). However, for Galton Hereditary Genius (1869) formed only the starting point of a problem that would puzzle him for almost two decades (Stigler, 1986), namely: how can it be that on the one hand mental eminence runs in families – suggesting that it is heritable – whereas on the other hand the children of two geniuses on average do not seem to inherit the exceptional abilities of their parents?
…that the offspring did not tend to resemble their parents in size, but always to be more mediocre than
they – to be smaller than the parents, if the parents were large; to be larger than the parents, if the
parents were small. (Galton, 1886, p. 246)
Galton solved this question little by little. His efforts would culminate in the book Natural Inheritance (1889).
The phenomenon Galton struggled with is now generally known – due to the ideas developed by Galton himself – as ‘regression towards the mean’. It is not restricted to heredity, but appears in practically every “stochastic time-varying phenomenon, where two correlated measurements are taken of the same person or object at two different times” (Stigler, 1997, p. 104; 1999, p. 174).
A simple example may clarify this (cf. Stigler, 1997, 1999). Suppose that you have to take two examinations at two successive times. The first time you get an exceptionally high grade. The sad news that ‘regression to the mean’ teaches us is that, on average, one may expect your second grade to be lower.
The explanation of this fact is that the extremely high grade on the first test is likely to be due to a combination of “successes in two components, to a high degree of skill (a permanent component) and to a high degree of luck (a transient component)” (Stigler, 1997, p. 104; 1999, p. 174). Thus, the second time you take the examination your skill is likely to persist, whereas your extreme luck – on average! – will not show up. Yet, an unusually low grade would also tend to regress toward the mean.
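Stigler’s skill-plus-luck decomposition is easy to simulate; a minimal sketch, assuming normally distributed skill and luck components (all numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000  # simulated examinees

# Each grade = permanent skill + fresh transient luck on each exam.
skill = rng.normal(loc=70, scale=10, size=n)
exam1 = skill + rng.normal(scale=10, size=n)
exam2 = skill + rng.normal(scale=10, size=n)

# Look only at examinees with an exceptionally high first grade.
top = exam1 > 90
print(f"mean first grade of top group:  {exam1[top].mean():.1f}")
print(f"mean second grade of top group: {exam2[top].mean():.1f}")
# The second mean falls back toward the overall mean of 70:
# the skill persists, but the extreme luck does not recur.
```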
Figure 34. Graphical illustration of regression made by Galton (1886, p. 249); the circles give the average
heights for groups of children whose midparental heights (the average height of both parents) can be read from
the line AB. The difference between the line CD and AB represents regression towards mediocrity. Reproduced
from Stigler (1986), p. 295.
Actually, as long as the scores on the two tests are not perfectly correlated (after all, if there were a correlation of 1.0 between the two exams, i.e., if the value of the grade for the first exam would always lead to an exactly proportional variation in the grade for the second exam, then there would be no regression to the mean), there will always be ‘on average’ a regression ‘towards the average’. Moreover, the weaker the correlation between the two events, two generations or two exams is, the bigger the effect of ‘regression to the mean’ will be: the greater the amount of non-structural variation in comparison with the structural variation, the more regression to the mean you will see.
If properly understood, regression is a concept that is “transparent to the point of being obvious” (Stigler, 1997, p. 103; 1999, p. 173). Yet, it is a source of endless misunderstandings. The most common misinterpretation is that ‘regression to the mean’ would imply that all things in the world are becoming mediocre. However, ‘regression to the mean’ gives no cause for such cultural pessimism. The fact that on average there will be a regression towards the average value does not mean that heights, mental abilities or whatever other traits are degenerating into grey uniformity. The most spectacular instance of this fatalistic misinterpretation was made in 1933 by a Northwestern University professor named Horace Secrist, who wrote The Triumph of Mediocrity in Business – a book that was, embarrassingly enough, applauded by most reviewers (Stigler, 1996, 1997, 1999):
In over 200 charts and tables, Secrist ‘demonstrated’ what he took to be an important economic
phenomenon, one that likely lay at the root of the great depression: a tendency for firms to grow more
mediocre over time. (Stigler, 1997, p. 112)
The idea that is completely overlooked by misinterpretations such as Secrist’s is that the brilliance of Galton’s regression lies in the fact that he distinguished between the structural ‘shared’ variation of certain variables – proportionally going up and down together – and the ‘random’ chance variation – piling up evenly around the average in a bell-shaped way. The real importance of ‘regression’ can be clarified with the help of another important statistical notion developed by Galton: ‘correlation’.
In December 1888 Galton wrote a paper for the Royal Society entitled ‘Co-relations
and their Measurement Chiefly from Anthropometric Data’ (Galton, 1888), wherein he
explained that some traits have on the average, i.e. in the long run, the tendency to vary in the
same direction: “tall people tend to have big feet, long arms and long fingers” (Hacking,
2004, p. 187).
Figure 35. Francis Galton on one of his own anthropometry cards (1893), with profile and full-face photos and
spaces for key body measurements, taken by Alphonse Bertillon
Of course, there may be a tall person with very small feet, but in the long run a statistical relation may be observed between foot size and body length. Galton called this relation between the variability of different observations – a relation that can be expressed mathematically – correlation. This means that Galton envisioned that variation can be partitioned into ‘structural’ or ‘systematic’ variation (e.g., on average tall people tend to have big feet) and ‘chance’ variation (e.g., a tall person may have – ‘by chance’ – small feet).
Basically, the idea of correlation also underlay the other important notion invented by Galton: regression. After all, if you can discern the ‘chance’ factor in the height of children – leading to regression to the mean – from the structural or correlated variation (tall parents tend to have tall children), one may formulate a linear ‘regression’ formula to predict the variation of one ‘variable’ (e.g., the height of a child) from the variation of another ‘variable’ (e.g., the height of the parents). However, how should one discern chance variation from structural variation? The answer lies in the conceptual ‘discoveries’ Galton had already made in the 1870s concerning the bell-shaped ‘error curve’.
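In modern terms the two notions are directly connected: the slope of Galton’s regression line is the correlation coefficient rescaled by the two standard deviations. A minimal sketch with invented parent-child heights:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Invented data: a child's height shares a structural component with
# the midparental height, plus independent chance variation.
parent = rng.normal(loc=68, scale=2.5, size=n)
child = 68 + 0.65 * (parent - 68) + rng.normal(scale=2.0, size=n)

# Galton's correlation: the strength of the shared, structural variation.
r = np.corrcoef(parent, child)[0, 1]

# The regression slope: the correlation rescaled by the standard deviations.
slope = r * child.std() / parent.std()
intercept = child.mean() - slope * parent.mean()

print(f"correlation r = {r:.2f}")
print(f"predicted child height = {intercept:.1f} + {slope:.2f} * parent height")
# A slope below 1 is exactly Galton's 'regression towards mediocrity'.
```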
In 1873 Galton designed a very clear model – called the quincunx – which illustrated how a bell-shaped curve comes about (see e.g. Stigler, 1986, p. 276 f.f.). This quincunx is a device wherein shot is poured through a regular pattern of pins. Each shot has a probability of fifty percent of falling either to the left or to the right of each pin. The more rows of pins you add, the better the final outline approximates a normal curve.
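The quincunx can be mimicked in a few lines; a minimal sketch in which each ball takes a left/right step at every row of pins (the ball and row counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n_balls, n_rows = 100_000, 12

# At each of the 12 rows every ball goes left (0) or right (1) with
# probability 1/2; its final bin is simply its number of right-steps.
steps = rng.integers(0, 2, size=(n_balls, n_rows))
bins = steps.sum(axis=1)

# A crude text histogram: the counts pile up in a bell shape, following
# the binomial distribution - which for many rows approximates the normal.
counts = np.bincount(bins, minlength=n_rows + 1)
for k, c in enumerate(counts):
    print(f"bin {k:2d}: {'#' * (c // 500)}")
```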
Figure 36. The original Quincunx (left) designed by Galton (1873), a modern replica, and a schematic depiction.
In 1875 Galton made another major step in his conceptual understanding of the bell-shaped distribution (Stigler, 1986), for he gained the insight that such a curve may consist of a lot of smaller bell-shaped curves – a fact that may explain why the normal curve applies to so many phenomena.
Figure 37. Drawings by Karl Pearson (Pearson, 1930, p. 466), based on some hasty sketches by Galton (made in a 12 January 1877 letter to his cousin George Darwin), showing that an accumulation of normal distributions will itself be normal too.
After all, a lot of phenomena in this world can be seen as the result of an accumulation of different random, i.e. normally distributed, causes. Galton’s 1875 insight makes it clear that such an accumulation of normally distributed influences will lead to a unified normal distribution.
The insights concerning the bell-shaped curve (or the ‘error curve’) struck Galton as
epiphanic:
I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order
expressed by the “Law of Frequency of Error.” The law would have been personified by the Greeks and
deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the
wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its
sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand
and marshalled in the order of their magnitude, an unsuspected and most beautiful form of regularity
proves to have been latent all along. The tops of the marshalled rows form a flowing curve of invariable proportions; and each element, as it is sorted into place, finds, as it were, a preordained niche,
accurately adapted to fit it. (Galton, 1889, p. 66)
From a modern point of view it seems perhaps a little exaggerated to assume that the normal curve would have been ‘deified’ by the ancient Greeks ‘if they had known of it’. Moreover, as we now know, not all phenomena are distributed according to the normal curve. Yet, the sole fact that (according to the so-called central limit theorem) the sampling distribution, i.e. the distribution of the means of different samples, converges to a normal distribution (even if the population itself is not normally distributed!), together with the fact that one may assume that unbiased measurement errors will be normally distributed, provided scientists with powerful tools to calculate probabilities and so to distinguish structural variation from chance variation.
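The central limit theorem can be watched at work in a few lines; a minimal sketch that draws sample means from a decidedly non-normal (exponential) population:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)

# A decidedly non-normal population: exponential, strongly skewed.
population = rng.exponential(scale=1.0, size=1_000_000)

# Draw 10,000 samples of size 50 and record each sample mean.
sample_means = rng.choice(population, size=(10_000, 50)).mean(axis=1)

# The population is strongly skewed, but the distribution of the
# sample means is already nearly symmetric and bell-shaped.
print(f"skewness of population:   {skew(population):.2f}")   # about 2
print(f"skewness of sample means: {skew(sample_means):.2f}") # much closer to 0
```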
[Galton] was regarding the Normal distribution of many traits as an autonomous statistical law.
Statistical law had come into the world fully-fledged. Galton saw that chance had been tamed.
(Hacking, 2004, p. 186)
The statistical methods which Galton had applied in Natural Inheritance (1889) were quickly taken up in the 1890s by disciplines such as anthropometry, sociology, economics, psychology and the study of education (Gigerenzer et al., 1990, p. 58).
From the 1890s Galton’s student Karl Pearson10 would play a major role in the development of a more ‘autonomous’ (cf. Hacking, 2004, p. 181 f.f.), abstract and mathematical (cf. Heiser & Verduin, 2005) statistical methodology which could “confront a wide range of scientific and practical problems” (Gigerenzer et al., 1990, p. 59).
10 See also §2.E “From seventeenth century ‘probability’ to eighteenth century ‘Statistik’ and nineteenth and twentieth century ‘statistical inference’” (p. 54 f.f.).
Moreover, he argued that, instead of searching for causes, it is much more fruitful to ‘accept’ the phenomenon of chance in variation and to ‘tame’ it (Hacking, 2004) by the application of statistical methods based on ideas such as the normal curve, correlation and regression. In the tradition of the positivist philosopher Ernst Mach, Karl Pearson rejected the notion of causality altogether as a subjective chimera and a meaningless, metaphysical notion (Desrosières, 1999; Hacking, 2004), for every ‘cause’ is ‘caused’ by other ‘causes’, leading to an infinite regress that makes the search for causes a hopelessly subjective and arbitrary enterprise:
Cause is scientifically used to denote an antecedent stage in a routine of perceptions. In this
sense force as a cause is meaningless. First cause is only limit, permanent or temporary, to
knowledge. (Pearson, 1911, p. 150)
Hence, Pearson argued that the notion of ‘causation’ had to be completely replaced by
‘correlation’ (see, e.g. Hacking, 2004; Porter, 1986).
Figure 38. Karl Pearson
So, what is the meaning of the ideas of Quetelet, Darwin, Galton and K. Pearson with respect to the word variation? To summarize, one can see that in less than a century variation had, on the one hand, become more ‘real’ than it had ever been; on the other hand, it had also become more domesticated than ever before – for the bell curve, regression and correlation provided the tools to determine how probable it was that certain variation could be ascribed to ‘dumb’ chance. In 1953 R.A. Fisher would even say:
The effects of chance are the most accurately calculable, and therefore the least doubtful, of all the
factors of an evolutionary situation. (Fisher, 1953, p. 515)
The taming of variation – at least the taming of the chance factor in variation – had an important practical implication, namely that it accelerated the selection amongst hypotheses.
(§2) Selection in evolution and statistics.
Among the ‘variations’ that emerge in replication there will almost always be a struggle for existence.
In biology variants of replicators – for instance peas or finches – have to struggle in order to gain access to limited resources such as food and space, so that they will survive long enough to reproduce themselves.
In science ‘memes’, like hypotheses and theories, struggle for existence too: in a life-and-death fight against oblivion, the waste-paper basket or falsification11.
11 See also above, §3.C “Statistics à la Popper: the natural selection of a falsifying rule for statistical hypotheses” (p. 78-81).
Why does one finch survive while another perishes? Why is one hypothesis rejected and the other not? The interesting fact is that selection occurs according to a ‘mindless’ algorithm (Dennett, 1996): evolution generates rational designs – such as a complex organ like the human eye – on the basis of mere variation and selection. The most interesting and powerful algorithms – in computers as well as in natural evolution – finesse ignorance and produce rational results with this blind and mindless tactic, viz. by “randomly generating a candidate and then testing it out mechanically” (Dennett, 1996, p. 53). Yet, how can a mindless algorithm produce rational designs? Dennett shows how a very basic algorithm – such as a tennis tournament – can generate rational results: an amateur may have luck and win a set, but in the long run the most talented and professional tennis players are increasingly likely to come out on top.
The tennis tournament is a simple algorithm that “takes as input a set of competitors and guarantees to terminate by identifying a single winner” (Dennett, 1996). Of course, winning a tournament is a combination of skill and luck. Still, in the long run “more of the better players would tend, statistically, to get to the late rounds” (Dennett, 1996, p. 55).
Even if a tournament is very luck-ridden – for instance if the tennis players were required “to play Russian roulette with a loaded revolver before continuing after the first set” (Dennett, 1996, p. 55) – the better players would still have better chances of making it to the final rounds than the untalented amateurs:
The power of a tournament to “discriminate” skill differences in the long run may be diminished by
haphazard catastrophe, but is not in general reduced to zero. This fact […] is as true of evolutionary
algorithms in nature as of elimination tournaments in sports, […]. (Dennett, 1996, p. 55)
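A minimal sketch of such an elimination tournament, assuming that each player has a fixed ‘skill’ and that the more skilled player wins each match only with a certain probability (the skill model and all parameters are invented):

```python
import random

random.seed(0)

def play_match(skill_a: float, skill_b: float) -> bool:
    """Return True if player A wins; skill tilts the odds, luck decides."""
    return random.random() < skill_a / (skill_a + skill_b)

def tournament(skills: list[float]) -> float:
    """Single-elimination rounds until one winner's skill value remains."""
    players = skills[:]
    while len(players) > 1:
        players = [a if play_match(a, b) else b
                   for a, b in zip(players[::2], players[1::2])]
    return players[0]

# 16 players with skills 1..16: in the long run the most skilled player
# wins far more often than the 1/16 that pure chance would allow.
skills = list(range(1, 17))
wins = sum(tournament(skills) == 16 for _ in range(10_000))
print(f"best player wins {wins / 10_000:.1%} of tournaments")
```

Even with plenty of luck in every match, the long run ‘discriminates’ skill – which is exactly Dennett’s point.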
However, it may take quite some ‘deep time’, i.e., sometimes even billions of years (see e.g. Gee, 1999), before natural selection separates chance from structural variation. Statistical methodology – which can make pretty good guesses, based on probabilistic calculations, about what is due to chance variation and what to structural variation – may accelerate this process.
Figure 39. An example of a simple algorithm: the tennis tournament (a bracket in which players such as Boris Becker, Dan Dennett, George Smith and Pete Sampras are paired off until a single winner remains). In the long run rational results will be generated (figure reproduced from Dennett, 1996, p. 53).
It probably sounds quite remarkable that humans are capable of ‘speeding up’ natural selection. Yet, think for instance about dogs. All dogs belong to the same species (Canis lupus familiaris) – all dog breeds share the same genome – which emerged approximately 15,000 years ago out of wolves: they are a rather ‘young’ species compared to, for instance, rats and mice, which have existed for approximately 750,000 years12. However, the really interesting fact about domestic dogs is that the species was created by selective human breeding and that this ‘artificial selection’ has apparently been so powerful that the time it took to develop from wolf to dog is much shorter than one would expect to see in ‘normal’ natural selection – even though the two processes operate on the same underlying gene pool.
12 These and other small facts about genetics can be found at a nice website where everybody can put questions to geneticists from renowned universities: http://www.thetech.org/genetics/asklist.php
In the same way the ‘taming’ of chance has led to an accelerated selection of
hypotheses.
Figure 40. The selection of hypotheses by a ‘mindless’ algorithm: hypotheses 1-4 are paired off until a single winning hypothesis remains (figure inspired by Dennett, 1996, p. 53).
It now becomes clear that scientists are not – as they may like to think of themselves – the ‘inventors’ of the ‘scientific game’ and its rules, nor does it matter which hypothesis they believe will win the scientific game: their only function is to implement the algorithm (e.g., to ‘shoot’ the losers and bury them), to reduce the influence of unwarranted biases (e.g., to level the ‘bumpy courts’ which could raise the luck ratio in a completely unwarranted way), and to ‘tame’ chance by making good guesses – on the basis of probabilistic calculations which draw a line at a certain level (e.g., p < 0.05 or p < 0.01) between what has to be considered chance and what systematic variation – in order to accelerate the eliminating algorithm amongst the scientific hypotheses. Thus, viewed from this perspective, scientists are the ‘umpires’ and ‘ball boys’ in the scientific elimination tournament among hypotheses.
So, a good umpire and a good scientist are both characterized by the fact that they are
unbiased – i.e. that their personal beliefs do not influence their algorithmic application of the
rules. Both have to be ‘blind’ in their judgements. Their blindness guarantees a ‘fair’ (i.e. a
‘rational’) result: namely, that it is really the best that will win. In science this idea is, for
instance, expressed by the fact that experiments should have a blind or even double-blind
design. Because ‘blind’ chance can be domesticated, the scientist has to be ‘blind’ or
‘unbiased’. In a sense both the umpire and the scientist stand in a long tradition of two other
instances that are renowned for the fact that they apply their (algorithmic) rules blindly, namely
Justice ('Justitia') and Chance ('Fortuna').
Figure 41. Both Justice (left) and Fortune (right) have been traditionally depicted blindfolded: both apply their
rules ‘blindly’. (The oil painting of lady Justice is dated 1804, made by J.H. Fredriks, and belongs to the
collection of the municipality of Breda in the Netherlands; the miniature of lady Fortuna is a miniature from a
manuscript of Augustine’s La Cité de Dieu, made around 1400-1410, which belongs to the collection of the
Dutch Royal Library.)
Yet, although both the scientific rules as well as the tennis rules are applied ‘blindly’,
the algorithmic rules of a game of tennis are of course not the same as those of the ‘game’ of
science. For instance, science uses significance levels such as 'p < 0.05' or 'p < 0.01', whereas
tennis does not. After all, a tennis tournament is not a scientific way to determine the best
tennis player. It deals with individual tennis players, whereas the scientist would probably be
more interested in the structural variations between different groups of tennis players (e.g., 'is
there any structural difference between female and male tennis players?').
To summarize, the algorithmic selection of hypotheses in science is accelerated by the
statistical approach. However, statistical methodology deals only with 'collectives' or
'groups'. This concept is apparently rather hard for the human mind to grasp: some examples
of our attitudes towards psychological research may clarify this.
Statistical-psychological research is able to say rational things about human rationality or
irrationality because it does not deal with the cognition of one single individual but with
whole collectives.
On the one hand we have grown used to this idea: for instance, the attempts of Oswald
Külpe (1862-1915) to ‘research’ psychological laws by introspection of one single person
(mostly the researcher himself) now seem ‘cute’ curiosities from the history of psychology
that cannot be taken completely seriously. To give an impression of introspective reports I will
cite here a short fragment from an introspective report that was made by Külpe's assistant
Karl Bühler in 1907, wherein Külpe himself is the subject and has to report the direction of his
thoughts after he has been asked the rather sophisticated question 'Can we capture the nature
of thinking by our thoughts?':
The question at first struck me as odd; I thought it might be a trick question. Then it suddenly occurred
to me how Hegel had criticized Kant, and then I answered decisively: Yes. The thought about Hegel's
critique was quite rich, I knew at the moment exactly what it amounted to, I didn't say anything about it,
and also didn't imagine anything, only the word "Hegel" resounded to me subsequently (acoustic-motoric)… (Gigerenzer & Murray, 1987, p. 139)
As experimental psychology progressed, its research became more and more estranged from such
accounts of individual observations and turned into research wherein the "anonymous
members had no individual existence in the experimental report" (Danziger, 1990, p. 100).
We have grown so used to this kind of research that Külpe's introspection strikes us as
peculiar.
Yet, on the other hand lots of people still experience some 'unease' when they are
confronted with the fact that they are 'just' one subject in a psychological study and that
their particularities are of no interest to the researcher: every "extraexperimental" identity is
annihilated in the common denominator of being a "subject" in a study (Danziger, 1990, p.
99).
The subject of the next section will be this 'unease' we experience when we are
confronted with the 'bleak' Darwinist depiction of science as a process which encompasses
the taming of chance (i.e., the discrimination between chance and structural variation) by the
'blind' application of algorithmic 'selection' rules to replicative, collective phenomena.
[3.E] The lure of ‘subjective’ semantics in statistical inference.
In the previous three sections, I showed the evolutionary epistemology that underlies
frequentist statistical methodology as it is applied in psychology and many other
sciences. In this evolutionary epistemology there is no place for an autonomous, individual subject.
Nevertheless, most people – including psychologists and psychology students – think they are
autonomous, individual subjects and consequently they assume that their individual beliefs,
ideas and experiences do matter scientifically. I think that this discrepancy may explain why
statistical methodology is so poorly understood: both scientists and students in psychology are
'lured' into a misplaced subjective interpretation of their frequentist methodology. This is very
nicely exemplified by a study of Haller and Krauss (Gigerenzer et al., 2004; Haller & Krauss, 2002),
who in the year 2000 confronted 44 students of psychology, 39 professors and lecturers of psychology,
and 30 statistics teachers from six German universities with a short questionnaire about the
meaning of a significant p-value in a t-test. The questionnaire, which was developed by Oakes
(1986) in a similar study, is presented in figure 42.
The Questionnaire
Suppose you have a treatment that you suspect may alter performance on a certain task.
You compare the means of your control and experimental groups (say 20 subjects in each
sample). Further, suppose you use a simple independent means t-test and your result is (t
= 2.7, d.f. = 18, p = 0.01). Please mark each of the statements below as “true” or “false”.
“False” means that the statement does not follow logically from the above premises. Also
note that several or none of the statements may be correct.
1) You have absolutely disproved the null hypothesis (that is, there is no difference
between the population means).
[ ] true / false [ ]
2) You have found the probability of the null hypothesis being true.
[ ] true / false [ ]
3) You have absolutely proved your experimental hypothesis (that there is a difference
between the population means).
[ ] true / false [ ]
4) You can deduce the probability of the experimental hypothesis being true.
[ ] true / false [ ]
5) You know, if you decide to reject the null hypothesis, the probability that you are
making the wrong decision.
[ ] true / false [ ]
6) You have a reliable experimental finding in the sense that if, hypothetically, the
experiment were repeated a great number of times, you would obtain a significant result
on 99% of occasions.
[ ] true / false [ ]
Figure 42. Questionnaire from the Haller & Krauss study (Gigerenzer et al., 2004, pp. 392-93; Haller & Krauss,
2002, p. 5)
All the statements are in fact wrong: they all represent common misconceptions about the
meaning of a significant result in a significance test. However, the majority of the subjects
who answered this questionnaire endorsed one or more of these illusions.
[Figure 43 is a bar chart; the bars read approximately: professors & lecturers teaching statistics (N = 30): 80%; professors & lecturers not teaching statistics (N = 39): 90%; psychology students (N = 44): 100%; professors & lecturers from the Oakes study (1986) (N = 68): 97%.]
Figure 43. The amount of delusions about the meaning of "p = .01". Percentages of participants in each group
who made at least one mistake in the questionnaire, in comparison to Oakes' original study (1986). Based on
Haller & Krauss (Gigerenzer et al., 2004, p. 394; 2002, p. 7).
So, there are quite a few sufferers from misconceptions within the departments of psychology!
On closer inspection one may conclude that a large part of these misconceptions is based on faulty
subjective interpretations of frequentist methodology. Clearly, many of the subjects have not
understood that "p = .01" only means that – under the null hypothesis – the probability of the
test statistic being at least as large as the one calculated from the data is 0.01. In
frequentist statistical methodology it is absolutely impossible to assign a probability to
the hypothesis: the hypothesis can only be completely false or completely true. Thus, the
meaning of "p = .01" is just that either the null hypothesis is true, in which case something
unusual happened by chance (probability 1%), or the null hypothesis is false (cf. Hacking,
2001, p. 215). Subjective, in particular Bayesian, statistical inferential methodology allows one to
assign probabilities to hypotheses, but frequentist statistics does not!
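The frequentist meaning of a p-value can be made tangible with a small simulation (a sketch of my own, not part of the Haller and Krauss study). Assuming two groups of 10 drawn from the very same population – so that the null hypothesis is true by construction and d.f. = 18, as in the questionnaire – the proportion of simulated experiments with a test statistic at least as large as t = 2.7 approximates the p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, t_obs, reps = 10, 2.7, 100_000

# Simulate many experiments in which H0 is true by construction:
# both groups are drawn from one and the same normal distribution.
exceed = 0
for _ in range(reps):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.0, 1.0, n)
    t, _ = stats.ttest_ind(a, b)
    exceed += abs(t) >= t_obs

print(exceed / reps)                        # Monte Carlo estimate of the p-value
print(2 * stats.t.sf(t_obs, df=2 * n - 2))  # exact two-sided tail probability
```

Note that this says nothing about the probability that the null hypothesis is true – the simulation simply assumes it.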
With these remarks in mind we can now take a closer look at the false statements of the
questionnaire (cf. Gigerenzer et al., 2004; Haller & Krauss, 2002):
Statements 1 and 3
As Popper showed clearly, hypotheses can only be (‘practically’) falsified, but never
proved or disproved! Hypotheses that survive a lot of severe tests may be called
corroborated – however, corroboration has nothing to do with probability calculus or
statistics. Therefore statements 1 and 3 are both false.
Statements 2, 4, and 5
Statements 2, 4, and 5 all claim that the p-value assigns a probability to the hypothesis,
which is a clear subjectivist delusion. As was already clarified by Hume in his so-called
'induction problem', the fact that we have seen the sun rise thousands of times
gives no rational guarantee whatsoever that the sun will rise tomorrow too. One does
not have any rational ground to assign a probability to the hypothesis that the sun will
rise tomorrow, and consequently it is of course also impossible to say that this
probability is 'true' or 'wrong'. Thus, it is clear that statements 2 and 4 are false.
Statement 5 makes essentially the same claim as statement 2 does – 'the probability
that you make a wrong decision' being a reformulation of 'the probability that the null
hypothesis is true' – and consequently is false too.
In fact one may say that statements 1 through 5 all suffer from basically the same
delusion, namely the assumption that one can assign some probability to the
hypothesis – however, the whole idea of assigning any probability whatsoever to a
hypothesis is nonsensical in frequentist statistics!
Statement 6
Recall that "p = .01" just means that – if one assumes that the null hypothesis is
true – the probability that the test statistic turns out at least as extreme as it did is only 1%. So, one
concludes that either the null hypothesis is true, in which case something unusual
happened by chance (probability 1%), or that the null hypothesis is false.
Although statement 6 is the only statement of the questionnaire that rightly says that
"p = .01" concerns the probability of the data, it nevertheless overlooks the fact that "p
= .01" only says something about the probability of the data given the assumption that
the null hypothesis is true! Statement 6 pretends that "p = .01" says something
about the probability of the data per se, instead of 'given the hypothesis'. Whether a
replication would again yield a significant result depends on the unknown true state of
affairs – in frequentist terms, on the power of the test – about which "p = .01" by itself
says nothing.
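A small simulation (again a sketch of my own, not from the cited studies) makes this replication fallacy visible. Assume, purely hypothetically, that the true standardized effect is about d = 1.2 – roughly the size that would produce t = 2.7 with two groups of 10. Even then, the chance of obtaining p < .05 in a replication is nowhere near 99%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 10, 100_000, 0.05
d = 1.2  # hypothetical true effect, roughly matching t = 2.7 with n = 10 per group

# Simulate replications under this (assumed) true effect and count significance.
significant = 0
for _ in range(reps):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(d, 1.0, n)
    _, p = stats.ttest_ind(a, b)
    significant += p < alpha

print(significant / reps)  # roughly 0.7 – far below the 99% claimed in statement 6
```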
To sum up, all six incorrect statements suffer from 'subjective' illusions. All these
illusions endorse the idea that we 'really' can know and that our knowledge 'really' may grow
– in spite of Popper's thought, which showed that the growth of knowledge is in fact only a
diminishment of error. In table 1 the exact percentages of false answers are shown. On
average 2.5 illusions were endorsed by students, 2.0 illusions by their professors and lecturers
not teaching statistics, and 1.9 illusions by professors and lecturers teaching statistics.
Table 1
Percentages of false answers (i.e., statements marked as true).

                                      Germany 2000 (Haller & Krauss, 2002)                 United Kingdom 1986
                                                                                           (Oakes, 1986)
Statements (abbreviated)              Psychology    Prof. & lect. not   Prof. & lect.      Professors and
                                      students      teaching            teaching           lecturers
                                      (N = 44)      statistics          statistics         (N = 68)
                                                    (N = 39)            (N = 30)
1) H0 is absolutely disproved         34%           15%                 10%                1%
2) Probability of H0 is found         32%           26%                 17%                36%
3) H1 is absolutely proved            20%           13%                 10%                6%
4) Probability of H1 is found         59%           33%                 33%                66%
5) Probability of wrong decision      68%           67%                 73%                86%
6) Probability of replication         41%           49%                 37%                60%

Note. Percentages of false answers (i.e., statements marked as true) in the three groups studied by Haller &
Krauss (2002), in comparison to the percentages of false answers among the academic psychologists in Oakes'
original study (1986). Based on Haller and Krauss (Gigerenzer et al., 2004; Haller & Krauss, 2002).
Statements 1 and 3 were most frequently identified as being incorrect, probably because the
subjects understood that the word 'absolutely' is misplaced – statistical methodology will
never give absolute certainty. However, in view of the rather high percentages of subjects
who considered statements 2, 4 and 5 to be correct, one may conclude that the subjective
misconception that "p = .01" says something about the amount of belief (i.e. 'probability')
one has to assign to a hypothesis is apparently very widespread.
Is it surprising that so many students, lecturers and professors of psychology endorse
'subjective' illusions? Well, on the one hand it seems surprising that a methodology that
is so central to psychology is conceptually so poorly understood. However, on the other hand, it
is not surprising at all: as we saw earlier13, even R.A. Fisher – whose ideas formed the basis
for the statistical methodology such as it is applied in psychology – could not get rid of the
last shreds of subjectivism in his frequentist methodology and spoke of vague subjective
notions such as 'fiducial probability' and 'likelihood' (Hacking, 2001, p. 245; Kendall, 1963). The
'unease' we experience when we are confronted with the 'bleak' Darwinist depiction of
science's blind and mindless application of algorithmic selection to replicative phenomena
was apparently also shared by Fisher when he spoke about the 'free mind' and 'a language
distinct from that of technological efficiency' (Fisher, 1955, p. 70).
Once you pay some attention to it, you may see how widespread the allurement of
subjective semantics is. Mayo (1996, p. 364, footnote 1), for instance, even contends – and I
think she is right – that the subjective semantics that were indigestible to Egon Pearson and
Jerzy Neyman not only caused the fundamental controversies between them and Fisher, but
also underlie – at least partly – the antagonisms between Karl Pearson and his son Egon.
However, although Fisher was a frequentist with a subjectivist twist, he strongly
opposed outright Bayesianism. Yet, today it seems to be in vogue to defend Bayesianism as an
apt statistical methodology for psychology – not just as an alternative, next to frequentist
statistical methodology, but maybe even as a methodology that may partly replace frequentist
statistics. In a very recent dissertation Romeijn (2005) argues that conventional frequentist
methodology may be used next to Bayesian statistics as an “epistemic shortcut” (p. 255). It is
evident that Romeijn – in a way just as Fisher, whom he depicts as a stubborn frequentist –
dreams of scientists with ‘free minds’:
The proposed scheme can be used here to formalise a view that finds its roots already in Kant, […]. It is
the view that knowledge can only emerge on the intersection of observation, presented by a mind-independent
world, and a conceptual framework, devised by, partly world-independent, minds [italics added]. (Romeijn, 2005, p. 12)
This is evidently in complete opposition to Popper's (i.e. 'evolutionary' or 'Darwinist')
epistemology! After all, we ourselves and our knowledge are the result of a long evolutionary
process of adaptation to our environment through variation and selection – thus, the idea that
our minds could be world-independent and make objective observations is a "colossal
mistake" (Popper, 1990, p. 37). Of course, the methodology proposed by Romeijn – who
adheres strongly to the ideas of Bayes and Carnap – is much more 'cheery' than that of Popper. It
13 See also §2.E "From seventeenth century 'probability' to eighteenth century 'Statistik' and nineteenth and twentieth century 'statistical inference'" (p. 60 ff.).
is much nicer to think of yourself as an individual, making observations and inductions,
choosing your 'input probabilities' freely, etc.:
Finally, it is again notable that apart from the observations, the Bayesian scheme consists of a range of
input probabilities which are entirely free for choice. There is no further restriction on what input
probabilities may be rational or acceptable. […] It is that both in the Carnapian and in the Bayesian
scheme, the observations do not determine what predictions are warranted. In choosing the input
probabilities we effectively determine the patterns in the observations on which the predictions focus,
but there is no restriction stemming from the observations alone. (Romeijn, 2005, p. 39)
Why is someone like Romeijn – working in a psychology department and lecturing on
philosophy of statistics – so attracted to Bayesianism? Apart from the fact that Bayesianism is
based on a much more 'cheery' – though unfortunately philosophically untenable –
epistemology than the 'bleak' Darwinist epistemology underlying frequentist statistics, there
are some reasons why Bayesianism especially interests psychologists: on the level of
psychological theories of cognition, Bayesian theories are really 'hot'. One of the favourite
'models' or 'metaphors' of human cognition has for several decades been the model of the
intuitive Bayesian statistician. Why the Bayesian model is so popular in psychological
cognitive theories will be the subject of the next section (F). However, for now I will end this
section with the suggestion that – besides the earlier mentioned reasons – the relative
popularity of Bayesianism among some methodologists in the statistical departments of
psychology may be due to a 'contamination' from the Bayesian enthusiasm on the level of
psychological theory.
[3.F] The lure of ‘subjective’ semantics in cognitive psychology.
Before I started to study psychology I used to believe in the old Aristotelian idea (Aristotle,
1995, 1.1253a) that man is an animal endowed with reason: a 'zôion logon echon' or – in
Latin – an 'animal rationale' (see also Heidegger, 1988, p. 21-28). Yet, one has only to open
an introductory textbook on human reasoning and thinking (see e.g. Garnham & Oakhill, 2001)
to conclude that during the last decades it has become generally accepted in psychology that
man is a 'cognitive miser' at most endowed with a 'bounded rationality' which
sometimes even seems to be completely 'irrational' – cognitive illusions, heuristics and biases
have become "the fodder for classroom demonstrations and textbooks" (Gigerenzer, 2000, p.
237). So, if man is not the seat of rationality, what is?
Well, as we saw in the previous section on the Darwinist epistemology underlying
frequentist statistical inference, rationality is everywhere around us! After all, rationality is
the rational design which the algorithm of natural selection selects in the long run: the human
eye, the finch, the hypothesis that was not eliminated, etc. Thus, we may conclude that
frequentist statistical inference is a rational method – for its ‘blind’ or ‘mindless’ application
of selecting algorithms to replicative, collective phenomena generates rational results. At the
same time this leads us to the conclusion that Bayesian statistical inference is not rational – a
conclusion basically already drawn by Hume when he formulated his 'induction problem'.
Consequently it is clear why Bayesian statistics – despite the wishful hopes of methodologists
like Romeijn (2005) – has hardly any place in scientific methodology.
Paradoxically, however, in cognitive psychology the rationality of human cognition is
evaluated against Bayesian statistics: human cognition would be considered rational if its
reasoning followed the rules of Bayesian statistics – which it does not! Since the famous
psychologists Kahneman and Tversky (1982) launched their "heuristics-and-biases"
movement in the 1970s, cognitive science has generally adopted the idea – repeated in textbooks
again and again – that human cognition is fundamentally deficient in its rationality,
because it fails to be a flawless Bayesian 'intuitive statistician' (Gigerenzer, 2000; Gigerenzer
& Murray, 1987; Gigerenzer et al., 1990).
This is quite a remarkable situation: after all, the subjective (i.e. Bayesian)
interpretation of probability, which is mostly deemed too 'suspicious' to apply in scientific
methodology, is used on a theoretical level in cognitive psychology as the standard for
measuring the rationality of the human mind – only to conclude that the human mind is rather
irrational. The ambiguous attitude of cognitive psychologists to Bayesian statistics is clearly
expressed by Gigerenzer:
Why do psychologists of the calibre of Kahneman and Tversky nevertheless adhere to the idea that
Bayes' theorem is rationality for all contents and contexts? In most studies that claim to have
demonstrated human errors, biases, and shortcomings, no argument is given to explain why the
statistical rule applied is rational, nor is rationality independently defined. (Gigerenzer & Murray, 1987,
p. 179)
I. Bayesian inference and Artificial Intelligence
One may indeed wonder why so many cognitive psychologists have embraced Bayesian
statistical inference as the one and only standard of rationality. The answer lies in the fact that
especially since the 1980s Bayesian inference has scored impressive successes in Artificial
Intelligence: computer simulations of artificial 'neural' or 'connectionist' networks – which
can weigh 'evidence' according to Bayesian inference – have shown themselves capable of
discerning patterns and dealing with learning tasks (see e.g. McClelland, 1994). "New
technologies have been a steady source of metaphors of the mind" (Gigerenzer, 2000, p. 21)
and the "surprising things" (Dennett, 1991) connectionist models seem to be capable of have,
of course, made them a very attractive model of cognition.
So, why is Bayesian inference working so well in artificial connectionist networks? To
answer this question we must first clarify what a connectionist network is.
Figure 44. A simple 'neural' or 'connectionist' network. (Reproduced from Garson, 2002)
Connectionist networks are models consisting of large numbers of units (seen by
connectionist psychologists as analogous to neurons in the human brain), whose level of
activation is regulated by the activation level of the other units to which they are connected (Wheeler,
2005). The effect of one unit on another can be excitatory or inhibitory. The connections
between units can be of varying 'strength' or 'weight'.
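A minimal sketch of such a unit might look as follows (my own illustration; the activation values and weights are arbitrary, and the sigmoid squashing function is just one common choice):

```python
import numpy as np

def unit_activation(inputs, weights, bias=0.0):
    """One connectionist unit: sum the incoming activations, each scaled by
    the 'weight' of its connection, then squash the result into (0, 1).
    Positive weights excite the unit, negative weights inhibit it."""
    net_input = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-net_input))

upstream = np.array([0.9, 0.1, 0.5])   # activation levels of connected units
weights = np.array([0.8, -0.4, 0.3])   # excitatory (+) and inhibitory (-) links
print(unit_activation(upstream, weights))
```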
The 'weight', i.e. the 'strength', of a connection can be adapted by experience: the
more dogs you have seen, the more likely you are to recognize a stimulus pattern as a dog.
Computer simulations of such connectionist models “have demonstrated an ability to learn
such skills as face recognition, reading, and the detection of simple grammatical structure”
(Garson, 2002). How your ‘experience’ may influence your recognition may be exemplified
by an anecdote told by Dennett:
The philosopher Samuel Alexander, was hard of hearing in his old age, and used an ear trumpet. One
day a colleague came up to him in the common room at Manchester University, and attempted to
introduce a visiting American philosopher to him. "THIS IS PROFESSOR JONES, FROM
AMERICA!" he bellowed into the ear trumpet. "Yes, Yes, Jones, from America" echoed Alexander,
smiling. "HE'S A PROFESSOR OF BUSINESS ETHICS!" continued the colleague. "What?" replied
Alexander. "BUSINESS ETHICS!" "What? Professor of what?" "PROFESSOR OF BUSINESS
ETHICS!" Alexander shook his head and gave up: "Sorry. I can't get it. Sounds just like 'business
ethics'! (Dennett, 1998, p. 250)
There is no doubt that experience may modify our expectations: for instance, I expect
the sun to rise tomorrow, because as long as I have lived the sun has risen every day. This
modification of expectations according to experience is most of the time very practical,
adaptive behaviour. The expectation that the sun will rise tomorrow makes me set my alarm
clock. Moreover, I find it also very pleasant that my dog starts wagging his tail every day
around 7 p.m., because he has 'learned' from experience that that is when I return home from
work (cf. Popper, 1990, p. 30). It is the same capacity that will make robots so attractive in the
future. The Sony entertainment robot dog ‘Aibo’ is nice, because he may learn to wag his tail
around 7 p.m.: just like a real dog.
Figure 45. Sony’s legged entertainment robot 'Aibo' and a real dog.
Thus, it is absolutely evident that Bayesian inference can be very useful in artificial
intelligence. Yet, the fact that some technique is practical and useful does not automatically
make it a standard for 'human rationality': the invention of photography, for instance,
has also led to very practical and useful applications, but nobody has
considered making it the standard against which the 'rationality' of human observation
should be measured. So one may ask why Bayesian inference should be considered the
standard of rationality. I think the only reason is the allurement of its subjective semantics –
Bayesian inference entails the comforting idea that an individual knowing subject may have
rational beliefs, i.e. beliefs governed by the laws of probability. It is the reintroduction of the
classical, eighteenth century, epistemology of probability theory as the "formal description of
the intuitions of a prototypical reasonable man" (cf. Gigerenzer, 2000, p. 266; Gigerenzer et
al., 1990, p. 226). It is its Cartesian talk about representations, beliefs and subject-object
dichotomies (see also the analysis of connectionist thought in the same vein: Wheeler, 2005)
which coaxes us to call Bayesian inference rational. I will substantiate this hypothesis by
taking a closer look, in the following subsection, at the research of Tversky and Kahneman
(1982).
II. Is the human mind a Bayesian? The research of Tversky and Kahneman.
That experience modifies our expectations is evident (cf. Popper, 1990). Yet, as soon as we
start mixing probability calculus into this obvious statement, things become more difficult.
Bayes' theorem gives the probabilistic rules for adjusting or revising our
beliefs rationally in the light of new evidence. If E and H are two events, then p(E|H) is the probability of
observing E given the fact that event H has occurred, and p(H|E) is the probability of
observing H given the fact that event E has occurred. Now assume that E is an event which
you have observed and that Hi (one of the n possible and mutually exclusive causes
H1,…,Hn) is a hypothetical cause of the observed event. This sounds quite abstract, so I will
clarify it with an example (cf. Amossé, Andrieux, & Muller, 2001).
Suppose that you are worried that you might have the disease ‘Bayesomia’. You go to the
hospital to get tested. Unfortunately you get a positive result. This positive testing result we
will call event E. Though your test result was positive, you still have some hope that it was a
false positive result, because you know that the testing methods for ‘Bayesomia’ are accurate
only 99 percent of the time (regardless of whether the results come back positive or negative).
Moreover, your physician has told you that the disease ‘Bayesomia’ is present in one of every
1,000 people. So the event E may be caused by the disease – this is hypothesis Hi. However,
On the common origins of psychology and statistics
112
CHAPTER 3: PSYCHOLOGY AND STATISTICS: WITH OR WITHOUT A KNOWING SUBJECT?
event E may also be caused by something else, i.e. not by the disease: this hypothesis we call
'~Hi'. About hypothesis Hi the physician gave us the 'a priori' knowledge that its occurrence
in the population is 0.001. Thus, if you had been asked before the test what the probability was
that you had been struck by 'Bayesomia', your 'rational' answer should have been 0.001. However,
the positive result of the test is new evidence which puts things in a whole different light.
What should you answer if you were asked to estimate the probability that you suffer from
'Bayesomia' after you were informed about your positive test result? Bayes' theorem gives
you the answer:
p(Hi|E) = [p(E|Hi) × p(Hi)] / [p(E|Hi) × p(Hi) + p(E|~Hi) × p(~Hi)]
Bayes' theorem shows you how to evaluate your a posteriori probabilistic knowledge about
the occurrence of the disease – p(Hi|E) – in the light of the probability of observing the event
E if it is caused by 'Bayesomia' – p(E|Hi). Subsequently you have to integrate this with the
probability that you are not a sufferer from 'Bayesomia', i.e. that you are one of the 999
healthy people, and the probability of 0.01 that the test gave a false positive.
p(Hi|E) = (0.99 × 0.001) / [(0.99 × 0.001) + (0.01 × 0.999)]
So, from Bayes' theorem you have to conclude that the a posteriori probability (after you got
your positive result) that you have indeed contracted 'Bayesomia' is 0.09.
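For readers who prefer to check the arithmetic, here is a minimal sketch of the calculation in Python (my own illustration; the function name is arbitrary):

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem for one hypothesis H against its complement ~H."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))

# 'Bayesomia': base rate 1 in 1,000; the test is accurate 99% of the time,
# so a sick person tests positive with p = 0.99 and a healthy one with p = 0.01.
print(posterior(prior=0.001, p_e_given_h=0.99, p_e_given_not_h=0.01))  # ~0.09
```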
However, from the famous studies conducted by Kahneman and Tversky in 1973
(the 'Engineer-Lawyer Problem') and 1980 (the 'Cab Problem') it seemed to follow that the human
mind is very poor at performing this Bayesian calculation intuitively (Amossé et al., 2001): it
would be completely in line with the findings of Kahneman and Tversky if you – after hearing
your positive test result – 'irrationally' believed that it was almost absolutely certain that
you had contracted 'Bayesomia', instead of drawing the 'rational' conclusion that the probability
that you suffer from this disease has now become 9%. After all, the results of the Kahneman and
Tversky studies indicated that the subjects tended to neglect the a priori knowledge, i.e. that
there is base rate neglect. So, for instance, in the so-called 'Cab Problem' study Tversky
and Kahneman (1980) presented their subjects with the following problem:
A cab was involved in a hit-and-run accident at night. Two cab companies, the Green and the Blue,
operate in the city. You are given the following data:
(i) 85% of the cabs in the city are Green and 15% are Blue.
(ii) A witness identified the cab as a Blue cab. The court tested his ability to identify cabs under the
appropriate visibility conditions. When presented with a sample of cabs (half of which were Blue and
half of which were Green) the witness made correct identifications in 80% of the cases and erred in 20%
of the cases.
Question: What is the probability that the cab involved in the accident was Blue rather than Green?
(Tversky & Kahneman, 1980, p. 162)
You may notice that this problem is exactly analogous to my 'Bayesomia' example: only now
Hi is a Blue cab (probability 0.15), whereas the event E is the identification of the cab by
the witness as a Blue cab. The probability that the witness has made a correct identification is
0.8.
p(Hi|E) = (0.8 × 0.15) / [(0.8 × 0.15) + (0.2 × 0.85)]
Thus, the a posteriori probability (after the identification by the witness) that the cab
involved in the accident was Blue rather than Green is – according to Bayes' theorem – 41%.
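The posterior sketch given above for the 'Bayesomia' example yields this result directly; as a self-contained one-liner:

```python
print((0.8 * 0.15) / (0.8 * 0.15 + 0.2 * 0.85))  # ~0.41
```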
Yet, Tversky and Kahneman (1980) report that most of the subjects
gave probabilities around 80%. According to Kahneman and Tversky this indicates that the
human mind acts 'irrationally' – or at least non-Bayesian – because it neglects the base rates it
should have taken into account according to Bayes' theorem. The research of Kahneman and
Tversky led to an explosion of research showing how 'bounded' human rationality is.
The proponents of this "heuristics-and-biases" program (Gigerenzer & Murray, 1987) have
tried to show again and again how the laws of cognition seem to be at odds with the laws of
probability.
However, the interpretation of the results of the studies of Kahneman and Tversky is
not uncontested. Gigerenzer and Murray have shown in their work Cognition as Intuitive
Statistics (1987) that the probabilistic, formal way in which Kahneman and Tversky presented
their problems may be the actual cause of the 'biases' in the estimates made by the subjects.
Gigerenzer (2000) showed that if the same problem is presented in a different way
(formulated in 'natural frequencies' instead of abstract probabilities, more contextualized,
etc.) the 'biases' and 'neglect' largely disappear. Moreover, the assumption, which apparently
underlies the heuristics-and-biases program, that the application of Bayes' theorem is the only
correct solution to their problems is highly arguable. For example, application of the statistical
theory of Neyman and Egon Pearson to the 'Cab Problem' leads to a completely different
answer. Actually, following a Neyman-Pearsonian analysis, the probability that the
cab really was Blue – given that the witness said it was Blue – is 0.82 (Gigerenzer & Murray,
1987), which happens to be very close to the answer the majority of the subjects in the study
of Kahneman and Tversky gave.
III. Can you bear to see psychology as it is?
Of course, Bayesian statistics is a very practical tool in Artificial Intelligence. Yet,
psychology seems to have found, in the successes of Bayesian reasoning in AI, a pretext to
backslide into philosophically outdated subjective semantics. Darwinist or 'evolutionary'
epistemology has shown that objective rational knowledge is not seated in the individual
subject – rationality is not to be found in my beliefs or in my experiences, but only in
replicative, collective phenomena which adapt to their environment through the algorithm of
natural selection.
Contemporary experimental psychology with its frequentist statistical methodology –
consisting of a mix of Fisherian and Neyman-Pearsonian ideas – corresponds to such a
Darwinist epistemology. In 1957 the famous psychologist Lee Cronbach could still talk of
"two disciplines of scientific psychology", whereby he defined Fisherian experimental psychology
as a "Tight Little Island" in comparison to the "Holy Roman Empire" (Cronbach, 1957, p. 671) of
correlational psychology, which stood in the tradition of Galton and Karl Pearson, of the
study of intelligence and personality, and whose "purpose was to find a measurement
instrument […] for an "objective" registration of individual differences" (Gigerenzer, 1987b,
p. 60). Since 1957 correlational psychology has lost more and more ground in favour of the
onrushing Fisherian experimental psychology. Even disciplines such as social or clinical
psychology that used to rely mostly on correlational methods have shifted their
methodological taste very strongly towards an experimental one. In psychology the victory of the
experimental Fisherian method over every other method is undeniable.
The enormous ‘force’ or ‘success’ of experimental psychology lies in its statistical
methodology that accelerates this ‘adaptation’ or ‘attunement’ between us and our
environment, because it can make good guesses about what is structural variation and what
chance variation. In this way psychology is able to attune human intelligence and its
environment to each other in such a way that error is minimized. The ‘fruits’ of the
psychological science are its answers to questions like: 'How should we attune the employee
and his workspace in such a way that his production is maximized?', 'How to attune the pilot
and the control panel in the plane in such a way that the chance that he will push a wrong
button is minimized?', 'How to attune the patient suffering from depression and his treatment
in an optimal way?' or 'How to optimize the attunement of the Artificial Intelligence of a
robotic entertainment dog – like the Aibo – to the emotional needs of the senior citizen?'. Yet,
the employee, the pilot, the patient suffering from depression and the senior citizen are not
particular individuals, but averages. Thus the average pilot, the average employee, the average
patient suffering from depression and the average senior citizen are the environment
which 'selects' control panels, workspaces and anti-depressive treatments. Psychological
research makes control panels, workspaces and anti-depressive treatments more
rationalised, i.e. better adapted or attuned to average human intelligence. Psychology is the
enterprise wherein hypotheses and models are subject to a process of selection, leading to an
increasing attunement – i.e. 'rationalisation' – of data and psychological hypotheses, models
and theories.
In his 1957 address to the American Psychological Association Lee Cronbach described the
difference between the correlational approach and the applied experimental one as follows:
The program of applied experimental psychology is to modify treatments so as to obtain the highest
average performance when all persons are treated alike – a search, that is, for "the one best way." The
program of applied correlational psychology is to raise average performance by treating persons
differently – different job assignments, different therapies, different disciplinary methods. […] If the
engineering psychologist succeeds: information rates will be so reduced that the most laggard of us can
keep up, visual displays will be so enlarged that the most myopic can see them, automatic feedback will
prevent the most accident-prone from spoiling the work or his fingers. (Cronbach, 1957, p. 677)
Although psychological testing is booming in all sorts of educational and professional
assessments, scientific psychology itself has become more and more experimental – searching
for "the one best way" in continually accelerating cycles of "problem P1 → tentative
theoretical solution → evaluative error elimination → problem P2" (Popper, 1979, p. 119 ff.
and p. 149).
I once talked with a student in industrial design about the fact that industrial design
students have to study a lot of statistics to be able to make calculations like: 'What size should
a chair have in order to fit 95% of the population?'. Could you say that a chair that fits 95% of the
population is more rational than a chair that fits only 35% of the population? Yes, of course!
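Such a calculation is a routine percentile exercise. A sketch, assuming – purely for illustration – that the relevant body measure (say, lower-leg length) is normally distributed with a mean of 44 cm and a standard deviation of 3 cm:

```python
from scipy import stats

# Hypothetical anthropometric assumptions (illustrative numbers only):
mean_cm, sd_cm = 44.0, 3.0

# The seat-height range that would accommodate the central 95% of the population:
low, high = stats.norm.interval(0.95, loc=mean_cm, scale=sd_cm)
print(f"{low:.1f} cm to {high:.1f} cm")  # ~38.1 cm to ~49.9 cm
```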
However, not only are control panels and robotic entertainment dogs attuned to us, but
we ourselves are also the result of a long process of variation and selection that attuned us to
our environment. All our knowledge is an adaptation to our environment (cf. Popper, 1990). I
therefore agree with Popper, who argued that psychology suffers from the “utterly naïve and
completely mistaken” idea that knowledge is “what we do learn through the entry of
experience into our sense openings” (Popper, 1979, p. 61): for instance behaviourist
psychology, but also more cognitive psychological theories imbued with ideas derived from
associationist psychology (see chapter 4), made and still make this mistake (Popper, 1979).
Though Popper has often been very critical of psychology as a science (e.g. see
Popper, 1972, p. 47; Popper, 1979, p. 96 and p. 156), he himself could not withstand the lure
of identifying psychology with 'subjectivity' either. After all, Popper argued that all the
subjective experiences, feelings, beliefs and convictions which should have no place in
science must be banished to psychology. Apparently Popper thought of psychology either as a
very weak and disorderly science, or as a non-scientific place of exile for all the subjectivity
he wanted to get rid of.
…a subjective experience, or a feeling of conviction, can never justify a scientific statement, and […]
within science it can play no part except that of an object of an empirical (a psychological) inquiry.
(Popper, 1972, p. 46)
So why is it so difficult, or even impossible, to think about psychological, human cognition in
non-subjective terms? I believe the only reason for adopting the 'subjective' outlook on
knowledge is that it is less bleak and unsettling than Darwinist epistemology:
I don’t know about you, but I am not initially attracted by the idea of my brain as a sort of dungheap in
which larvae of other people’s ideas renew themselves, before sending out copies of themselves in an
informational diaspora. It does seem to rob my mind of its importance as both author and critic. […]
We would like to think of ourselves as godlike creators of ideas, manipulating and controlling them as
our whim dictates, and judging them from an independent, Olympian standpoint. (Dennett, 1996, p.
346)
Yet, the adoption of Bayesian inference as a standard for human rationality out of a
need for a comforting, subjective epistemology places psychology in an awkward
predicament: for it leads to endless misconceptions about its frequentist statistical
methodology and to a strange discrepancy with the Bayesian ideas on a theoretical level.
Although Bayesian statistics may be very practical in artificial intelligence research, its
subjective epistemology is philosophically a total faux pas. Philosophically, Bayesian
inference is old-fashioned: a reanimation of an eighteenth century idea. However, can
psychologists bear to see psychology as it is? Time will tell.
[3.G] Recapitulation & outlook on Part Two
Recapitulation of chapter 3:
Chapter 3 showed the philosophical incommensurability between frequentist and Bayesian statistics, by linking
Bayesian statistics to 'classical epistemology' and frequentist statistics to 'evolutionary epistemology'. From the
perspective of evolutionary epistemology – which was formulated in the first place by the philosopher Karl Popper
– rationality has no need for a 'knowing subject', 'beliefs' or 'representations'. Popper fought his whole life
against 'subjectivity' and tried to introduce 'non-subjective' words such as 'corroboration' and 'falsification' to
describe the growth of knowledge. It was clarified that frequentist statistical inference – such as it is widely applied
in psychology – is successful because it 're-enacts' the evolutionary algorithm of natural selection in an
'accelerated' way: the fact that chance is 'tamed' – because it is distinguished from 'structural' variation – leads
to an accelerated pace of hypothesis elimination. Because this evolutionary epistemology is rather 'bleak', the
frequentist 'meaning' of statistics is often poorly grasped by those who apply it: the semantics of both cognitive
psychological theories and frequentist statistical inferential methodology are still drenched in
'subjectivity'. The success of Bayesian statistics in connectionist artificial intelligence has probably contributed
to its becoming – in the so-called heuristics-and-biases program in cognitive psychology, initiated by Tversky and
Kahneman in the 1970s – the 'official' standard of rationality against which human
cognition is measured. Proponents of the heuristics-and-biases program have, however, concluded that human
cognition does not function according to the rules of Bayesian inference and therefore is irrational, or at least
'bounded' in its rationality. It is assumed that the adoption of Bayesian inference as a standard of rationality was
guided by the fact that its classical epistemology is comfortingly subjective.
Outlook on Part Two of this thesis:
The central question of the second part of this thesis is how statistics and probability still concern my thought and
my words, although they have nothing to do with my personal, Cartesian subjective beliefs. How can I say how
probability and statistics concern me – without relapsing into Cartesian, subjective talk?
The hypothesis, which will be proposed in order to answer this question, is that there was a seventeenth century
conceptual turn which entailed the 'textualization' of the world – i.e., 'nature' became the 'book of nature' – and
that this changed the notions of rationality and probability in such a way that it entailed the latent beginnings of both
psychology and statistics. This latent period lasted until the nineteenth century, in which the notions of rationality
and probability again changed in a radical way. This change led to the nineteenth century emergence of
psychology and its statistical inferential methodology as we know them now. To substantiate this hypothesis I
will focus on: (1) how the 'predecessors' of psychology – viz., the seventeenth century Cartesian rational
subject and eighteenth century associationist psychology – were entangled with the 'predecessors' of modern
statistical inferential methodology – viz., seventeenth and eighteenth century probability theory; (2) how the
nineteenth century change in the notions 'probability' and 'rationality', which made it possible for psychology
and frequentist statistical methodology – both useful and fruitful disciplines – to emerge, entailed at the same
time rather unpleasant epistemological consequences that led to hermeneutical, i.e. anti-statistical and
anti-psychological, reactions in nineteenth century thought. Finally, I will show how the 'textual' origin of
probability and rationality may put the question how statistics concern me in a completely different light.
INTERMEZZO: CONCLUSIONS OF PART ONE
Of course, the Cartesian substance dualism and the idea of man as the seat of rationality are
outdated in contemporary psychology. Every psychologist knows how ‘bounded’ human
rationality is, depending on ‘prosaic tricks’ such as heuristics, biases (cf. Kahneman et al.,
1982) and somatic markers (Damasio, 1998). Yet, psychologists apparently still like to think
of themselves as conscious knowing subjects endowed with ‘beliefs’ and ‘representations’
(Wheeler, 2005).
However, the classical subjective epistemology which psychologists endorse is not
congruent with the frequentist statistical methods they apply, for this frequentist methodology
entails an evolutionary epistemology. Hence there are so many conceptual
misconceptions about statistics: psychologists tend to interpret probability
(whose assumptions underlie statistical inferential methodology) subjectively. The
Darwinian idea of the scientific enterprise as a process of mindless, blind error elimination on
the level of collective, replicative phenomena does not agree well with how the average
psychologist would like to think of himself, viz. as an individualistically thinking scientist with
subjective considerations and beliefs. The idea of objective knowledge as the result of 'blind
algorithms' is of course less 'comfortable' than the idea of a growing body of subjective
knowledge. This explains why it is so difficult to grasp the 'meaning' of statistics and why
statistical methodology is presented in psychology as a monolithic, timeless truth: it is a way
to avoid the confrontation with the philosophical ideas underlying the frequentist
methodology.
If we look at the ideas of two nineteenth century thinkers who endeavoured to
formulate the philosophical impact of frequentist statistics – G. Th. Fechner (1801-1887) and
C.S. Peirce (1839-1914) – one notices that they both tried to overcome their tendency to
speak in subjective semantics (see for Fechner: Heidelberger, 1987; Mayo, 1996; for
Peirce e.g. Reynolds, 2002). As I showed in chapter 3, Popper – who was inspired by Peirce
(Popper, 1979) – was the first to succeed reasonably well in this task of overcoming
subjective semantics.
So, we may conclude with Popper that statistical methodology, rationality and
probability have nothing to do with my personal, Cartesian subjective beliefs. However,
people who think about statistics usually have quite strong feelings about it. It is apparently
difficult to think about statistical-psychological research practice without getting trapped in a
discourse of either idolatry of the statistical method or a romantic longing for a hermeneutical,
non-statistical psychology. The reason for these strong feelings is evident: although statistics,
rationality and probability have apparently nothing to do with my personal subjective beliefs,
they affect the question who I am (apparently not a rational knowing subject) and my thought
and my words. How can I say how statistics, probability and rationality concern me – without
relapsing into Cartesian, subjective talk? This will be the subject of the second part of this
thesis.
PART TWO:
The intertwinement of the notions
underlying Statistics and Psychology:
Probability and Rationality
PART TWO CONSISTS OF THE FOLLOWING CHAPTERS:
CHAPTER 4: THE LATENT PERIOD
[4.A] It all begins with Descartes – the rational, representational consciousness.
[4.B] The book of Nature, epistemological uncertainty and the equivocation of
the Cartesian ‘subject’.
[4.C] Associationist psychology or the unproblematic ambivalence in the
concept of probability
[4.D] Rationality – between the Principle of Sufficient Reason (Leibniz) and the
Principle of Non-sufficient Reason (J. Bernoulli)
CHAPTER 5: THE CLASH WITH ‘NATURE’: THE LOCUS OF
RATIONALITY RECONSIDERED.
[5.A] The transition from classical to modern probability – the same probability,
but in a different way.
[5.B] The nineteenth century confrontation with 'nature' – an objective
interpretation of chance.
[5.C] Aversion of statistics and love of absolute chance – Nietzsche's absolute
subjectivism
[5.D] Just a name? The realism of C.S. Peirce and the nominalism of Pearson
CHAPTER 6: WHERE DO I STAND? THE SIGNIFICANCE OF STATISTICS
[6.A] Fechner and Peirce: the Kollektiv as an end in itself.
[6.B] The semantics of statistics.
[6.C] From “metaphysics” to “prophysics”
REFERENCES
Aczel, A. D. (2004). Chance. A guide to gambling, love, the stock market & just about
everything else. New York: Thunder's Mouth Press.
Amossé, T., Andrieux, Y.-V., & Muller, L. (2001). L'esprit humain est-il bayésien? Courrier
des statistiques(100), 25-28.
Aristotle. (1995). Politics : books I and II (T. J. Saunders, Trans.). Oxford: Clarendon.
Barnard, G. A. (1987). R.A. Fisher: A True Bayesian? International Statistical Review, 55(2),
183-189.
Bartlett, M. S. (1966). Review of 'Logic of Statistical Inference', by Ian Hacking. Biometrika,
53(3-4), 631-633.
Bem, S. (2005). Bent u daar nog? Over subjectiviteit en psychologie. (Afscheidscollege
3/12/2004, Universiteit Leiden, Faculteit sociale wetenschappen, Cognitieve
Psychologie) [Are you still there? About subjectivity and psychology. Farewell lecture
3/12/2004, Leiden University, Faculty of Social Sciences, Cognitive Psychology ]. s.l.:
s.n.
Bem, S., & Jong, H. L. d. (1998). Theoretical Issues in Psychology. An introduction. London:
Sage Publications.
Bockstaele, P., Cerulus, F., & Vanpaemel, G. (Eds.). (2004). Ars Conjectandi. Over gokkers,
geleerden en grote getallen. [On gamblers, scholars and large numbers. Catalogue to
the exhibition in the library of the Catholic University of Leuven, 26 May - 27 June
2004]. s.l.: s.n.
Brunswik, E. (1943). Organismic achievement and environmental probability. Psychological
Review, 50, 255-272.
Carnap, R. (1951). Logical foundations of probability. London: Routledge and Kegan Paul.
Cioffari, V. (1973). Fortune, fate, and chance. In P. P. Wiener (Ed.), Dictionary of the history
of ideas. Studies of selected pivotal ideas. (Vol. 2, pp. 226-236). New York: Scribner.
Cowles, M. (1989). Statistics in psychology: an historical perspective. Hillsdale, N.J.:
Lawrence Erlbaum.
Cronbach, L. J. (1957). The two disciplines of scientific psychology. American psychologist,
12, 671-684.
Damasio, A. R. (1998). De vergissing van Descartes. Gevoel, verstand en het menselijk brein.
[Descartes' error - Emotion, Reason and the Human Brain] (L. Teixeira de Mattos,
Trans.). Amsterdam: Wereldbibliotheek.
Danziger, K. (1985). The Methodological Imperative in Psychology. Philosophy of the Social
Sciences, 15(1), 1-13.
Danziger, K. (1987). Statistical Method and the Historical Development of Research Practice
in American Psychology. In L. Krüger & G. Gigerenzer & M. S. Morgan (Eds.), The
probabilistic revolution. Volume 2: Ideas in the sciences (Vol. 2, pp. 35-37).
Cambridge, Massachusetts: MIT Press.
Danziger, K. (1990). Constructing the subject. Historical origins of psychological research.
Cambridge: Cambridge University Press.
Darwin, C. (1998/1859). On the Origin of Species by Means of Natural Selection, or the
preservation of Favoured Races in the Struggle for Life (facsimile of the first ed.).
Cambridge, Mass.: Harvard University Press.
Daston, L. (1987). Rational individuals versus laws of society. In L. Krüger & L. J. Daston &
M. Heidelberger (Eds.), The probabilistic revolution. Volume 1: Ideas in history (Vol.
1, pp. 295-304). Cambridge, Massachusetts: MIT Press.
Daston, L. (1988). Classical probability in the Enlightenment (1st ed.). Princeton: Princeton
University Press.
David, F. N. (1955). Studies in the history of probability and statistics (I). Dicing and gaming.
(A note on the history of probability). Biometrika, 42(1/2), 1-15.
David, F. N. (1962). Games, gods and gambling. The origins and history of probability and
statistical ideas from the earliest times to the Newtonian era. London: Charles Griffin
& Co.
Dawkins, R. (1989). The selfish gene. Oxford: Oxford University Press.
Dawkins, R. (2000, 10/3/2000). Obituary of W.D. Hamilton (1936-2000). The Independent.
Dehue, T. (1990). De regels van het vak. Nederlandse psychologen en hun methodologie,
1900-1985. Amsterdam: Van Gennep.
Dennett, D. C. (1991). Mother Nature versus the Walking Encyclopedia: A Western Drama.
In W. Ramsey & S. P. Stich & D. E. Rumelhart (Eds.), Philosophy and connectionist
theory (pp. 21-30). Hillsdale, N.J.: Erlbaum Associates, 1991.
Dennett, D. C. (1996). Darwin's Dangerous Idea. Evolution and the meanings of life. London:
Penguin.
Dennett, D. C. (1998). Brainchildren: essays on designing minds. Cambridge, MA: MIT
Press.
Desrosières, A. (1993). La politique des grands nombres. Histoire de la raison statistique
(1st ed.). Paris: Éditions La Découverte.
Desrosières, A. (1999). Statistique. In D. Lecourt & T. Bourgeois (Eds.), Dictionnaire
d'histoire et philosophie des sciences (pp. 874-880). Paris: Presses Universitaires de
France.
Desrosières, A. (2001). How real are statistics? Four possible attitudes. Social Research,
68(2), 339-355.
Desrosières, A. (2002). Adolphe Quetelet. Courrier des statistiques, (104), 3-6.
Fisher Box, J. (1978). R.A. Fisher. The life of a scientist. New York: John Wiley & Sons.
Fisher, R. A. (1921). Studies in Crop Variation I. An Examination of the Yield of Dressed
Grain from Broadbalk. Journal of Agricultural Science, 11, 109-135.
Fisher, R. A. (1924). Studies in Crop Variation III. The Influence of Rainfall on the Yield of
Wheat at Rothamsted. Philosophical Transactions of the Royal Society of London, Ser.
B, 213, 89-142.
Fisher, R. A. (1951). The Design of Experiments. Edinburgh: Oliver and Boyd.
Fisher, R. A. (1953). Croonian lecture: Population Genetics. Proceedings of the Royal Society
of London. Series B, Biological sciences, 141, 510-523.
Fisher, R. A. (1955). Statistical methods and scientific induction. Journal of the Royal
Statistical Society. Series B (Methodological), 17, 69-78.
Fisher, R. A. (1970). Statistical methods for research workers (14th ed.). Edinburgh: Oliver and
Boyd.
Foucault, M. (1966). Les mots et les choses. Une archéologie des sciences humaines. Paris:
Gallimard.
Galileo, G. (1957). Discoveries and Opinions of Galileo (S. Drake, Trans.). Garden City,
N.Y.: Doubleday.
Galton, F. (1869). Hereditary genius: an inquiry into its laws and consequences. London: s.n.
Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the
Anthropological Institute, 15, 246-263.
Galton, F. (1888). Co-relations and their measurement, chiefly from anthropometric data.
Proceedings of the Royal Society, 45, 135-145.
Galton, F. (1889). Natural Inheritance. London: Macmillan.
Galton, F. (1901). Biometry. Biometrika, 1(1), 7-10.
Garber, D., & Zabell, S. (1979). On the emergence of probability. Archive for History of
Exact Sciences, 21, 33-53.
Garnham, A., & Oakhill, J. (2001). Thinking and reasoning. Oxford: Blackwell Publishers.
Garson, J. (2002). Connectionism. In E. N. Zalta (Ed.), The Stanford Encyclopedia of
Philosophy. Retrieved May 23, 2006, from:
http://plato.stanford.edu/archives/win2002/entries/connectionism/.
Gee, H. (1999). In search of deep time: beyond the fossil record to a new history of life. New
York: Free Press.
Gigerenzer, G. (1987a). Probabilistic thinking and the fight against subjectivity. In L. Krüger
& G. Gigerenzer & M. S. Morgan (Eds.), The probabilistic revolution. Volume 2:
Ideas in the sciences (Vol. 2, pp. 11-33). Cambridge, Massachusetts: MIT Press.
Gigerenzer, G. (1987b). Survival of the fittest probabilist: Brunswik, Thurstone, and the two
disciplines of psychology. In L. Krüger & G. Gigerenzer & M. S. Morgan (Eds.), The
probabilistic revolution. Volume 2: Ideas in the sciences (Vol. 2, pp. 49-72).
Cambridge, Massachusetts: MIT Press.
Gigerenzer, G. (2000). Adaptive Thinking. Rationality in the real world. Oxford: Oxford
University Press.
Gigerenzer, G., Krauss, S., & Vitouch, O. (2004). The null ritual: what you always wanted to
know about significance testing but were afraid to ask. In D. Kaplan (Ed.), The Sage
handbook of quantitative methodology for the social sciences (pp. 391-408). Thousand
Oaks, California: Sage publications.
Gigerenzer, G., & Murray, D. J. (1987). Cognition as intuitive statistics. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Krüger, L. (1990). The
empire of chance. How probability changed science and everyday life. Cambridge:
Cambridge University Press.
Gillies, D. (1971). A falsifying rule for probability statements. The British Journal for the
Philosophy of Science, 22, 231-261.
Gillies, D. (2003). Philosophical theories of probability. London: Routledge.
Gribbin, J. (1985). Op zoek naar Schrödingers kat. Quantumfysica en de werkelijkheid. [In
search of Schrödinger's cat]. Amsterdam: Contact.
Hacking, I. (1975). The emergence of probability. A philosophical study of early ideas about
probability, induction and statistical inference (1st ed.). London: Cambridge
University Press.
Hacking, I. (1976). Logic of statistical inference (1st pbk ed.). Cambridge: Cambridge
University Press.
Hacking, I. (2001). An introduction to probability and inductive logic. Cambridge: Cambridge
University Press.
Hacking, I. (2004). The taming of chance (8th ed.). Cambridge: Cambridge University Press.
Hald, A. (1998). A history of mathematical statistics from 1750 to 1930 (1st ed.). New York:
John Wiley & Sons.
Haller, H., & Krauss, S. (2002). Misinterpretations of significance: A problem students share
with their teachers? Methods of Psychological Research Online [Online Serial,
retrievable from http://www.mpr-online.de ], 7(1), 1-20.
Heidegger, M. (1988). Ontologie: Hermeneutik der Faktizität. (Freiburger Vorlesung
Sommersemester 1923) (Vol. 63). Frankfurt am Main: Klostermann.
Heidelberger, M. (1987). Fechner's indeterminism: from freedom to laws of chance. In L.
Krüger & L. Daston & M. Heidelberger (Eds.), The probabilistic revolution. Volume
1: Ideas in history (Vol. 1, pp. 117-156). Cambridge, Massachusetts: MIT Press.
Heiser, W. J. (1990). Datatheorie [Data theory]. Leiden: s.n.
Heiser, W. J. (2003). Trust in Relations. Measurement: Interdisciplinary Research and
Perspectives, 1(4), 264-269.
Heiser, W. J., & Verduin, K. (2005). Spreiding zonder fouten. Hoe de standaarddeviatie tot
stand kwam als maat voor verscheidenheid. [Dispersion without errors. How the
standard deviation emerged as a measure of diversity]. STAtOR, 6(3), 14-20.
Hogben, L. (1957). Statistical theory: the relationship of probability, credibility and error.
An examination of the contemporary crisis in statistical theory from a behaviourist
viewpoint. London: Allen and Unwin.
Hubbard, R. (2004). Alphabet soup: blurring the distinctions between p's and α's in research
in psychology. Theory & Psychology, 14(3), 295-327.
Hume, D. (2002/1739). A treatise of human nature. Oxford: Oxford University Press.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under Uncertainty:
Heuristics and Biases. Cambridge: Cambridge University Press.
Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of
representativeness. Cognitive Psychology, 3, 430-454.
Kendall, M. G. (1942). On the future of statistics. Journal of the Royal Statistical Society,
105, 69-80.
Kendall, M. G. (1956). Studies in the history of probability and statistics (II). The beginnings
of a probability calculus. Biometrika, 43(1/2), 1-14.
Kendall, M. G. (1963). Ronald Aylmer Fisher, 1890-1962. Biometrika, 50(1-2), 1-15.
Kendall, M. G. (1973). Chance. In P. P. Wiener (Ed.), Dictionary of the history of ideas.
Studies of selected pivotal ideas. (Vol. 1, pp. 336-340). New York: Scribner.
Lucas, A. M. (1995). Anglo-Irish Poems of the Middle Ages. Dublin: Columba Press.
MacKenzie, D. A. (1981). Statistics in Britain, 1865-1930. Edinburgh: Edinburgh University
Press.
Maistrov, L. E. (1974). Probability theory. A historical sketch. New York: Academic Press.
Mayo, D. G. (1996). Error and the growth of experimental knowledge. Chicago: The
University of Chicago Press.
McClelland, J. L. (1994). Comment. Neural Networks and Cognitive Science: Motivations
and Applications. Statistical science, 9(1), 42-45.
Menand, L. (2002). The Metaphysical Club. A story of ideas in America. New York: Farrar,
Straus and Giroux.
Michell, J. (1999). Measurement in psychology. A critical history of a methodological concept
(1st ed.). Cambridge: Cambridge University Press.
Mises, R. von. (1936). Wahrscheinlichkeit, Statistik und Wahrheit. Einführung in die neue
Wahrscheinlichkeitslehre und ihre Anwendung. (2nd ed.). Wien: Verlag von Julius
Springer.
Monod, J. (1970). Le hasard et la nécessité. Essai sur la philosophie naturelle de la biologie
moderne. Paris: Seuil.
Moore, D. S., & McCabe, G. P. (1999). Introduction to the practice of statistics. New York:
W.H. Freeman & Co.
Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical
hypotheses. Philosophical Transactions of the Royal Society of London, Ser. A, 231,
289-337.
Nikolow, S. (2001). A. F. W. Crome's Measurements of the "Strength of the State": Statistical
Representations in Central Europe around 1800. History of Political Economy,
33(Annual Supplement), 23-56.
Oakes, M. (1986). Statistical inference: a commentary for the social and behavioral sciences.
Chichester, UK: Wiley.
Oosterhuis, T. (1991). De pijl van Zeno: een verhaal over de geschiedenis van de statistiek.
[Zeno's arrow: a story about the history of statistics]. Baarn: Fontein.
Pearson, K. (1911). The grammar of science. Part I. Physical (3rd, revised and enlarged ed.,
Vol. 1). London: Adam and Charles Black.
Pearson, K. (1914). The life, letters and labours of Francis Galton. Volume I. Birth 1822 to
Marriage 1853. (Vol. 1). Cambridge: Cambridge University Press.
Pearson, K. (1924). The life, letters and labours of Francis Galton. Volume II. Researches of
middle life (Vol. 2). Cambridge: Cambridge University Press.
Pearson, K. (1930). The life, letters and labours of Francis Galton. Volume III. A:
Correlation, personal identification and eugenics. B: Characterisation, especially by
letters (Vol. 3). Cambridge: Cambridge University Press.
Pearson, K. (1978). The history of statistics in the 17th and 18th centuries against the
changing background of intellectual, scientific and religious thought. Lectures by Karl
Pearson given at University College London during the academic sessions 1921-1933.
Edited by E.S. Pearson. London: Charles Griffin & Company.
Pearson, K., Weldon, W. F. R., & Davenport, C. B. (1901). Editorial: The spirit of
Biometrika. Biometrika, 1(1), 3-6.
Popkin, R. H. (1964). The History of Scepticism from Erasmus to Descartes (2nd ed.). Assen:
Van Gorcum.
Popper, K. (1959). The Propensity Interpretation of Probability. British Journal for the
Philosophy of Science, 10, 25-42.
Popper, K. (1972). The logic of scientific discovery (1st ed.). London: Hutchinson.
Popper, K. (1974). Conjectures and refutations. The growth of scientific knowledge. (5th ed.).
London: Routledge and Kegan Paul.
Popper, K. (1979). Objective Knowledge. An evolutionary approach (2nd revised ed.).
Oxford: Clarendon Press.
Popper, K. (1983). Realism and the aim of science. From the postscript to "The logic of
scientific discovery". (1st ed.). London: Hutchinson & Co.
Popper, K. (1990). A world of propensities. Bristol: Thoemmes.
Porter, T. M. (1986). The rise of statistical thinking, 1820-1900. Princeton, NJ: Princeton
University Press.
Porter, T. M. (2003a). Measurement, Objectivity and Trust. Measurement: Interdisciplinary
Research and Perspectives, 1(4), 241-255.
Porter, T. M. (2003b). Objectivity and Trust: A Measured Rejoinder. Measurement:
Interdisciplinary Research and Perspectives, 1(4), 286-298.
Porter, T. M. (2003c). Statistics and statistical methods. In T. M. Porter & D. Ross (Eds.), The
Cambridge history of science. The modern social sciences (Vol. 7, pp. 238-250).
Cambridge: Cambridge University Press.
Prigogine, I., & Stengers, I. (1985). Orde uit chaos [Order out of chaos]. Amsterdam: Bert
Bakker.
Quetelet, A. (1835). Sur l'homme et le développement de ses facultés, ou Essai de physique
sociale. Paris: Bachelier.
Quetelet, A. (1846). Lettres à S.A.R. le duc régnant de Saxe-Cobourg et Gotha sur la théorie
des probabilités, appliquée aux sciences morales et politiques. Bruxelles: M. Hayez.
Reynolds, A. (2002). Peirce's scientific metaphysics. The philosophy of chance, law, and
evolution. Nashville: Vanderbilt University Press.
Robert, C. P. (2001). L'analyse statistique bayésienne. Courrier des statistiques, (100), 3-4.
Rogers, T. B. (2002). Book review: Joel Michell. Measurement in psychology: A critical
history of a methodological concept. Journal of the History of the Behavioral Sciences,
38(1), 61-62.
Romeijn, J. W. (2005). Bayesian inductive logic: inductive predictions from statistical
hypotheses. Doctoral dissertation, Rijksuniversiteit Groningen.
Room, A. (1986). Dictionary of changes in meaning. London: Routledge & Kegan Paul.
Rosser Matthews, J. (2000). Statistics. In A. Hessenbruch (Ed.), Reader's guide to the history
of science (pp. 706-707). London: Fitzroy Dearborn.
Salsburg, D. (2001). The lady tasting tea. How statistics revolutionized science in the
twentieth century. New York: W.H. Freeman and Company.
Sambursky, S. (1956). On the possible and the probable in ancient Greece. Osiris, 12, 35-48.
Schuh, F. (1964). Hoe bepaal ik mijn kans? Kansrekening met toepassing op spel en
statistiek. [How do I determine my chances? Probability theory applied to games and
statistics]. Amsterdam: Agon Elsevier.
Schweber, S. S. (1977). The Origin of the Origin Revisited. Journal of the History of Biology,
10(2), 229-316.
Schweber, S. S. (1982). Demons, Angels, and Probability: some aspects of British Science in
the Nineteenth Century. In A. Shimony & H. Feshbach (Eds.), Physics as natural
philosophy: essays in honor of Laszlo Tisza on his seventy-fifth birthday (pp. 319-363).
Cambridge, Mass.: MIT Press.
Schweber, S. S. (1983). Aspects of probabilistic thought in Great Britain during the 19th
century: Darwin and Maxwell. In M. Heidelberger & L. Krüger & R. Rheinwald
(Eds.), Probability since 1800. Interdisciplinary studies of scientific development.
Workshop at the centre for interdisciplinary research of the University of Bielefeld,
September 16-20, 1982 (pp. 41-96). Bielefeld: B.K. Verlag.
Sheynin, O. B. (1974). On the prehistory of probability. Archive for History of Exact
Sciences, 12, 97-141.
Stigler, S. M. (1986). The history of statistics. The measurement of uncertainty before 1900
(1st ed.). Cambridge, Massachusetts: The Belknap Press of Harvard University Press.
Stigler, S. M. (1996). The history of statistics in 1933. Statistical Science, 11(3), 244-252.
Stigler, S. M. (1997). Regression towards the mean, historically considered. Statistical
Methods in Medical Research, 6, 103-114.
Stigler, S. M. (1999). Statistics on the table. The history of statistical concepts and methods.
Cambridge, Massachusetts: Harvard University Press.
Swijtink, Z. G. (2000). Probability. In A. Hessenbruch (Ed.), Reader's guide to the history of
science (pp. 596-597). London: Fitzroy Dearborn.
Thompson, B. (2001). A Critical Review of a Critical History of Measurement (review of Joel
Michell's Measurement in Psychology: A Critical History of a Methodological
Concept). Theory & Psychology, 11(6), 855-856.
Tversky, A., & Kahneman, D. (1980). Causal schemata in judgments under uncertainty. In
M. Fishbein (Ed.), Progress in social psychology (Vol. 1). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Vlis, J. H. v. d., & Heemstra, E. R. (1988). Geschiedenis van de kansrekening en statistiek
[History of probability theory and statistics] (1st ed.). Utrecht: Pandata.
von Plato, J. (1987). Probabilistic physics the classical way. In L. Krüger & G. Gigerenzer &
M. S. Morgan (Eds.), The probabilistic revolution. Volume 2: Ideas in the sciences
(Vol. 2, pp. 379-407). Cambridge, Massachusetts: MIT Press.
von Plato, J. (1995). Creating modern probability. Its mathematics, physics and philosophy in
historical perspective. Cambridge: Cambridge University Press.
Westergaard, H. (1969). Contributions to the history of statistics (1st ed.). The Hague:
Mouton Publishers.
Wheeler, M. (2005). Reconstructing the cognitive world: the next step. Cambridge, Mass.:
MIT Press.
Winkler, R. L. (1974). Statistical analysis: theory versus practice. In C.-A. S. Staël von
Holstein (Ed.), The concept of probability in psychological experiments (pp. 127-140).
Dordrecht, Holland: D. Reidel Publishing Company.
Wozniak, R. H. (1999). Classics in psychology. 1855-1914: Historical essays. Bristol:
Thoemmes Press (and Tokyo: Maruzen) [co-published].
Yates, F. (1984). Book review of 'Neyman: from life', by C. Reid. Journal of the Royal
Statistical Society. Series A (General), 147(1), 116-118.