ISSUES AND TRENDS
Jonathan Osborne and Maria Pilar Jiménez-Aleixandre, Section Coeditors

Basic Inferences of Scientific Reasoning, Argumentation, and Discovery

ANTON E. LAWSON
Organismal, Integrative and Systems Biology, School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA

Received 7 October 2008; revised 22 April 2009; accepted 29 April 2009
DOI 10.1002/sce.20357
Published online 28 May 2009 in Wiley InterScience (www.interscience.wiley.com).

ABSTRACT: Helping students better understand how scientists reason and argue to draw scientific conclusions has long been viewed as a critical component of scientific literacy and thus remains a central goal of science instruction. However, differences of opinion persist regarding the nature of scientific reasoning, argumentation, and discovery. Accordingly, the primary goal of this paper is to employ the inferences of abduction, retroduction, deduction, and induction to introduce a pattern of scientific reasoning, argumentation, and discovery that is postulated to be universal and thus can serve as an instructional framework to improve student reasoning and argumentative skills. The paper first analyzes three varied and presumably representative case histories in terms of the four inferences (i.e., Galileo’s discovery of Jupiter’s moons, Rosemary and Peter Grants’ research on Darwin’s finches, and Marshall Nirenberg’s Nobel Prize–winning research on genetic coding). Each case history reveals a pattern of reasoning and argumentation used during explanation testing that can be summarized in an If/then/Therefore form. The paper then summarizes additional cases also exemplary of the form. Implications of the resulting theory are discussed in terms of improving the quality of research and classroom instruction. © 2009 Wiley Periodicals, Inc. Sci Ed 94:336–364, 2010

Correspondence to: Anton E. Lawson; e-mail: anton.lawson@asu.edu
Contract grant sponsor: National Science Foundation.
Contract grant number: EHR 0412537.

INTRODUCTION

Scientific literacy as an instructional goal typically includes students’ understanding of the nature of science and scientific reasoning (e.g., American Association for the Advancement of Science, 1989, 2007; Educational Policies Commission, 1961, 1966; National Research Council, 1990, 1996, 2001). Not surprisingly, numerous papers have been published in recent years exploring the nature of scientific reasoning, scientific argumentation, and scientific discovery. Yet differences and disagreements persist (e.g., Allchin, 2006; Alters, 1997; Bonner, 2005; Lawson, 2006a; Samarapungavan, Westby, & Bodner, 2006; Sampson & Clark, 2008; Westerland & Fairbanks, 2004; Wivagg & Allchin, 2002). Accordingly, the primary goal of this paper is to analyze several varied case histories to identify basic inferences and a pattern of scientific discovery that is postulated to be general enough to serve as an instructional framework to improve student reasoning and argumentative skills. Identifying basic inferences and such a pattern within varied contexts should help teachers and curriculum developers design and teach lessons that will help students construct a better understanding of how science works and thus help them become scientifically literate. The examples may also help researchers improve the quality of their own research.

We should note at the outset that the present view departs somewhat from the view of argumentation advanced by philosopher Stephen Toulmin (1969) and emphasized by science educators such as Newton, Driver, and Osborne (1999) and Erduran, Simon, and Osborne (2004) in that it sees the primary role of argumentation, not as one of convincing others of one’s point of view (although that is certainly part of the story) but rather as one of discovering which of several possible explanations for a particular puzzling observation should be accepted and which should be rejected.
Thus, instead of Toulmin’s claims, warrants, and backings, at the heart of the present theory lie multiple possible explanations, predictions, and evidence designed to either support or contradict each proposed explanation. Indeed, the present view suggests that the best argument considers all of the alternatives and explicitly includes the relevant evidence and reasoning supporting and/or contradicting each.

We begin by analyzing the reasoning presumably involved in Galileo Galilei’s (1564–1642) discovery of Jupiter’s moons in 1610 (Galilei, 1610, as translated and reprinted in Shapley, Rapport, & Wright, 1954, and as initially interpreted by Lawson, 2002a). Galileo’s discovery is analyzed in terms of the inference of abduction, as defined in the present paper, and the inferences of retroduction, deduction, and induction as defined by Charles Sanders Peirce (1839–1914). Peirce was an American philosopher, logician, and mathematician. In the words of Misak (2004), “His work is staggering in its breadth. . . . But because of the scattered nature of his work and because he was always out of the academic mainstream, many of his contributions are just now coming to light” (p. 1). Peirce is generally credited as the originator of pragmatism, a philosophical view opposing logical positivism and favored by contemporaries William James and John Dewey. Pragmatism, with its roots in Darwinian evolutionary theory, argues against the existence of absolute or transcendental truth and in favor of a more ecological account of knowledge generation grounded in inquiry and in the testing and retention of ideas that work. Peirce’s position most relevant to the task at hand is his view of how theory, observation, and reasoning interact to test claims that have been advanced to explain puzzling observations.
Following the analysis of Galileo’s discovery, we turn to a similar analysis of Rosemary and Peter Grants’ monumental research on Darwin’s finches, which is then followed by consideration of the Nobel Prize–winning research of Marshall Nirenberg. These case histories, in turn, are followed by additional examples that explore the extent to which the resulting pattern of reasoning and argumentation can be generalized to other scientific and nonscientific fields. The cases have been selected because they represent human reasoning and discovery across a broad and presumably representative range of disciplines (i.e., astronomy, evolutionary biology, biochemistry, human history, geology, physics, and engineering).

Science Education 1098237x, 2010, 2, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/sce.20357, Wiley Online Library on [16/03/2025]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

GALILEO’S DISCOVERY OF JUPITER’S MOONS

Initial Puzzling Observation and Abduction

In January 1610, Galileo had recently invented a new and improved telescope and had begun using it to explore the “heavens.” During his initial telescopic exploration, Galileo was puzzled by his observation of three tiny points of light near the planet Jupiter. Initially, he generated the hypothesis that they were fixed stars (i.e., stars that lie in the celestial sphere beyond Jupiter).
Following Peirce, we refer to this spontaneous and creative act of hypothesis generation as abduction because the puzzling observation is seen as similar to, or analogous to, already explained observations that have been stored as part of declarative knowledge, thus get “abducted/stolen/transferred” from that store to tentatively explain the new observation—the points of light look like fixed stars so perhaps that is what they are. Other terms that have been used to label the process of abduction are analogical inference, analogical transfer, or analogical reasoning (e.g., Biela, 1993; Finke, Ward, & Smith, 1992; Gentner, 1989; Giere, Bickle, & Mauldin, 2006; Holyoak, 2005; Koestler, 1964; Lawson & Lawson, 1993; Sternberg & Davidson, 1995). It should be pointed out that in many cases, the abductive transfer requires more insight than shown in Galileo’s case because the “distance” between the analogous category and the target phenomenon is greater (e.g., the idea of orbiting planets as an analogue for the structure of atoms, Watson and Crick’s image of a spiral staircase as an analogue for the structure of DNA, Darwin’s use of artificial selection as an analogue for natural selection, and Kekule’s image of snakes eating their tails as an analogue for the benzene ring). Importantly, abduction can be viewed as an inferential process in the sense that it involves reasoning used to mentally derive causal claims (i.e., hypotheses/theories) from premises (cf., Polya, 1954; Tidman & Kahane, 2003). For example, if . . . planets orbit the Sun, and . . . atoms are like the solar system, then . . . perhaps electrons orbit an atomic nucleus. If . . . captive plants and animals have changed because of artificial selection, and . . . a process analogous to artificial selection occurred in nature, then . . . perhaps wild organisms have evolved due to a process of “natural” selection. If . . . homing pigeons navigate home using the Earth’s magnetic field, and . . . 
salmon can sense that magnetic field, then . . . perhaps they use it to return to their home streams.

Using Retroduction for an Initial Test

Once a hypothesis has been generated via abduction, it must pass its first inferential test, which Peirce called retroduction (note that Peirce did not conceptualize abduction and retroduction as different and distinct inferences; thus, he used the terms interchangeably) and described as follows:

A puzzling observation C is made. However, if . . . A were true, then . . . C would be a matter of course. Therefore . . . there is reason to believe that A is true. (Turrisi, 1903/1997, CP 5.168)

And by philosopher Norwood Hanson (1958):

Before Peirce treated retroduction as an inference, logicians had recognized that the reasonable proposal of an explanatory hypothesis was subject to certain conditions. The hypothesis cannot be admitted, even as a tentative conjecture, unless it would account for the phenomena posing the difficulty – or at least some of them. (p. 86)

And later by philosopher Carl Hempel (1966) like this:

When a hypothesis is designed to explain certain observed phenomena, it will of course be so constructed that it implies their occurrence; hence, the fact to be explained will then constitute confirmatory evidence for it. (p. 37)

Or more explicitly in Galileo’s case, like this:

If . . . the points of light near Jupiter are fixed stars, and . . . I observe the heavens near Jupiter, then . . . I should see points of light that look like fixed stars. And . . . I do see points of light that look somewhat like fixed stars. Therefore . . . they may in fact be fixed stars.
Thus, retroductive arguments follow an If/then/Therefore argumentative pattern. Although retroduction is a crucial aspect of evaluating alternative explanations, it is a weak test of a hypothesis in the sense that the least a hypothesis should do is to explain the puzzling observation that led to its generation in the first place.1 Interestingly, Galileo’s further retroductive reasoning led him to doubt his fixed-stars hypothesis. As he put it:

. . . although I believed them to belong to the number of the fixed stars, yet they made me somewhat wonder, because they seemed to be arranged exactly in a straight line, parallel to the ecliptic, and to be brighter than the rest of the stars, equal to them in magnitude. (p. 59)

When Galileo’s expressed doubt is cast in the If/then/Therefore form of the previous retroductive argument, we get the following:

If . . . the three points of light near Jupiter are fixed stars, and . . . their sizes, brightness and positions are compared to each other and to nearby fixed stars, then . . . variations in size, brightness and position should be random, as is the case for other fixed stars. But . . . “they seem to be arranged exactly in a straight line, parallel to the ecliptic, and to be brighter than the rest of the stars.” Therefore . . . the fixed-stars hypothesis is contradicted. Or as Galileo put it, “yet they made me somewhat wonder.”

1 Although retroduction may be a weak test of a hypothesis, it can become a stronger test if the hypothesis can retroductively explain several aspects of the puzzling observation. For example, Galileo’s orbiting-moons hypothesis could retroductively explain (1) why the three points of light seemed to be arranged exactly in a straight line, (2) why they were parallel to the ecliptic, and (3) why they differed in brightness from the fixed stars.

Using Deduction to Generate Predictions

Consequently, Galileo went back to his store of declarative knowledge to abductively generate another hypothesis. Perhaps, thought Galileo, the points of light are moons orbiting Jupiter, like the moon that orbits Earth or like the planets that orbit the Sun. Presumably after using retroduction to convince himself that his orbiting-moons hypothesis would in fact explain the puzzling observation, he then sought a way to construct a more convincing test. That more convincing test required generating one or more predictions about future observations that should occur provided that his new hypothesis is correct. Peirce referred to this inference as deduction, which he described as follows:

Abduction [retroduction] having suggested a theory, we employ deduction to deduce from that ideal theory a promiscuous variety of consequences to the effect that if we perform certain acts, we shall find ourselves confronted with certain experiences. We then proceed to try these experiments, and if the predictions of the theory are verified, we have a proportionate confidence that the experiments that remain to be tried will confirm the theory. (Bergman & Paavola, 1905/2003a, CP 8.209)

Thus, using deduction,2 Galileo generated an argument that led from his new hypothesis to two future predictions, that is,

If . . . the three points of light are moons orbiting Jupiter (orbiting-moons hypothesis), and . . . I observe them over the next several nights (planned test), then . . . some nights they should appear to the east of Jupiter and some nights they should appear to the west. Further, they should always appear along a straight line on either side of Jupiter (predictions).
2 Like abduction, deduction depends on connections with declarative knowledge. Declarative knowledge is needed for the thinker to know what is implied by any particular hypothesis in question. Thus, one should not view deduction as taking place in a strictly “logical” or “necessary” fashion or as a process resulting in certainty. In this sense, both retroduction and deduction are dependent upon specifics related to the situation at hand (i.e., what one might call disciplinary or declarative knowledge). For further explication, see Lawson (2006b). Another key point here is that the thinker is generating predictions about outcomes that the thinker has yet to observe. But this does not mean that the observations have not already been made by someone else.

Making the Necessary Observations

After presumably deriving these two predictions via deduction, Galileo remarked, “I therefore waited for the next night with the most intense longing, but I was disappointed of my hope, for the sky was covered with clouds in every direction” (p. 60). Because of the cloud cover, Galileo was unable to make the observations needed to compare with his deductively derived predictions. Fortunately, the next night and several subsequent nights were clear, and sure enough, the points of light appeared just as Galileo’s orbiting-moons hypothesis led him to predict. In Galileo’s (1610) words,

I, therefore, concluded, and decided unhesitatingly, that there are three stars in the heavens moving about Jupiter, as Venus and Mercury round the sun. . . . These observations also established that there are not only three, but four, erratic sidereal bodies performing their revolutions round Jupiter. . . . These are my observations upon the four Medicean planets, recently discovered for the first time by me. (pp. 60–61)

Using Induction to Draw a Conclusion

Consequently, we can complete the previous deductive argument with Galileo’s observed results and conclusion like this:

And . . . some nights they appeared to the east of Jupiter and some nights they appeared to the west. Further, they always appeared along a straight line on either side of Jupiter (observed results). Therefore . . . the orbiting-moons hypothesis is supported (conclusion).

According to Peirce, this final inference, which he called induction, was used to draw this conclusion (Turrisi, 1903/1997). More generally, If . . . the predicted and observed results match, as they do in this case, then . . . the hypothesis is supported. On the other hand, if . . . the predicted and observed results do not match, then . . . the hypothesis would be contradicted.

Although Peirce referred to this inference as induction, it is not the form of induction that some have claimed generates general conclusions from limited cases (e.g., this crow is black, so is this one, and so on; therefore all crows are black). That form of “enumerative” induction is one that people probably do not use (e.g., Lawson, 2005; Popper, 1965). Rather, enumerative induction can at best suggest descriptive claims in need of deductive test (e.g., all of the crows I have seen are black; thus, perhaps, all crows are black. If . . . all crows are black, and . . . this new bird is a crow, then . . . I deduce/predict that it will also be black). The form of induction that Galileo presumably used can be characterized as an inference that leads to increased confidence in one’s conclusions with each additional supporting or contradicting result.
In Peirce’s words, If that supposition be correct, a certain sensible result is to be expected under certain circumstances which can be created, or at any rate are to be met with. The question is, Will this be the result? If Nature replies “No!” the experimenter has gained an important piece of knowledge. If Nature says “Yes,” the experimenter’s ideas remain just as they were—only somewhat more deeply engrained. If Nature says “Yes” to the first twenty questions, although they were so devised as to render that answer as surprising as possible, the experimenter will be confident that he is on the right track, since 2 to the 20th power exceeds a million. (Turrisi, 1903/1997, CP 5.168) Note, however, that consistent with Peirce’s underlying pragmatism, this view of knowledge generation falls short of certainty. One cannot be certain that an explanation is correct because explanation generation is the product of human imagination and any number of alternatives may lead to the same prediction. Hence, the subsequent observation of a specific predicted result can, in theory at least, be taken as evidence for more than one explanation. Likewise, one cannot be certain that a contradicted explanation is in fact wrong because a mismatch between a prediction and an observation may not be due to a faulty explanation. It may be due to a faulty test and/or to a faulty deduction. Also as pointed out by authors such as Brannigan (1981) and Collins (1985), scientific claims are generated and evaluated within social and cultural contexts that play a role in their acceptance or rejection. 
Recall the words of Charles Darwin written in the concluding chapter of The Origin of Species (initially published in 1859):

Although I am fully convinced of the truth of the views given in this volume under the form of an abstract, I, by no means expect to convince experienced naturalists whose minds are stocked with a multitude of facts all viewed during a long course of years, from a point of view directly opposite to mine. . . . but I look with confidence to the future, -to young and rising naturalists, who will be able to view both sides of the question with impartiality. (1898 edition, pp. 294–295)

Similarly, in his autobiography, the physicist Max Planck (1949) wrote: “A scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.”

Figure 1. A model of the elements of If/then/Therefore reasoning and argumentation used during the generation and subsequent test of proposed explanations. Arguments are retroductive when results have been obtained by the thinker before hypothesis and prediction generation and deductive when results are obtained after.

Summary of Galileo’s Reasoning in Terms of Abduction, Retroduction, Deduction, and Induction

To summarize, we can identify four basic inferences and a pattern of scientific reasoning and argumentation, as depicted in Figure 1 and in Table 1, as follows:

1. First, thanks to his new and improved telescope (the role of technology), Galileo undertook a new exploration that led to a puzzling observation (the three unexplained points of light near Jupiter).
2. Then, thanks to his prior store of declarative knowledge, Galileo used abduction to generate a hypothesis (a tentative explanation) for the points of light (i.e., perhaps they are fixed stars).
3. Next, Galileo used retroduction to subconsciously test his fixed-stars hypothesis, which led to some doubt and then to rejection.
4. Then he once again used abduction to generate another hypothesis (the orbiting-moons hypothesis), which when presumably checked by retroduction was supported.
5. He then used deduction to generate future predictions, a step that also requires connections with declarative knowledge.
6. Subsequently, after the cloud cover dissipated he made the necessary observations, which matched his predictions.
7. Finally, on the basis of this match, he used induction to draw the conclusion that his orbiting-moons hypothesis had been supported. Therefore, he was able to proudly proclaim to the world that he was the first to discover “four, erratic sidereal bodies performing their revolutions round Jupiter.”

TABLE 1
Basic Inferences of Scientific Reasoning, Argumentation, and Discovery

Inference: Abduction
Question: What caused the puzzling observation (e.g., the three new points of light near Jupiter)?
Example: If . . . points of light seen in the night sky are caused by fixed stars embedded in the celestial sphere, and . . . three new similar looking points of light are seen in the night sky, then . . . perhaps they also are fixed stars.

Inference: Retroduction
Question: Does the proposed cause explain what we already know?
Example: If . . . the points of light are fixed stars, and . . . their positions are compared to each other, then . . . their positions should be random. But . . . they appear exactly in a straight line parallel to the ecliptic. Therefore . . . perhaps they are not fixed stars.

Inference: Deduction
Question: What does the proposed cause lead us to predict about future observations?
Example: If . . . the three points of light are moons orbiting Jupiter, and . . . I observe them over the next several nights, then . . . some nights they should appear to the east of Jupiter and some nights they should appear to the west. Further, they should appear along a straight line on either side of Jupiter.

Inference: Induction
Question: How do the predictions and new observations compare?
Example: If . . . the new observations match the predictions based on the orbiting-moons hypothesis, as they do in this case (e.g., some nights the lights appeared to the east of Jupiter and some nights they appeared to the west), then . . . the hypothesis is supported.

Viewed in this way, scientific reasoning and discovery consist of undertaking novel explorations that lead to puzzling observations that are subsequently explained by the cyclic and repeated use of abduction, retroduction, deduction, observation, and induction. Again in Peirce’s words,

Abduction [retroduction] furnishes all our ideas concerning real things, beyond what are given in perception, but is mere conjecture, without probative force. Deduction is certain but relates only to ideal objects. Induction gives us the only approach to certainty concerning the real that we can have. In forty years diligent study of arguments, I have never found one which did not consist of these elements.
(Bergman & Paavola, 1905/2003a, CP 8.209)

We next consider Rosemary and Peter Grants’ monumental research on Darwin’s finches of the Galapagos Islands to see if we can identify the same inferences and pattern of reasoning and argumentation in biological discovery.

ROSEMARY AND PETER GRANTS’ RESEARCH ON DARWIN’S FINCHES

According to Allchin (2006): “Scientists follow many methods: namely whatever works or seems appropriate to the task at hand. Hence, Rosemary and Peter Grants’ work on Darwin’s finches – massive data collection done without any explicit hypothesis (as one notable case) has nonetheless led to significant and widely respected claims” (p. 118). Is Allchin’s characterization of the Grants’ work correct? If so, then their research certainly would not fit the previous pattern. However, in characterizing the Grants’ research, Allchin failed to cite any of their original accounts. Indeed, when one does consult what the Grants say they did and when they did it, a very different picture emerges (e.g., Grant, 1986; Grant & Grant, 1989; P. R. Grant, personal communication, April 4, 2006). Thus, let us take a close look at just what the Grants had to say about their research (also see Lawson, 2009a).

First, consider Peter Grant’s comments in the Preface of his 1986 book Ecology and Evolution of Darwin’s Finches:

I chose to study the finches for two quite different reasons. The first arose from a confusion about the significance of population variation. . . . The second reason sprang from a similar confusion concerning inter-specific competition. . . .
Since, the classical case of character displacement was invalid (Grant 1972b, 1975a), it was logical to turn attention to the classical case of character release. The classical case involves two species of Darwin’s Finches on the Galapagos Islands (Brown & Wilson 1956). Despite having two good reasons for studying the finches, I might never have begun research on them without the stimulus of a proposal from a prospective postdoctoral Fellow, Ian Abbott. He had developed a plan for detecting the effects of inter-specific competition among Darwin’s Finches. . . . We prepared a research proposal and sought financial support. (pp. xi – xii) Also consider these two quotes from the Preface and opening chapter of Rosemary and Peter Grants’ follow-up book Evolutionary Dynamics of a Natural Population: The Large Cactus Finch of the Galapagos (Grant & Grant, 1989): Genetic variation in quantitative characteristics is the raw material for much of evolution. A substantial body of theoretical work deals with the maintenance and significance of such genetic variation. Field studies of the subject have been largely neglected, yet such studies that employ a theoretical framework can be immensely valuable. (p. xvii) The theoretical framework sets the scope of the study and helps us to identify major factors in need of measurement. (p. 11) These quotes should make it clear that the Grants’ data collection was directed and preceded by a theoretical framework—specifically evolutionary theory and the classical case of character release. Also consider this passage from Peter Grant (1986), a passage that provides the general sequence of their reasoning and research: Testing the competition hypothesis is difficult, for two reasons. First, the hypothesis deals with the past. Since we cannot reconstruct those events precisely, we cannot test the hypothesis directly . . . instead it must be tested through its consequences (predictions). . . . 
To put the arguments into a testable framework we must rephrase them, along the following lines. The observations to be explained are the distributions of species and the inter-island differences in beak size and shape; the hypothesis is that distribution and morphology were causally influenced by inter-specific competition for food; the main assumption upon which the hypothesis rests is that the feeding niche of a population is reflected in, and hence adequately indexed by, the average beak characteristics. . . . I shall now give two examples of an examination of the hypothesis through a test of its predictions. . . . We should expect that G. conirostris on Espanola, with mean beak characteristics intermediate between those of the absent G. magnirostris, G. fortis, and G. scandens, has an intermediate feeding niche position too. Not only that, it is expected to combine the niches of the three missing species, and consequently its niche should be particularly broad. These are falsifiable predictions because they are not necessarily true3 . . . data to test these predictions were collected in the early and middle dry seasons . . . in 1973–1979. . . . The predictions were supported by the results. (p. 301)

Accordingly, we can summarize the Grants’ research in terms of the four inferences previously used to characterize Galileo’s research. A time sequence is also included to document the order of events and to make clear that, contrary to Allchin’s claim, the generation of explicit hypotheses and the deductive derivation of predictions preceded their “massive” data collection.
Puzzling Observation and Causal Question A puzzling observation that confronted 20th century biologists was the distribution of finch species on the Galapagos Islands and their interisland differences in morphology. More specifically, what caused (in terms of evolutionary theory) the distributions and the interisland morphological differences, such as the present-day intermediate beak size, of G. conirostris? David Lack had raised this general causal question in the literature as early as 1947, and according to Peter Grant (1986), it “. . . had, by 1971, not lost any of its freshness” (p. xi). Abduction The evolutionary-based hypothesis put forth to account for the distributions and morphological differences was that they were caused by inter-specific competition for food. Peter Grant discussed this hypothesis with respect to G. conirostris in a paper published in 1972 (see Grant, 1986, p. xii). Thus, by 1972, the hypothesis must have been part of Peter Grant’s declarative knowledge, thus must have been previously generated, perhaps in response to his reading of the Brown and Wilson paper that discussed the classical case of character release involving two similar species of Darwin’s finches. Thus, the hypothesis was then abductively generated (e.g., If . . . the characteristics of the Darwin’s finches studied by Brown and Wilson were caused by character release, then . . . perhaps the interisland morphological differences of G. conirostris were similarly caused by character release). Retroduction This character-release (i.e., inter-specific competition) hypothesis could then be retroductively tested with an argument that would look something like this: 3 Peter Grant’s use of the term falsifiable should not be seen as adoption of the view (sometimes and probably mistakenly attributed to Karl Popper) that science progresses only via the falsification (disproof) of explanatory hypotheses. 
Upon reflection, it should be clear that a scientist with a novel explanation for some puzzling observation does not want to falsify his or her explanation (e.g., Woodward & Goodstein, 1996). However, as Grant states, the approach does oblige the scientist to derive and conduct tests that could in principle contradict the hypothesis in question. One should say contradict, but not falsify, because, as mentioned, the source of a mismatch between predicted results and observed results might not be due to a faulty hypothesis. Instead, it might be due to a faulty test and/or a faulty deduction. Nevertheless, scientists must be willing and able to test their proposed explanations by planning tests that deductively yield predicted results that may in fact not occur, thus potentially contradict their explanations. For example, in Galileo’s case, had he not observed, on subsequent nights, the points of light to the east and then to the west of Jupiter, as predicted, his observations would have contradicted (i.e., “falsified”) his orbiting-moons hypothesis. If . . . inter-specific competition during the past with G. magnirostris, G. fortis, and G. scandens caused the present-day intermediate beak size of G. conirostris (hypothesis), and . . . one examines the beak size of G. conirostris on the island of Espanola where G. magnirostris, G. fortis, and G. scandens are now missing (imagined test), then . . . G. conirostris should have an intermediate beak size (prediction). And . . . G. conirostris does have an intermediate beak size (past observation). Therefore . . . 
retroductive support exists for the hypothesis. Deduction As mentioned, one should not simply retroductively test hypotheses. One should also deductively derive predictions that can then direct the collection of future relevant data. Accordingly, to further and convincingly test the inter-specific competition hypothesis, the following deductive argument was generated: If . . . inter-specific competition during the past with G. magnirostris, G. fortis, and G. scandens caused the present-day intermediate beak size of G. conirostris (hypothesis), and . . . one examines the beak characteristics and feeding niche of G. conirostris on the island of Espanola where G. magnirostris, G. fortis, and G. scandens are now missing (imagined test), then . . . in addition to its beak characteristics intermediate between those of its now missing competitors, G. conirostris should also have an intermediate feeding niche position and a particularly broad niche combining those of its three missing competitors (deduced predictions). Data Collection and Induction Armed with these predictions, the Grants then sought funds from the National Science Foundation. Funds were obtained and their first trip to the Galapagos Islands took place in 1973 (Grant, 1986, p. xii). The relevant data were then collected from 1973 through 1979. Subsequent data analysis indicated that, as predicted, G. conirostris does have an intermediate and particularly broad feeding niche. Because these observed results matched the predicted results, the Grants then presumably used induction to conclude that the inter-specific competition hypothesis had been supported, that is: If . . . the predicted and observed results match, like they do in this case, then . . . the hypothesis is supported. 
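The If/then/Therefore form used above can also be written down as a small data structure, which makes its parts explicit. The following is only an illustrative sketch: the class name `IfThenTherefore`, its field names, and the shortened wording of the Grants' argument are assumptions introduced here, not part of the Grants' or Peirce's own accounts. The wording of the negative conclusion follows the point made earlier that a mismatch contradicts, but does not necessarily falsify, a hypothesis.

```python
from dataclasses import dataclass


@dataclass
class IfThenTherefore:
    """One argument in the If/then/Therefore form (hypothetical helper)."""
    hypothesis: str       # "If . . ."
    test: str             # "and . . ." (imagined or planned test)
    prediction: str       # "then . . ." (deduced prediction)
    observation: str      # "And/But . . ." (observed result)
    results_match: bool   # does the observed result match the prediction?

    def conclusion(self) -> str:
        # A match supports the hypothesis; a mismatch contradicts it, although
        # the fault may lie in the test or the deduction rather than the hypothesis.
        if self.results_match:
            return "the hypothesis is supported"
        return "the hypothesis is contradicted, or the test or the deduction was faulty"

    def render(self) -> str:
        connective = "And" if self.results_match else "But"
        return (
            f"If . . . {self.hypothesis} (hypothesis), "
            f"and . . . {self.test} (test), "
            f"then . . . {self.prediction} (prediction). "
            f"{connective} . . . {self.observation} (observed result). "
            f"Therefore . . . {self.conclusion()}."
        )


# Shortened paraphrase of the Grants' deductive argument (assumed wording).
grants = IfThenTherefore(
    hypothesis="inter-specific competition caused the intermediate beak size of G. conirostris",
    test="one examines the feeding niche of G. conirostris on Espanola",
    prediction="G. conirostris should have an intermediate and particularly broad feeding niche",
    observation="G. conirostris does have an intermediate and particularly broad feeding niche",
    results_match=True,
)
print(grants.render())
```

Setting `results_match=False` switches the connective to "But" and yields the hedged negative conclusion, mirroring the distinction drawn in the text between contradicting and falsifying a hypothesis.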
Therefore, following Peirce’s lead, we have identified the inferences of abduction, retroduction, deduction, and induction in the Grants’ research—research that can accordingly be summarized in terms of the same If/then/Therefore pattern of reasoning and argumentation that we found in Galileo’s discovery of Jupiter’s moons. We next turn to Marshall Nirenberg’s Nobel Prize–winning biochemical research conducted during the early 1960s to see whether it employed the same elements and followed the same argumentative pattern. THE NOBEL PRIZE–WINNING RESEARCH OF MARSHALL NIRENBERG Bonner (2005) argued for the existence of at least two scientific methods, which he referred to as Method A and Method B.4 4 One should not interpret Bonner’s use of the term method to imply a set of steps that ensures success. Because creativity is involved in doing science, both in terms of generating interesting explanations and in figuring out ways to test them, it is preferable to interpret the word “method” as a general plan of what needs to be done. In a sense, having a plan helps. Even though you may not know exactly what to do in each specific case, at least you consciously have a plan of what you should do. And if you do not have a plan, you cannot play the game, at least not very well (e.g., Platt, 1964). Bonner described Method A as one in which puzzling observations provoke hypothesis generation, which then guides the selection and planning of future deductive tests, which are followed by the gathering and analysis of data. When using Bonner’s Method B, however, hypotheses are generated only after the collection of data. 
The hypotheses then serve to explain the already gathered data. In support of the existence and usefulness of Method B, Bonner cites the Nobel Prize–winning research of Marshall Nirenberg conducted during the early 1960s. According to Bonner, Nirenberg’s research followed Method B and asked this descriptive question: “What amino acid does UUU code for?” At the time, biologists thought that the DNA code consisted of four letters (adenine, A; guanine, G; cytosine, C; and thymine, T). They also suspected that the DNA code was first translated into an RNA code, also with four letters, but with uracil (U) substituting for thymine (T). Hence, an RNA code consisting of combinations of As, Gs, Cs, and Us somehow coded for the production of proteins by somehow stringing the 20 some amino acids together. So according to Bonner’s interpretation of Nirenberg’s research, there could have been any 1 of 20 answers to his descriptive question (e.g., UUU codes for serine, UUU codes for valine, UUU codes for phenylalanine). In Bonner’s view, Nirenberg harbored no hypotheses and advanced no predictions about which amino acid would be produced. Nirenberg simply wanted to know which of the 20 amino acids UUU codes for. In other words, the fact that it turned out to be phenylalanine was just the way it turned out and was no more or less theoretically significant than UUU coding for valine, serine, or any other of the 20 some possibilities. Based on Bonner’s view of Nirenberg’s research, it is surprising to learn how others at the time responded when they learned of Nirenberg’s phenylalanine result. 
For example, consider this response by Francis Crick contained in a paper published in Nature (Crick, Barnett, Brenner, & Watts-Tobin, 1962): At the recent Biochemical Congress in Moscow, the audience of Symposium I was startled by the announcement of Nirenberg that he and Matthaei had produced polyphenylalanine (that is, a polypeptide all the residues of which are phenylalanine) by adding polyuridic acid (that is, an RNA the bases of which are all uracil) to a cell-free system which can synthesize proteins. (p. 1232) One has to wonder why the audience was “startled” to learn that a string of Us codes for phenylalanine and not for say valine or serine. Perhaps, there is more to the story than Bonner is acknowledging. Also consider Crick’s comment in a letter to Nirenberg dated January 4, 1962: “The English papers have made rather a fuss about our Nature paper, which was published on Saturday, but as far as I have stressed that it is your discovery which was the real break-through.” Crick’s breakthrough sentiment about Nirenberg’s research was echoed in two other letters to Nirenberg. One letter from the famous French researcher Francois Jacob dated December 20, 1961, had this to say: “Many thanks for your two manuscripts. It is a wonderful story. All my congratulations.” The other letter from H. J. Muller of Indiana University dated February 1, 1962, stated, Let me express the thanks and appreciation of the Committee that arranged the recent symposium on RNA coding for your kindness in having come here for the truly remarkable contribution that you have made. It was inspiring to the older and to the younger hearers alike to follow the course of the marvelous break-through that you described to us. 
(All letters are online at http://profiles.nlm.nih.gov/) Nirenberg’s colleagues were not the only ones startled and impressed by his “wonderful story”—his “marvelous breakthrough.” The newspapers were also lauding Nirenberg’s achievement. Importantly, they placed it in the larger theoretical context of the day. Consider, for example, the following paragraphs written in an article, titled “NIH Researchers Crack the Genetic Code,” published in the Medical World News (January 5, 1962): The enigma of genetic coding, considered a fundamental secret of life, may be on the verge of solution. In just-published and about-to-be published papers, several research teams are reporting experimental proof of what has been largely theory: the intricate process by which structure and function of living organisms are shaped. One group has begun to crack the DNA-RNA code—the key to the whole mystery. Soon they expect to decipher the entire set of instructions by which genetic messengers direct the manufacture of proteins—the basic stuff of life. The major achievement in RNA research is the work of two young biochemists at the National Institute of Arthritis and Metabolic Diseases, Drs. Marshall W. Nirenberg and J. Heinrich Matthaei. Behind their work, however, is a whole series of investigations which has produced the basic theory and its preliminary experimental support. Fundamentally, the theory states that the hereditary “blueprints” of the cell structure and function are coded within the cell nucleus as long-chain molecules of deoxyribonucleic acid (DNA). 
These plans are transmitted, in a series of steps, to the cytoplasmic “assembly line” where they direct the synthesis of each cell’s characteristic products. (p. 18, online at http://profiles.nlm.nih.gov/) If we assume that this is a relatively accurate account, then we can see why Nirenberg’s result caused such a fuss. He not only answered Bonner’s narrow descriptive question but also provided a key piece of evidence to help answer a much broader causal question, namely: How does DNA code for the production of proteins? Importantly, by helping answer this more fundamental theoretical question, Nirenberg had begun to “crack” the genetic code—a breakthrough worthy of a Nobel Prize. Thus, Bonner’s characterization of Nirenberg’s research as descriptive and exemplary of Method B appears misleading. A more accurate interpretation is that Nirenberg was using Method A. Consequently, his research can be better understood as a theory-driven attempt to find out how the letters of DNA code for the production of proteins. To do so, Nirenberg generated a theory claiming that (a) specific combinations of at least three of the four letters of DNA first serve as a template for the production of RNA; (b) specific combinations of at least three of the four letters of RNA then serve as a template for sequencing specific amino acids; and (c) amino acids when strung together make proteins. Accordingly, Nirenberg’s reasoning and his key deductive and inductive argument can be summarized similarly to the previous cases of Galileo and the Grants, that is, If . . . the above theory is correct, and . . . we conduct an experiment with RNA made only of U’s (imagined test), then . . . a polypeptide molecule should be synthesized and it should consist of only one type of amino acid (predicted result via deduction). And . . . 
when Nirenberg and Matthaei (1961) conducted the test, they found that a polypeptide chain consisting of only one type of amino acid (i.e., phenylalanine) was produced (observed result). Therefore . . . support had been found for the theory5 (conclusion via induction). 5 When this If/then/Therefore characterization was read to Nirenberg during a telephone conversation, he replied: “That’s exactly right.” Also consistent with Nirenberg’s use of Method A and his goal of theory testing, he said that at the time he did not even know whether the message came from DNA or from RNA, or for that matter if mRNA even existed (M. W. Nirenberg, personal communication, December 2005). Of course, additional questions remained. Is the code a triplet code? Is it nonoverlapping? Is it degenerate? Nevertheless, presumably thanks to the use of Method A, the “marvelous breakthrough” had been made. The basic theory had been supported and the genetic code was beginning to crack. A Closer Look at Bonner’s Method B Interestingly, in terms of inferences, Bonner’s Method B appears to involve the use of retroduction in the sense that hypotheses are generated to explain some puzzling aspect of previously gathered data. This interpretation implies that Method B is really only part of the process. As mentioned, following a successful use of retroduction, a more difficult and more convincing test needs to be conducted in which the hypothesis is used to deduce a prediction(s) about how some future test(s) should turn out. Another problem with Method B is its apparent weakness in terms of deciding what data to gather at the outset. 
Without some theory or hypothesis to guide one’s search, how is one supposed to know what data to collect or what experiment to conduct? Philosopher Carl Hempel (1966) put it this way: In sum, the maxim that data should be gathered without guidance by antecedent hypotheses about the connections among the facts under study is self-defeating, and is certainly not followed in scientific inquiry. On the contrary, tentative hypotheses are needed to give direction to scientific investigation. (p. 13) Similarly, philosophers Theodore Schick and Lewis Vaughn (1995) commented, A moment’s reflection reveals that data collection in the absence of a hypothesis has little or no scientific value. Suppose, for example, that one day you decide to become a scientist and having read a standard account of the scientific method you decide to collect some data. Where should you begin? Should you start by cataloging all the items in your room, measuring them, weighing them. . . ? Clearly there’s enough data in your room to keep you busy for the rest of your life. (p. 191) More recently, however, Mahootian and Eastman (in press) argue that the volume of observational data and the power of high-performance computing have increased by several orders of magnitude and have reshaped the practice of science much in the way of Bonner’s Method B. They advance what they call an observational-inductive (OI) approach to describe that new practice and to complement what they call the old hypothetico-deductive (HD) approach. For example, one could now measure say 100 different variables and use a high-powered computer to virtually instantaneously calculate correlation coefficients among all 100 variables. Then without any prior hypotheses, one could sift through the resulting coefficients to see which ones are relatively large (e.g., ≥0.80). Then upon finding any such large coefficients, one could generate hypotheses to tentatively explain them, then deduce predictions, and so on. 
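The screening step just described can be sketched in a few lines of code. This is a minimal illustration, not Mahootian and Eastman's procedure: the function names, the synthetic data, and the single planted relationship between `v0` and `v1` are all assumptions made here for demonstration, and only the 0.80 cutoff comes from the text.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def screen_correlations(variables, threshold=0.80):
    """Return (name_a, name_b, r) for every pair of variables with r >= threshold."""
    hits = []
    for (name_a, xs), (name_b, ys) in itertools.combinations(variables.items(), 2):
        r = pearson(xs, ys)
        if r >= threshold:
            hits.append((name_a, name_b, r))
    return hits


def make_demo_data(n_vars=100, n_obs=50, seed=0):
    """Synthetic stand-in for 100 measured variables (hypothetical data)."""
    rng = random.Random(seed)
    data = {f"v{i}": [rng.gauss(0, 1) for _ in range(n_obs)] for i in range(n_vars)}
    # Plant one strong relationship so the screen has something to find.
    data["v1"] = [2 * x + rng.gauss(0, 0.1) for x in data["v0"]]
    return data


data = make_demo_data()
hits = screen_correlations(data)
for name_a, name_b, r in hits:
    print(f"{name_a} ~ {name_b}: r = {r:.3f}")
```

With 100 variables the screen examines 4,950 pairs. Note that the code only flags large coefficients; deciding which 100 variables to measure in the first place, and what a flagged pair might mean, is left entirely to the investigator.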
Of course, in theory, one could do this. But we can hear Hempel, Schick, and Vaughn ask: Why would our imaginary scientist choose those 100 variables and not some other 100? Are there not prior conceptions (i.e., prior hypotheses/theories) involved in knowing which variables to select and which to omit? If true, then Mahootian and Eastman are not advancing a fundamentally different approach. Rather our imaginary scientist is still using prior hypotheses/theories, albeit perhaps on a subconscious plane, to select which variables to pay attention to and which ones to ignore. When the resulting coefficients are then calculated and observed, they may fit those prior conceptions or they may not. If they do not, then the scientist has a new puzzling observation in need of explanation, which of course would need to be tested via retroduction, deduction, and induction. Alternatively, it is possible to imagine someone randomly picking 100 variables with no prior conceptions about those variables, then using a computer to calculate correlation coefficients, and so on. But this is unlikely to advance our collective scientific knowledge—at least not very quickly. This is not to say that the OI approach could not be used. It simply means that if used completely devoid of any guidance on which variables are selected, the approach is unlikely to be productive. After all, one cannot “stand on the shoulders of giants” if one cannot find the giants or if one lacks a ladder to climb up on. Perhaps, another point is in order at this time. We are arguing that science begins with puzzling observations. 
Certainly, the encounter with a puzzling observation is not consciously planned. Recall that Galileo was simply using his new telescope to take a “random walk” around the “heavens.” His initial observations were not designed to test a hypothesis or a theory. But this does not mean that he did not have prior conceptions about what he might see. After all, why was his notice of the three points of light near Jupiter puzzling in the first place? The answer is that his immediate assimilation of those points of light into his “fixed-stars” conception retroductively led to a contradiction, that is, If . . . the three points of light near Jupiter are fixed stars, and . . . their sizes, brightness and positions are compared to each other and to nearby fixed stars, then . . . variations in size, brightness and position should be random, as is the case for other fixed stars. But . . . “they seem to be arranged exactly in a straight line, parallel to the ecliptic, and to be brighter than the rest of the stars.” Therefore . . . the fixed stars hypothesis is contradicted. Or as Galileo put it, “yet they made me wonder somewhat.” So the point is that all observations are hypothesis/conception driven (i.e., theory laden). Those that are not puzzling are simply those that match our expectations (our predictions), whereas those that are puzzling do not match and may, if attended to, eventually result in a change (an accommodation) in those conceptions via If/then/Therefore reasoning. HOW GENERAL IS THE IF/THEN/THEREFORE PATTERN OF REASONING AND ARGUMENTATION? In a recent extensive review of argumentative frameworks, Sampson and Clark (2008) classified frameworks as domain general (i.e., those used to analyze arguments inside or outside the field of science such as Toulmin’s framework of claims, warrants, backings, etc.) 
or domain specific (i.e., those that focus on aspects of arguments specific to science or subfields such as Zohar and Nemet’s framework, which focuses heavily on content and justification, or Kelly and Takao’s framework, which focuses on epistemic levels of specific propositions). Interestingly, Sampson and Clark classified the present If/then/Therefore framework (as discussed in Lawson, 2003) as domain specific. In their words, It fits the traditional empirical model of hypothesis testing and therefore might apply less well, for example, in terms of science conducted with archival data sets or observational contexts such as certain subfields of geology. As a result, the framework is very specific in terms of the scientific disciplines and contexts to which it applies, but for these disciplines and contexts it provides a strong structural model to guide instruction and student reasoning. (pp. 460 – 461) Is the If/then/Therefore framework really domain specific? Certainly, as we have seen in Galileo’s case, it can apply to situations in which circumstantial evidence is used to test hypotheses—where circumstantial evidence is defined as circumstances that according to common experience are usually linked to the hypothesized cause (e.g., as predicted on the basis of the orbiting-moons hypothesis, some nights the lights appeared to the east of Jupiter and some nights they appeared to the west. Furthermore, they always appeared along a straight line on either side of Jupiter). Indeed, the Grants’ data were also circumstantial in nature (e.g., G. conirostris did have an intermediate and particularly broad feeding niche). 
And as we have seen in the Nirenberg case, the If/then/Therefore framework can apply to situations in which scientists use experiments (i.e., manipulations of nature) to test hypotheses. However, Sampson and Clark ask: What about “science conducted with archival data sets” or “certain subfields of geology”? Can the framework apply here as well? And can it be found outside of science? To find out let us consider additional examples starting with historian Jared Diamond’s use of archival data to test hypotheses about the path of human history. Jared Diamond’s Use of Archival Data to Test Hypotheses About Human History In his Pulitzer Prize–winning book Guns, Germs, and Steel, Jared Diamond (1997) was puzzled by the way history unfolded on different continents. More specifically: “Why did wealth and power become distributed as they now are, rather than some other way? For instance, why weren’t Native Americans, Africans, and Aboriginal Australians the ones who decimated, subjugated, or exterminated Europeans and Asians” (p. 15)? Diamond advanced two hypotheses to answer these causal questions. The first hypothesis, the innate-intelligence hypothesis, claims that differences arose because of differences in innate intelligence among the races—that is, some people are innately smarter than others. Consequently, the smarter people developed the technology and so forth that made it possible for them to dominate. Thus, when cultures came into contact, the smarter, more technologically advanced people decimated, subjugated, or exterminated the less intelligent people. Alternatively, the second hypothesis, the environment hypothesis, claims that technological differences arose instead due to environmental differences. That is, the environment in which people settle dictates what sorts of technological advances are possible, thus determines which group develops technology and dominates if and when they meet. 
To test these alternatives, Diamond described a “natural experiment” concerning the settlement of Polynesia. Around 1200 BC, a group of people from the Bismarck Archipelago, north of New Guinea, finally reached and began colonizing the enormously diverse islands of Polynesia. By about AD 500, the colonization of the islands was mostly complete. As Diamond put it, “The ultimate ancestors of all modern Polynesian populations shared essentially the same culture, language, technology, and set of domesticated plants and animals. Hence Polynesian history constitutes an ‘experiment’ allowing us to study human adaptation. . . ” (p. 55). In other words, Diamond reasoned that because the environmentally diverse islands were all settled by the same ancestral group, technological differences that arose from island to island could not be attributed to ancestral differences in innate intelligence because innate intelligence was a variable that had been historically held constant. Instead, any differences that arose can be attributed to other variables such as the diverse environments. Accordingly, here is an explicit If/then/Therefore argument that appears to be behind Diamond’s use of these archival data to test the alternatives: If . . . the environment hypothesis is correct, and . . . a group of people from the Bismarck Archipelago settle in the large and relatively favorable environment of New Zealand, while another group settles on the small and considerably less favorable environment of the Chatham Islands 500 miles to the east (planned test), then . . . 
technological advances should proceed faster and more fully in New Zealand than on the Chatham Islands. Additionally, and if and when the two groups come in contact, the New Zealanders should dominate the Chatham Islanders (deduced predictions). And . . . during the centuries following the settlement of New Zealand and the Chatham Islands, the two groups of settlers did in fact develop in opposite directions. The New Zealanders developed complex technology, political organization, and intense farming practices while the Chatham Islanders reverted to a loosely-coordinated hunting and gathering society. Further, in December 1835 when 500 armed men from New Zealand arrived on the Chatham Islands, they quickly killed or enslaved the Chatham Islanders in spite of the fact that they were vastly outnumbered (archival data). Therefore . . . the environment hypothesis is supported. Further, the innate-intelligence hypothesis is contradicted because the identical ancestry of both groups of settlers predicts similar developmental paths (conclusion). If this account is reasonably accurate, we can conclude that the If/then/Therefore argumentative form is general enough to encompass the use of archival data. Note, however, whether this argument or any other If/then/Therefore argument should be considered retroductive or deductive depends on when the thinker became aware of the relevant archival data. Suppose, for example, Diamond generated the following argument before he was aware of the events of 1835: If . . . the environment hypothesis is correct, and . . . a group of people from a single location settle in a large and relatively favorable environment, while another group from the same location settle on a small and considerably less favorable environment (planned test), then . . . 
technological advances should proceed faster and more fully in the first group than in the second and when the two groups come in contact the first group should dominate the second (deduced predictions). Suppose Diamond next sifted through archival data to see whether he could find a specific case in point. If this was the order of things, then Diamond clearly used a deductive approach. If, however, Diamond first became aware of the 1835 killing and enslavement of the Chatham Islanders by the New Zealanders and only then employed the environment hypothesis to explain that “puzzling observation,” his argument would be retroductive. If so, he should then deduce similar results and look elsewhere in the archival record to see whether he can find them. Geological Discovery: What Killed the Dinosaurs? Recall that Sampson and Clark (2008) concluded that the If/then/Therefore form of argumentation is too specific to encompass all scientific contexts “. . . such as certain subfields of geology” (p. 461). Thus, let us turn to the research of geologist Walter Alvarez to see what sort of reasoning and forms of argumentation he used in drawing the conclusion that a giant meteor and its aftermath killed the dinosaurs some 65 million years ago (also see Lawson, 2004). The rapid near extinction of forams found in rock strata during the early 1970s presented a puzzling observation because it contradicted the longstanding uniformitarian doctrine that geologic and biologic changes occur gradually. Consequently, Alvarez (1997) began seeking a catastrophic cause. Alvarez was well aware of meteor impact craters on Earth, the best example being Meteor Crater in Northern Arizona, as well as the mounting evidence that the craters covering our moon and other planets were caused by impacting asteroids and comets. More important, such impact craters are the rule, not the exception. Alvarez was also aware of two published papers proposing that the dinosaur extinction had been caused by radiation triggered by the explosion of a nearby star—a supernova. By 1976, Alvarez began focusing his attention on the KT boundary layer (i.e., the narrow boundary between the Cretaceous and Tertiary layers). He suspected that the boundary held the key to the dinosaur extinction and that it could be used to test (via deduction) the more global uniformitarian versus catastrophic theories. As he put it: “Very rapid deposition of the clay would suggest a sudden cause for the extinction, but slow deposition would suggest a gradual mechanism.” (Alvarez, 1997, p. 61) How then could he find out how long it had taken to deposit the clay? What he needed was something that had been deposited in the limestone and clay at a constant rate. At this point, Alvarez enlisted the expertise of his father Luis Alvarez, a physicist at Berkeley. The elder Alvarez knew that although meteors hit the Earth rarely and at random, meteorite dust, which contains iridium, falls from outer space at a constant rate across the entire Earth. Therefore, they came up with a way to indirectly measure the clay’s deposition rate by measuring the amount of iridium. In other words, If . . . the extinction of many foram species, and possibly the dinosaurs, was caused by a catastrophic event (catastrophic-event hypothesis), and . . . the amount of iridium contained in the clay at the KT boundary layer is measured (imagined test), then . . . 
a relatively small amount of iridium should be present—about 0.1 parts per billion (ppb) (predicted result via deduction). Iridium falls at a constant rate, thus the less iridium in the layer, the less time it must have taken for deposition. And . . . thanks to Berkeley chemist Frank Asaro, by June of 1978 the initial iridium measurements had been made and they contained another surprise. Instead of the expected amount of 0.1 ppb, assuming the clay layer had been deposited slowly, a value of 9 ppb was detected (observed result). Therefore . . . either the extinction of many foram species, and possibly the dinosaurs, was not caused by a catastrophic event (conclusion via induction); or perhaps the catastrophic event itself deposited the unusually large amount of iridium (alternative hypothesis). Consider Alvarez’s reaction to the huge value of detected iridium: Where had all the iridium come from? Possibilities quickly sprang to mind: Could it have come from the supernova that Dale Russell and Wallace Tucker had suggested to explain the dinosaur extinction? Did it come from an impacting asteroid or comet? Or could there be a non-catastrophic explanation? Maybe the iridium was deposited from seawater somehow. Or maybe the Earth had encountered a cloud of interstellar dust and gas. (Alvarez, 1997, p. 69) Before investing time and energy in testing these possibilities (i.e., alternative hypotheses), Alvarez needed to know whether the iridium anomaly was restricted to the clay bed around Gubbio or whether it was a global phenomenon. So he went to the library in search of other known KT sites. At that time, the only other known site was a seaside cliff called Stevns Klint in Denmark. Thus, Alvarez set off to visit the Stevns Klint deposits. And on the basis of the following deductive/inductive argument, he concluded that what he found there supported the catastrophic-event hypothesis: If . . . 
the unusually large amount of iridium in the Gubbio clay layer was caused by a global catastrophic event (catastrophic-event hypothesis), and . . . the amount of iridium is measured in the other known KT boundary layer at Stevns Klint (imagined test), then . . . an unusually high level of iridium should also be found in that layer (predicted result via deduction). And . . . when Alvarez visited the Stevns Klint deposits, he found that they also contained a narrow clay layer with an unusually high concentration of iridium (observed result). Therefore . . . the hypothesis was supported (conclusion via induction) and Alvarez decided that it was time to think about a global explanation for the anomaly.

Thus, on the basis of this interpretation, we can conclude that the If/then/Therefore pattern of argumentation is general enough to encompass at least some geological research. We leave it to others to search for other geological cases that may or may not apply. Next let us briefly consider the so-called “thought experiments” (i.e., instances in which an entire “experiment” is conducted in one’s mind) to see whether the same argumentative form applies.

Thought Experiments

As the name implies, thought experiments take place in one’s thoughts. But this does not mean that they do not include observed results. They do. But the results of thought experiments have been observed before the experiment has been mentally conducted. Consequently, the point of a thought experiment is often to reveal via retroduction that the hypothesis in question must be wrong.
It must be wrong because it leads either to a prediction that does not match what we have already observed or to contradictory predictions. In this sense, thought experiments can also be cast in the form of If/then/Therefore arguments, which provides additional evidence of the form’s generality. For example, Galileo conceived of one of the most famous thought experiments in science. He wondered whether Aristotle’s claim that heavier objects fall faster than lighter objects was correct. (Actually, heavier objects typically do fall faster than lighter objects in air, but Galileo’s thought experiment was conducted in an idealized world devoid of fall-resisting air molecules.) As you will see, Galileo’s retroductive reasoning led to the conclusion that the mass must not matter because if it does, we end up with contradictory predictions.

If . . . the rate of fall depends on the mass of the object, and . . . we drop a large, heavy rock next to a smaller, lighter rock, then . . . the larger, heavier rock should hit the ground first. Further, if . . . the rate of fall depends on the mass of the object, and . . . we now tie the two rocks together and drop them, then . . . the larger, heavier rock should fall faster than it did before because it is now part of a more massive object (prediction). However, when the rocks are tied together and are falling, the lighter, slower falling rock will produce a drag on the heavier rock and slow it down. This implies that when tied together the rocks should fall more slowly (contradictory prediction). Therefore . . . we have two contradictory predictions implying that the rate of fall must not depend on the mass of the falling objects.

Engineering and the Wright Brothers’ Invention of the Airplane

Samarapungavan et al. (2006) proposed that research chemists have adopted what they call an “engineering” research model, as opposed to what they view as the more classic “hypothetico-deductive theory building model” (p. 470).
Although their characterization of HD science shares little with the view advanced in this paper, their selection of an engineering research model to characterize chemical research is of interest in the sense that one might suspect that engineers employ the same If/then/Therefore reasoning pattern during the invention process that we are arguing is used during scientific discovery. In other words, in terms of reasoning, engineers and scientists may be doing the same thing. As a case in point, let us consider what happened in 1900 when bicycle builders Orville and Wilbur Wright tried their hand at building an airplane (as cited in Crouch, 1992).

The Wright brothers began by planning to build a small, unmanned glider. To calculate the size that the glider’s wings would need to develop the necessary lift for flight, they used an equation called the lift equation. According to the lift equation, the amount of lift created (L) depends on the total area of the lifting surface (S), the velocity of the flight squared divided by 2 (V^2/2), a coefficient of air pressure (k), and a coefficient of lift (C_L), that is,

$L = k\,S\,\frac{V^2}{2}\,C_L$

After using the lift equation to calculate the necessary specifications, the brothers built their glider to these specifications, and in October of 1900 took it to Kitty Hawk, North Carolina, for testing. The test results were encouraging enough to motivate them to build a larger manned glider and try it the next year. During that next year, the larger manned glider made several test flights, the longest covering 389 ft.
However, the tests were largely discouraging because the manned glider fell short of the lift needed for eventual self-propelled flight by some 20%. Thus, a puzzling observation provoked a causal question, namely: Why did the manned glider not attain the needed lift? Presumably, based on the following If/then/Therefore reasoning, they concluded that their failure to achieve the necessary lift was due to faulty specifications used in the lift equation, that is, If . . . the specifications used in the lift equation are correct, and . . . we build and fly a larger manned glider to those specifications and determine its amount of lift, then . . . it should achieve the needed lift for eventual self-propelled flight. But . . . when the manned glider was taken to Kitty Hawk and tested, it fell short of the necessary lift by some 20%. Therefore . . . the specifications are probably not correct.

But what in the specifications contained the error? The Wright brothers hypothesized that the error likely existed in the coefficients used in their calculations (i.e., in the air pressure coefficient, in the lift coefficient, or in both). They then figured out a way to test their hypotheses by using a moving bicycle with a spare wheel, free to turn, mounted on its handlebars. To deduce the necessary prediction for their test, they used the lift equation and the two previously used coefficients to calculate that a wing with a surface area of 1 square foot, set at a 5° angle, should precisely balance a flat plate measuring 0.66 of a square foot, set at a 90° angle to the air flow. Consequently, to conduct the test, they mounted the spare wheel on the handlebars; they fixed the 1 square foot wing on the front of the spare wheel at a 5° angle; they fixed the 0.66 square foot flat plate on the spare wheel at a 90° angle to the air flow; and they rode the bicycle with its spare wheel, wing, and flat plate down the street.
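The balance prediction can be illustrated with a short sketch built on the lift equation quoted earlier. Because both surfaces meet the same air at the same speed, k and V^2/2 cancel, so the balancing plate area depends only on the ratio of the two coefficients. The numbers below are illustrative assumptions, not values reported in the source: the wing coefficient of 0.66 is chosen only so that the sketch reproduces the 0.66 square foot figure, the plate coefficient of 1.0 and the values of k and airspeed are placeholders, and equal lever arms on the wheel are assumed.

```python
def lift(k, area, velocity, coefficient):
    """Force from the lift equation as quoted in the text: L = k * S * (V**2 / 2) * C_L."""
    return k * area * (velocity ** 2 / 2) * coefficient


def balancing_plate_area(wing_area, wing_coefficient, plate_coefficient=1.0):
    """Plate area whose force equals the force on the wing.

    Setting lift(k, S_wing, V, C_wing) equal to lift(k, S_plate, V, C_plate)
    and cancelling k and V**2 / 2 leaves S_plate = S_wing * C_wing / C_plate.
    """
    return wing_area * wing_coefficient / plate_coefficient


# Illustrative numbers only (assumptions, not values reported in the text).
K = 0.005            # placeholder air-pressure coefficient
SPEED = 20.0         # placeholder airspeed
C_WING_5DEG = 0.66   # assumed tabulated wing coefficient at a 5 degree angle
C_PLATE = 1.0        # assumed coefficient for a plate set square to the flow

plate_area = balancing_plate_area(1.0, C_WING_5DEG, C_PLATE)
wing_force = lift(K, 1.0, SPEED, C_WING_5DEG)
plate_force = lift(K, plate_area, SPEED, C_PLATE)

# If the tabulated coefficients are right, the two forces agree and the wheel stays still.
print(plate_area, wing_force, plate_force)
```

Under these assumptions the predicted balance does not depend on riding speed, which is why a turning wheel points at the coefficients rather than at the areas or the velocity.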
Based on their calculations, the forces created on the wing and on the flat plate should precisely balance each other and the spare wheel should not turn. However, when they rode down the street, the wheel turned. So the error could not be in the surface areas, the actual lift, or in the velocity. Therefore, they could be reasonably sure that the coefficients used in the calculations were in fact to blame. Their reasoning can be summarized using the If/then/Therefore form like this: If . . . no error exists in the two coefficients, and . . . a bicycle with the spare wheel, a wing, and a flat plate mounted as described above is ridden down the street, then . . . the forces exerted by the wind on the wing and on the flat plate should precisely balance and the spare wheel should not turn. But . . . when the bicycle was ridden down the street, the spare wheel turned. Therefore . . . the hypothesis is contradicted. In other words, an error must exist in the coefficients. So the Wright brothers set out to find out which coefficient was to blame. To do this, they built a wind tunnel and used a small airfoil mounted on a balance to conduct several additional hypothesis-driven experiments that soon led to the construction of the first successful airplane in 1903.

CONCLUSION AND IMPLICATIONS

This paper has argued that the core of scientific reasoning, argumentation, and discovery consists of four inferences called abduction, retroduction, deduction, and induction. Abduction is first used to generate possible explanations for puzzling observations.
Next, retroduction, deduction, and induction drive a pattern of If/then/Therefore reasoning used in the service of testing these explanations. Evidence in the form of several case histories has been presented that the inferences and the reasoning pattern are of general applicability. Recall Peirce’s previously quoted words: “In forty years diligent study of arguments, I have never found one which did not consist of these elements” (Bergman & Paavola, 1905/2003a, CP 8.209). Indeed, the present paper is another case in point (e.g., If . . . the present theory of the nature of scientific reasoning and argumentation is correct, and . . . several varied cases of scientific discovery are carefully analyzed, then . . . they should reveal the use of abduction, retroduction, deduction, and induction. And . . . several varied cases do reveal use of these inferences. Therefore . . . the theory is supported).

Accordingly, science can be viewed as an enterprise in which explorations, which need not be consciously preceded by prior hypotheses/theories, yield puzzling observations in need of explanation. Scientists then subconsciously cull through their prior declarative knowledge to abductively generate one or more tentative explanations. This can happen very quickly or it can take several years. Either way, once generated, the tentative explanations are initially and subconsciously put to a retroductive test. Does the explanation in fact explain the initial puzzling observation? On passing such a test, one may incorrectly conclude that his or her task is complete. But it is not. Now, new tests should be imagined that lead deductively to predictions about possible new observations. Once such tests have been conducted and the new observations made, they need to be compared with the predictions. A good match inductively supports the tested explanation. A poor match inductively contradicts the explanation.
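The testing cycle just described can be sketched as a small data structure. This is a minimal illustration, not a formalization proposed in the paper: the class name `IfThenTherefore` and its fields are hypothetical, and the judgment of whether an observed result matches the prediction is left to the person using it, since that comparison draws on prior declarative knowledge rather than on any mechanical rule.

```python
from dataclasses import dataclass


@dataclass
class IfThenTherefore:
    """One If/then/Therefore argument (hypothetical structure for illustration)."""

    hypothesis: str   # "If . . ."  the tentative explanation under test
    test: str         # "and . . ." the planned or imagined test
    prediction: str   # "then . . ." the deduced expected result
    result: str       # "And/But . . ." the observed result

    def conclusion(self, result_matches_prediction: bool) -> str:
        """Inductive step: a good match supports, a poor match contradicts."""
        return "supported" if result_matches_prediction else "contradicted"

    def render(self, result_matches_prediction: bool) -> str:
        """Write the argument out in the If/then/Therefore form used in the text."""
        link = "And" if result_matches_prediction else "But"
        verdict = self.conclusion(result_matches_prediction)
        return (
            f"If . . . {self.hypothesis}, and . . . {self.test}, "
            f"then . . . {self.prediction}. {link} . . . {self.result}. "
            f"Therefore . . . the hypothesis is {verdict}."
        )


# The Wright brothers' bicycle test, paraphrased from the text above.
wheel_test = IfThenTherefore(
    hypothesis="no error exists in the two coefficients",
    test="the bicycle with the spare wheel, wing, and flat plate is ridden down the street",
    prediction="the spare wheel should not turn",
    result="the spare wheel turned",
)
print(wheel_test.render(result_matches_prediction=False))
```

Keeping the hypothesis and the prediction in separate fields mirrors the distinction the framework insists on: the hypothesis is the proposed explanation, while the prediction is what should be observed if that explanation is correct.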
Importantly, as mentioned (see footnote 2), the underlying reasoning is not strictly “logical” or “procedural” in the sense that prior declarative knowledge is needed, not only as a source of hypotheses but also as a source of imagined tests and predictions.

John Platt (1964) expressed a similar view in his now classic paper “Strong Inference.” In that paper, Platt defined strong inference like this:

Strong inference consists of applying the following steps to every problem in science, formally and explicitly and regularly:
1. Devising alternative hypotheses;
2. Devising a crucial experiment (or several of them), with alternative possible outcomes, each of which will, as nearly as possible, exclude one or more of the hypotheses;
3. Carrying out the experiment so as to get a clean result;
4. Recycling the procedure, making sub-hypotheses to refine the possibilities that remain; and so on. (p. 347)

Although Platt noted that some research fields collectively embrace these steps, he also noted that the steps were neither universally understood nor applied. Again in his words,

The difference between the average scientist’s informal methods and the methods of strong-inference users is somewhat like the difference between a gasoline engine that fires occasionally and one that fires in steady sequence. If our automobile engines were as erratic as our deliberate intellectual efforts, most of us would not get home for supper. (p. 347)

Therefore, although it would seem incorrect to argue that all scientific research is consciously guided by cycles of abductive, retroductive, deductive, and inductive inferences, it might, nevertheless, be argued that the odds of success would improve if they were consciously applied. Doing so, however, would require that researchers become more aware of their reasoning in successful instances so that they are better able to repeat this successful reasoning in subsequent instances. In cognitive terms, the consciousness issue appears to be one of “metacognition,” a term coined by Flavell (1979). Metacognition literally means thinking about one’s thinking, thus refers to an individual’s ability to stand apart from his or her own thinking, reflect on it, and subsequently improve it. Thus, in cognitive terms, what Platt is calling for is more reflectivity, more metacognition, on the part of researchers. Increased reflectivity would presumably lead to a greater awareness/consciousness of the reasoning process so that they might waste less time gathering irrelevant data and instead would more quickly move to the explicit generation and retroductive and then deductive tests of alternative hypotheses and predictions. This view is consistent with more recent so-called “dual-processing” accounts of reasoning and social cognition, which posit the existence of cognitive processes that are fast, automatic, and unconscious and those that are slow, deliberate, and conscious (e.g., Evans, 2008).

Platt’s argument for raising consciousness among scientists can be applied in the science classroom as well. In short, students need to engage in more lessons in which they have opportunities to explore nature and confront puzzling observations and the resulting causal questions.
They then need a skilled teacher who allows and encourages them to generate and test alternative hypotheses and then reflect on what they have done, thus “exercise” and become more conscious of their nascent inferential skills.

Many science educators have previously expressed a similar view with various degrees of explicitness. For example, Berland and Reiser (2009) recently explored the usefulness of a framework proposed by McNeill and Krajcik (2007). The McNeill–Krajcik framework contains these three components: (1) Claim—the answer to the question, the piece to be defended by evidence and reasoning; (2) Evidence—information or data that supports the claim; and (3) Reasoning—a justification that shows why the data count as evidence to support the claim. The McNeill–Krajcik framework has elements in common with the present framework. However, it lacks some of the present framework’s explicitness and completeness.

Also consider the Predict-Observe-Explain (POE) framework proposed by White and Gunstone (1992). During POE instruction students are first asked to predict the outcome of some sort of exploration or manipulation and then asked to justify their prediction. This is usually done in an area in which they are likely to generate a false prediction based on a misconception. Students then make the relevant observation, usually of a discrepant event that contradicts their prediction. Finally, they are asked to explain the discrepancy in an effort to change their misconception. Viewed in terms of the present theory, we can interpret a student’s justification, their misconception, as an alternative hypothesis that deductively generated their previously stated prediction.
Thus, the subsequent observation, which does not match their prediction, contradicts their hypothesis and leads to the need to generate an alternative hypothesis (an alternative conception) that retroductively generates a prediction that matches what they have just observed. The only elements missing from this POE framework (albeit three very important ones) are the need for students to then (1) plan some new tests of the alternatives based on deduction, (2) conduct the tests and compare the test results with their deductively derived predictions, and (3) use induction to conclude that the alternatives have been supported or contradicted, thus replace their prior misconception with a more scientifically acceptable one.

Specifically, when put into practice, either in scientific research or in the science classroom, the present framework distinguishes among an argument’s declarative elements (i.e., puzzling observations, causal questions, hypotheses, planned tests, predictions, conducted tests, results, and conclusions) and its procedural elements (i.e., abduction, retroduction, deduction, and induction).
Furthermore, the present framework details how the declarative and procedural elements interact in the following manner: (1) An exploration phase occurs in which a puzzling observation is made; (2) a causal question is raised; (3) a creative brainstorming phase occurs in which multiple hypotheses are abductively generated; (4) next, a phase occurs in which tests are planned that retroductively and later deductively lead to explicitly stated predictions; (5) evidence is then gathered that at least “in theory” might contradict each hypothesis; (6) predictions and evidence are compared to allow, via induction, the drawing of a conclusion; and (7) oral and/or written arguments are prepared and presented that include the evidence and the If/then/Therefore reasoning for and against each of the hypotheses.

Unfortunately, in terms of implementing such lessons, a recent survey (Oehrtman & Lawson, 2008) found that a majority of experienced high school science teachers (63%) were unaware of the distinction between hypotheses and predictions. Perhaps even worse, 41% of them failed to distinguish evidence from conclusions. Such lack of awareness is also common in instructional materials. For example, in a published set of high school physical science lessons, Hsu (2005) defines a hypothesis as “A sentence describing what you think your experiment should demonstrate” (p. 9). And in a series of published high school general science lessons, Cothron, Giese, and Rezba (2006) offer this definition and example: “A hypothesis is a prediction of the effect that changes in the independent variable will have on the dependent variable. One possible hypothesis would be: If the amount of salt in the water is increased, then the water will evaporate more slowly” (p. 45). Failing to differentiate hypotheses from predictions in this way not only loses the “logic” of hypothesis testing but also loses the central goal of doing science, which is to generate and test explanations.
Small wonder so many teachers and students are perplexed. Indeed, many, if not most, published lessons fail to begin with puzzling observations in need of explanation. For example, who among us has not seen a lesson similar to one recently sent to me by a curriculum developer from a nearby school district? The lesson begins with students observing different types of birdseed. Students are then asked to generate a hypothesis about which type they think birds would prefer and then test their hypothesis. Unfortunately, there is nothing to explain here: no puzzling observation and no causal question. Consequently, there is no need for hypotheses. At best, the lesson calls for students to make predictions about what type the birds might prefer. Most likely, students will have no idea why, or even whether, birds might prefer one type over another. Nevertheless, perhaps some accommodating student will make a prediction (recall White and Gunstone’s POE instructional framework). Having done so, the alert teacher can then ask the student to explain why the student made the prediction. If the student can then offer a possible reason for the prediction (e.g., I think birds will prefer the type containing lots of little yellow seeds because those seeds are easier to crack open), the teacher can identify such a reason as a hypothesis, which could subsequently be tested.

Thus, more lessons are needed that begin with students making puzzling observations that can then be collaboratively and collectively explained via hypothesis generation and test.
For example, Lawson (2002b) conducted such a lesson with college students who were challenged to generate and test multiple hypotheses about why water rose in a glass inverted over a burning candle that was standing in a pan of water. A key aspect of this and similar lessons is making sure that students freely generate several hypotheses. Initially, however, many students are reluctant to generate a hypothesis for fear of being wrong (see footnote 6). This is particularly so if the teacher allows hypotheses to be “critiqued” following their generation. For example, students often generate the hypothesis that water rises because oxygen is being consumed by the flame and the resulting vacuum “sucks” the water up. At hearing such a hypothesis, an alert classmate having used retroductive reasoning may be tempted to exclaim: “That cannot be right. If it were right, then the water should stop rising after the candle goes out. But we saw that it doesn’t!” Or another equally alert classmate may retroductively add: “That cannot be right. If it were right, then we would have destroyed oxygen. But we know from chemistry class that combustion does not destroy oxygen. Instead it converts it into carbon dioxide.” Allowing these sorts of retroductive critiques during the hypothesis-generation phase of instruction severely restricts the number of alternatives abducted. Consequently, following good brainstorming techniques, retroductive arguments should be put on hold until students have generated all of the hypotheses they can think of. Only then should the teacher challenge students to use both retroductive and deductive reasoning to test the alternatives. Students should also be told to try to test all of the generated hypotheses, not just the ones they think might be right. In short, to produce the strongest argument, their job should be one of not only finding evidence in favor of one hypothesis but also finding evidence against the alternatives.
Teachers should also point out that the “correct” answer may be some combination of the generated hypotheses, or perhaps a hypothesis that has yet to be generated.

Following much sharing of ideas, much experimentation, and much argumentation, some of the students who participated in the candle burning lesson described above were successful in constructing verbal and then written If/then/Therefore arguments summarizing how they had deductively tested each hypothesis and what conclusions they were able to draw, for example: If . . . the water rises because carbon dioxide molecules dissolve rapidly into the water (hypothesis), and . . . the height of water rise in two containers is compared, one with CO2-saturated water and one with normal water (planned test), then . . . the water should rise less in the container with the CO2-saturated water than in the container with the normal water (prediction). But . . . the water rise is the same in both containers (result). Therefore . . . the dissolving-CO2 hypothesis is probably wrong (conclusion).

Success in conducting such a test and in constructing such an argument implies that these students reasoned in a context in which the hypothesized causal agent (dissolving CO2 molecules) was nonperceptible. Furthermore, to link the imagined causal agent to the experimental manipulation (i.e., the amount of dry ice in the two containers), the students presumably had to understand a theoretical rationale that goes something like this: Dissolving CO2 molecules presumably cause a reduction of air pressure in the cylinder. This reduction in turn causes the water to rise. Consequently, when the water is already saturated with CO2 molecules, the newly created CO2 molecules cannot escape into the water, hence the internal pressure will not be reduced and the water will not rise. The theoretical rationale in this case is used to link the imagined causal agent (i.e., dissolved CO2 molecules) to the manipulated (i.e., independent) variable in the experiment (i.e., the amount of dry ice added to the two containers).

Footnote 6. To encourage multiple hypothesis generation, teachers need to ask divergent, rather than convergent, questions. For example, students are much more willing to venture a “guess” if asked “What might have caused the water to rise?” as opposed to “What caused the water to rise?”

TABLE 2
Retroductive Arguments Constructed by Students While Attempting to Test Hypotheses (from Lawson, 2002b)

Testing the dissolving-CO2 hypothesis
If . . . the oxygen is converted to carbon dioxide, and . . . the carbon dioxide dissolves in the water, then . . . the inside pressure should be less than outside causing water to rise. And . . . the water did rise. Therefore . . . the hypothesis is correct due to rising of the water.

Testing the expanding-water hypothesis
If . . . water absorbs heat from the flame, and . . . that causes water to expand, then . . . we should see the water rise. And . . . it does. Therefore . . . the hypothesis is not disproved.

Testing the consumed-oxygen hypothesis
If . . . oxygen is consumed creating a partial vacuum, and . . . it causes a vacuum into which the water is sucked, then . . . the water level should rise, which it does. Therefore . . . the hypothesis is supported.

Testing the phlogiston hypothesis
If . . . the candle is lit before covering it with the jar, then . . . the water should rise when the flame (phlogiston) goes out, and . . . the water did rise. Therefore . . . the hypothesis is supported.
Yet several other students could do no better than generate retroductive arguments such as those listed in Table 2. The arguments in the table certainly suggest that these students failed to understand the limitations of retroductive reasoning and failed to appreciate the need for deductive tests with clearly stated predictions.

Nevertheless, retroductive reasoning can be very important; recall the retroductive nature of thought experiments. Also consider Albert Einstein’s general relativity theory, which in 1915 retroductively explained the puzzling 43 arcseconds per century shift in Mercury’s orbit. Importantly, the theory also deductively predicted that starlight passing the sun would be displaced outward by 1.7 arcseconds, a prediction that was subsequently confirmed by astronomical observations made in 1919. When a graduate student later asked Einstein what he would have done had the observations (made by Sir Arthur Eddington) shown his theory wrong, he replied: “Then I would have been sorry for the dear Lord; the theory is correct” (Isaacson, 2007, p. 259). Presumably, Einstein was speaking somewhat in jest. In fact, when subsequent observations made by Edwin Hubble in the 1920s contradicted another prediction of general relativity theory (i.e., that the universe is not expanding), Einstein was quick to modify the theory to take Hubble’s result into account. In this instance, however, modification was relatively easy because Einstein’s original version of the theory had in fact predicted an expanding universe. But at the time the theory was generated, the evidence implied a nonexpanding universe. Accordingly, Einstein added a “cosmological constant” to his field equations to keep the theory consistent with a static universe. Later, Einstein would call this addition “the biggest blunder he ever made in his life” (Isaacson, 2007, pp. 355–356).
It was a huge blunder because had the constant not been added, the theory would have been left predicting an expanding universe, a prediction that would have been confirmed several years later, making Einstein all the more famous.

Interestingly, when the students who constructed the retroductive arguments listed in Table 2 were asked to generate and test hypotheses about the possible cause(s) of variation in the speed of a pendulum’s swing (i.e., What causes some pendulums to swing faster than others?), they had no problem in doing so and in later constructing deductive If/then/Therefore arguments like this one:

If . . . the amount of weight causes changes in swing rates,
and . . . the weights are varied while holding other possible causes constant,
then . . . rate of pendulum swing should vary.
But . . . when we conducted the experiment, we found that the rates did not vary.
Therefore . . . the weight hypothesis is contradicted.

Although this pattern of argumentation is the same as that used to deductively test the dissolving-CO2 hypothesis, here a theoretical rationale is not needed because the test involves an experiment in which the possible cause is directly manipulated. In other words, the proposed cause is the amount of weight and the experiment’s independent variable also is the amount of weight. Importantly, this variable can be easily manipulated because weight differences can be sensed.
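The students’ pendulum result can be reproduced with a small simulation. This is a minimal sketch under assumptions not stated in the lesson: an idealized, frictionless point-mass pendulum, a 1 m string, a small release angle, and three arbitrary weights. The function name `swing_period` and all numerical values are hypothetical choices made for illustration.

```python
import math


def swing_period(length_m: float, mass_kg: float,
                 theta0: float = 0.1, g: float = 9.81,
                 dt: float = 1e-4) -> float:
    """Estimate the period of an idealized pendulum by numerical integration.

    The weight enters both the restoring torque and the rotational
    inertia, so the simulation itself shows whether it affects the swing.
    """
    theta, omega, t = theta0, 0.0, 0.0
    # Integrate from release until the bob first passes through the
    # vertical; that interval is one quarter of a full swing.
    while theta > 0:
        torque = -mass_kg * g * length_m * math.sin(theta)
        inertia = mass_kg * length_m ** 2
        omega += (torque / inertia) * dt
        theta += omega * dt
        t += dt
    return 4 * t


# Planned test: vary the weight while holding length and release angle constant.
masses_kg = [0.05, 0.10, 0.20]
periods = [swing_period(1.0, m) for m in masses_kg]

# Prediction of the weight hypothesis: the swing rates should vary.
rates_vary = max(periods) - min(periods) > 1e-3
conclusion = ("weight hypothesis is supported" if rates_vary
              else "weight hypothesis is contradicted")

for m, p in zip(masses_kg, periods):
    print(f"{m:.2f} kg: period {p:.3f} s")
print(conclusion)
```

Because the weight cancels when the torque is divided by the inertia, the simulated periods agree across the three weights, which matches the result the students reported.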
Thus, causal hypothesis testing appears to occur on two qualitatively different levels, with success at testing hypotheses involving perceptible causal agents as a likely prerequisite for becoming proficient at testing hypotheses involving nonperceptible theoretical entities. Students may first become generally skilled at testing hypotheses about perceptible causal agents. And, perhaps, only then, given the necessary developmental conditions, do they become generally skilled at testing hypotheses about nonperceptible causal agents (cf. Lawson et al., 2000). Consequently, teachers should provoke students to construct, reflect on, and then try to produce written arguments of what they have done in pendulum-like contexts before asking them to do so in contexts in which the hypothesized causal agents are nonperceptible.

For example, consider the question that Sampson and Clark (2008) posed to students, that is, Why do some objects, such as a metal and a wooden spoon, feel like they are at different temperatures even though they have been sitting in the same room for several hours? Here, at least two levels of responses are possible. The first level can be provoked by first asking students to feel several objects and report which ones feel colder, warmer, and so on. Upon doing so, students will report that some objects (e.g., metal ones) feel colder than other objects (e.g., wooden ones). These observations raise a causal question: Why do metal objects feel colder than wooden objects? Students can then generate some alternative hypotheses: for example, metal objects feel colder because they are colder. They can test this hypothesis by measuring the temperatures of the objects in question:

If . . . metal objects feel colder than wooden objects because they are colder,
and . . . we measure the temperatures of the metal and the wooden objects,
then . . . the measured temperatures of the metal objects should be lower.
Of course, the students’ results will contradict the hypothesis, that is:

But . . . the temperatures of the metal and wooden objects are the same.
Therefore . . . the hypothesis is contradicted.

So the students will now have encountered a truly puzzling observation, namely, some objects feel colder than others in spite of the fact that they are at the same temperature! This puzzling observation raises a second, higher level, causal question, to which students can again be asked to generate hypotheses. However, at least for the middle school students interviewed by Sampson and Clark, the sorts of hypotheses needed here and their means of testing are probably beyond their reach. Nevertheless, here is one hypothesis and a way to test it:

If . . . metal objects feel colder than wooden objects at any given temperature because the metals’ atoms are packed closer together, hence conduct heat better, hence feel colder,
and . . . we measure and compare the densities of the objects,
then . . . the metal objects should have greater densities than the wooden objects.

Of course, upon measuring and comparing densities, the students will find that, as predicted, the metals are denser.

Therefore . . . they can conclude that the hypothesis has been supported.
Unfortunately, developing many such hypothesis-driven, inquiry-based lessons and properly matching the lessons’ intellectual demands with the students’ initial reasoning skills and their declarative knowledge remains an unmet educational challenge.7 A related unmet challenge is educating teachers so that they (1) understand the underlying patterns of reasoning and argumentation and (2) understand how best to teach such lessons so that students become better able to abductively generate and then test alternative hypotheses using retroduction, deduction, and induction.

7 In terms of learning cycle instruction, lessons in which students generate and deductively test alternative hypotheses have been called hypothetico-deductive or hypothetical-predictive learning cycles (e.g., Lawson, 1995, 2009b; Lawson, Abraham, & Renner, 1989). Learning cycles in which students explore nature and simply identify patterns and/or make puzzling observations without generating possible explanations have been called descriptive learning cycles. And learning cycles in which students confront puzzling observations and generate possible explanations, but test them only with previously gathered data, have been called empirical-abductive (i.e., retroductive) learning cycles. Thus, the three types of learning cycles represent segments along a continuum from descriptive to experimental science. As such, they place differing demands on student initiative, knowledge, and reasoning skill.

The author thanks John Alcock for several helpful comments during the preparation of the manuscript. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation.

REFERENCES

Allchin, D. (2006). Lawson’s shoehorn—Reprise. Science & Education, 15, 113–120.
Alters, B. J. (1997). Whose nature of science? Journal of Research in Science Teaching, 34(1), 39–55.
Alvarez, W. (1997). T. rex and the crater of doom. Princeton, NJ: Princeton University Press.
American Association for the Advancement of Science. (1989). Project 2061: Science for all Americans. Washington, DC: Author.
American Association for the Advancement of Science. (2007). Atlas of scientific literacy (Vol. 2). Washington, DC: Author.
Bergman, M., & Paavola, S. (Eds.). (2003a). The Commens dictionary of Peirce’s terms. Retrieved from http://www.helsinki.fi/science/commens/dictionary.html. (Reprinted from A letter to Calderoni, by C. S. Peirce, 1905)
Berland, K. K., & Reiser, B. J. (2009). Making sense of argumentation. Science Education, 93(1), 26–55.
Biela, A. (1993). Psychology of analogical inference. Stuttgart, Germany: S. Hirzel Verlag.
Bonner, J. J. (2005). Which scientific method should we teach & when? The American Biology Teacher, 67(5), 262–264.
Brannigan, A. (1981). The social basis of scientific discoveries. Cambridge, England: Cambridge University Press.
Collins, H. M. (1985). Changing order. London: Sage.
Cothron, J. H., Giese, R. N., & Rezba, R. J. (2006). Students and research. Dubuque, IA: Kendall/Hunt.
Crick, F. H. C., Barnett, F. R. S. L., Brenner, S., & Watts-Tobin, R. J. (1962). General nature of the genetic code for proteins. Nature, 192, 1227–1232.
Crouch, T. D. (1992). Why Wilbur and Orville? Some thoughts on the Wright brothers and the process of invention. In R. J. Weber & D. N. Perkins (Eds.), Inventive minds (pp. 80–96). New York: Oxford University Press.
Darwin, C. (1898). The origin of species (7th ed.). New York: Appleton & Company.
Diamond, J. (1997). Guns, germs, and steel. New York: Norton.
Educational Policies Commission. (1961). The central purpose of American education. Washington, DC: National Education Association of the United States.
Educational Policies Commission. (1966). Education and the spirit of science. Washington, DC: National Education Association of the United States.
Erduran, S., Simon, S., & Osborne, J. (2004). TAPping into argumentation: Developments in the application of Toulmin’s argument pattern for studying science discourse. Science Education, 88(6), 915–933.
Evans, J. S. (2008). Dual-processing accounts of reasoning, judgment and social cognition. Annual Review of Psychology, 59, 255–278.
Finke, R. A., Ward, T. B., & Smith, S. M. (1992). Creative cognition: Theory research and practice. Cambridge, MA: The MIT Press.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34, 306–326.
Gentner, D. (1989). The mechanisms of analogical learning. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning. Cambridge, England: Cambridge University Press.
Giere, R. N., Bickle, J., & Mauldin, R. F. (2006). Understanding scientific reasoning (5th ed.). Belmont, CA: Thomson Higher Education.
Grant, P. R. (1986). Ecology and evolution of Darwin’s finches. Princeton, NJ: Princeton University Press.
Grant, B. R., & Grant, P. R. (1989). Evolutionary dynamics of a natural population: The large cactus finch of the Galapagos. Chicago: University of Chicago Press.
Hanson, N. R. (1958). Patterns of discovery. London: Cambridge University Press.
Hempel, C. (1966). Philosophy of natural science. Upper Saddle River, NJ: Prentice-Hall.
Holyoak, K. J. (2005). Analogy. In K. J. Holyoak & R. G. Morrison (Eds.), The Cambridge handbook of thinking and reasoning (pp. 117–142). New York: Cambridge University Press.
Hsu, T. (2005). Foundations of physical science investigations (2nd ed.).
Peabody, MA: CPO Science.
Isaacson, W. (2007). Einstein: His life and universe. New York: Simon & Schuster.
Koestler, A. (1964). The act of creation. London: Hutchinson.
Lawson, A. E. (1995). Science teaching and the development of thinking. Belmont, CA: Wadsworth.
Lawson, A. E. (2002a). What does Galileo’s discovery of Jupiter’s moons tell us about the process of scientific discovery? Science & Education, 11(1), 1–24.
Lawson, A. E. (2002b). Sound and faulty arguments generated by pre-service biology teachers when testing hypotheses involving un-observable entities. Journal of Research in Science Teaching, 39(3), 237–252.
Lawson, A. E. (2003). The nature and development of hypothetico-predictive argumentation with implications for science teaching. International Journal of Science Education, 25(11), 1387–1408.
Lawson, A. E. (2004). T. rex, the crater of doom, and the nature of scientific discovery. Science & Education, 13, 155–177.
Lawson, A. E. (2005). What is the role of induction and deduction in reasoning and scientific inquiry? Journal of Research in Science Teaching, 42(6), 716–740.
Lawson, A. E. (2006a). Allchin’s errors and misrepresentations and the H-D nature of science. Science Education, 90(2), 289–292.
Lawson, A. E. (2006b). Developing scientific reasoning patterns in college biology. In J. J. Mintzes & W. H. Leonard (Eds.), Handbook of college science teaching: Theory, research, and practice (pp. 109–118). Washington, DC: National Science Teachers Association.
Lawson, A. E. (2009a). On the hypothetico-deductive nature of science—Darwin’s finches. Science & Education, 18(1), 119–124.
Lawson, A. E. (2009b). Teaching inquiry science in middle and secondary schools. Thousand Oaks, CA: Sage.
Lawson, A. E., Abraham, M. R., & Renner, J. W. (1989). A theory of instruction: Using the learning cycle to teach science concepts and thinking skills. Cincinnati, OH: National Association for Research in Science Teaching.
Lawson, A.
E., Clark, B., Cramer-Meldrum, E., Falconer, K. A., Kwon, Y. J., & Sequist, J. M. (2000). The development of reasoning skills in college biology: Do two levels of general hypothesis-testing skills exist? Journal of Research in Science Teaching, 37(1), 81–101.
Lawson, D. I., & Lawson, A. E. (1993). Neural principles of memory and a neural theory of analogical insight. Journal of Research in Science Teaching, 30(10), 1327–1348.
Mahootian, F., & Eastman, T. E. (in press). Complimentary frameworks of scientific inquiry: Hypothetico-deductive, hypothetico-inductive, and observational inductive. World Futures. The Journal of General Evolution.
McNeill, K. L., & Krajcik, J. (2007). Middle school students’ use of appropriate and inappropriate evidence in writing scientific explanations. In M. C. Lovett & P. Shah (Eds.), Thinking with data: The proceedings of the 33rd Carnegie Symposium on Cognition (pp. 233–265). Mahwah, NJ: Erlbaum.
Misak, C. (2004). Charles Sanders Peirce (1839–1914). In C. Misak (Ed.), The Cambridge companion to Peirce. Cambridge, England: Cambridge University Press.
National Research Council. (1990). Fulfilling the promise: Biology education in the nation’s schools. Washington, DC: National Academies Press.
National Research Council. (1996). National Science Education Standards. Washington, DC: National Academies Press.
National Research Council. (2001). Educating teachers of science, mathematics, and technology. Washington, DC: National Academies Press.
Newton, P., Driver, R., & Osborne, J. (1999). The place of argumentation in the pedagogy of school science.
International Journal of Science Education, 21, 553–576.
Nirenberg, M. W., & Matthaei, J. H. (1961). The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proceedings of the National Academy of Sciences of the United States of America, 47(10), 1580–1588.
Oehrtman, M., & Lawson, A. E. (2008). Connecting science and mathematics: The nature of proof and disproof in science and mathematics. International Journal of Science and Mathematics Education, 6(2), 377–403.
Planck, M. (1949). Scientific autobiography (E. Guynor, Trans.). New York: Philosophical Library.
Platt, J. R. (1964). Strong inference. Science, 146, 347–353.
Polya, G. (1954). Patterns of plausible inference. Princeton, NJ: Princeton University Press.
Popper, K. (1965). Conjectures and refutations: The growth of scientific knowledge. New York: Basic Books.
Samarapungavan, A., Westby, E. L., & Bodner, G. M. (2006). Contextual epistemic development in science: A comparison of chemistry students and research chemists. Science Education, 90(3), 468–495.
Sampson, V., & Clark, D. B. (2008). Assessment of the ways students generate arguments in science education: Current perspectives and recommendations for future directions. Science Education, 92(3), 447–472.
Schick, T. S., Jr., & Vaughn, L. (1995). How to think about weird things. Mountain View, CA: Mayfield.
Shapley, H., Rapport, S., & Wright, H. (Eds.). (1954). A treasury of science. New York: Harper & Brothers. (Reprinted from The sidereal messenger, by G. Galilei, 1610)
Sternberg, R. J., & Davidson, J. E. (Eds.). (1995). The nature of insight. Cambridge, MA: The MIT Press.
Tidman, P., & Kahane, H. (2003). Logic and philosophy (9th ed.). Belmont, CA: Wadsworth/Thomson.
Toulmin, S. (1969). The uses of argument. Cambridge, England: Cambridge University Press.
Turrisi, P. A. (Ed.). (1997). Pragmatism as a principle and method of right thinking. The 1903 Harvard lectures on pragmatism.
Albany: State University of New York Press. (Reprinted from C. S. Peirce, 1903; see also The Commens dictionary of Peirce’s terms, by M. Bergman & S. Paavola, Eds., 2003a, 2003b. Retrieved May 18, 2009, from http://www.helsinki.fi/science/commens/dictionary.html)
Westerland, J., & Fairbanks, D. (2004). Gregor Mendel and “myth-conceptions.” Science Education, 88, 754–758.
White, R., & Gunstone, R. (1992). Probing understanding. London: Falmer Press.
Wivagg, D., & Allchin, D. (2002). The dogma of “the” scientific method. The American Biology Teacher, 64(9), 645–646.
Woodward, J., & Goodstein, D. (1996). Conduct, misconduct and the structure of science. American Scientist, 84, 479–490.