APPLIED COGNITIVE PSYCHOLOGY, VOL. 10, S99-S112 (1996)

Why People are Not Like Marbles in an Urn: An Effect of Context on Statistical Reasoning

DANIEL L. SCHWARTZ AND SUSAN R. GOLDMAN
The Learning Technology Center, Vanderbilt University

SUMMARY

A large body of research has examined the effect of contextual knowledge on deductive reasoning. Relatively little work, however, has examined context effects on statistical reasoning. In this paper, we document that in a context such as drawing marbles from an urn, children correctly think of sampling as a way to measure the distribution of marbles. However, in other contexts, such as taking a survey of people’s opinions, children design samples that have the effect of causing a distribution. For example, they sample members of the population most likely to have positive opinions. We interpret these results by proposing that knowledge of statistics comes in discrete pieces of intuitive understanding whose elicitation is contingent upon the problem context. We describe a model of instruction that acknowledges the effects of context on statistical reasoning.

The ability to reason about statistical data has become an important component of numerical literacy. Statistical information, once primarily in the purview of analysts, appears regularly in the mass media in the form of polls and surveys. Reflecting the increased importance of statistical reasoning, the National Council of Teachers of Mathematics (1989) proposed that a corridor of statistical instruction begin as early as upper elementary school. Statistics, however, presents a challenging domain in which to develop mathematical reasoning. Ample literature documents people’s misconceptions about statistical principles (e.g., Bar-Hillel, 1980; Kahneman, Slovic and Tversky, 1982; Konold, 1989). Even when statistical principles are understood, people often fail to apply them (Nisbett, Krantz, Jepson and Kunda, 1983).
We suggest that one source of people’s fallibility is that the context of a statistical problem has a powerful and identifiable influence on the assumptions people make about problem-relevant reasoning principles. We report an experiment that examines this claim by testing adolescents’ understanding of sampling in two different contexts.

Address correspondence to Daniel L. Schwartz, Learning Technology Center, Box 45, GPC, Vanderbilt University, Nashville, TN 37203, USA.

We thank Nancy Vye, John Bransford, Joyce Moore and Taylor Martin for their conceptual, enabling, and editorial contributions to this work. This research was supported by a grant from the National Science Foundation (NSF MDR-9252908).

CCC 0888-4080/96/S10099-14 © 1996 by John Wiley & Sons, Ltd. Accepted 2 July 1996

CONTEXT EFFECTS IN STATISTICAL REASONING

The effect of context on people’s ability to reason deductively has been demonstrated numerous times (e.g., Cheng, Holyoak, Nisbett and Oliver, 1986; Cole and Scribner, 1974; Cummins, 1995; Donaldson, 1978; Johnson-Laird, Legrenzi and Legrenzi, 1972), but less so for statistical reasoning (Garfield and Ahlgren, 1988; Shaughnessy, 1992). A brief thought experiment, however, can demonstrate the power of context on statistical reasoning. In a study by Tversky and Kahneman (1983), subjects received the profile of 31-year-old Linda, who was described as single, outspoken, bright, philosophical, and deeply concerned about issues of social equity. The subjects had to decide which of two alternatives was more probable: (a) Linda is a bank teller; or (b) Linda is a bank teller and is active in the feminist movement. Most subjects incorrectly chose option (b). To see why this is the incorrect answer, consider another scenario that was not included in the Tversky and Kahneman study. There is an urn filled with poker chips and marbles of various colours.
If one item is pulled from the urn, which alternative is more probable: (a) the item would be a marble; or (b) the item would be a blue marble. Unlike the Linda problem, we suspect that most people would correctly choose option (a) in this scenario. They would not make the mistake of viewing a conjoint event (i.e., blue and marble) as more likely than one of the events alone (i.e., a marble). Yet, this is exactly the mistake that the subjects made by choosing the ‘banker and feminist’ option in the context of the Linda study.

Why would these two contexts have different effects on people’s inclination to reason statistically? There are two answers to this question. The first is that people do not normally receive any instruction that helps them learn to apply statistical reasoning to everyday contexts like the Linda scenario. Probability instruction usually relies on explicit chance devices (e.g., the urn context). Students do not have an opportunity to consider probability in other, less obviously chance-based situations. Similarly, statistics instruction often emphasizes properties of numerical distributions (e.g., mean and mode), but takes for granted that students know how these distributions were generated. Students have little occasion to think about how events, behaviours, or opinions are randomly sampled from a context to create a distribution for statistical analysis. Therefore, one reason that people may not apply statistical reasoning when they should is that they have not learned how to turn an everyday situation into a statistical one (cf. Agnoli and Krantz, 1989; Nisbett et al., 1983).

The second answer to this question, the one that suggests that instruction may be necessary, is that people are predisposed to treat contexts involving people, such as the Linda example, differently than contexts involving marbles in an urn.
This predisposition exists because of the intuitively-based understanding that certain properties of people ‘cause’ their actions, opinions and decisions. This leads to a tendency to think about sampling people in causal terms rather than in chance or random terms. No such ‘causal’ understanding exists for marbles in an urn, so people reason about sampling marbles in chance terms. The following section explains this idea more fully.

CONFUSING THE ROLES OF CAUSE AND CHANCE IN PRODUCING DISTRIBUTIONS

To examine the effects of context on statistical reasoning, we begin by considering where contexts and statistical reasoning meet and why it might be difficult for people to know how to turn an everyday situation into a statistical one. Measurement is the bridge between the features of an everyday context and the numbers of a mathematical analysis. For a statistical analysis involving people it is necessary to convert people and/or their properties into a distribution of numerical frequencies. This is accomplished by measuring the distribution of people and/or properties. Sampling is a method for measuring a distribution of people, events, properties, opinions, and so forth. In a psychology experiment on confidence, for example, people report a confidence rating, and the researcher measures the distribution of people’s confidences through a sample. Notice that in this situation there are two measurements, and this may create a space for confusion: people should think of a confidence score as being caused by some property of the person or the experiment (e.g., studying causes high confidence), but at the same time they should think of the sample that is measured in the experiment (e.g., the distribution) as resulting from a random selection process.
To successfully turn a people situation into a statistical one, people must navigate through the complexity that people generate or ‘cause’ the outcomes of interest, but that the way these outcomes are collected must not, in and of itself, produce the distribution of outcomes. That is, the sample must be randomly drawn to avoid a biased distribution. For a sample to yield valid inferences about the distribution of the population as a whole, the researcher must be able to say that chance governed the selection of the sample. This means that to reason statistically, cause and chance must often be considered simultaneously.

Some researchers propose that people view chance as a factor that prevents perfect prediction (e.g., Bar-Hillel, 1980; Konold, 1989; Kuzmak and Gelman, 1986). Similarly, others propose that people learn of chance as the antithesis of causality (e.g., Owens, 1992; Piaget and Inhelder, 1975). We additionally propose that people may have trouble assigning complementary roles to the causal and chance elements within a statistical inference, as would be necessary when using a random sample to test for causal relations. In particular, people may tend to view a sampling method in an everyday context as a way to cause a distribution, rather than as a way to measure the distribution of the total population. This tendency may be particularly strong in opinion polling contexts because of intuitively-based beliefs that particular properties of people ‘cause’ their opinions.

In everyday situations people have a tendency to over-attribute causality (Cummins, 1995; Kelley, 1973). People often assume causal relations solely on the basis of co-occurrence or temporal association (Goldman, 1985). It is as though people bring an assumption that events should be explained causally, and they search for covariances between events and/or properties that can support this form of causal explanation. We will call this focus on causal association the covariance assumption.
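The claim that a selection method can itself produce the distribution of outcomes can be illustrated with a small simulation. The following is a minimal sketch, not part of the study: the school of 400 students, the number of baseball players, and the visiting rates are all hypothetical values chosen for illustration, and the function names are our own. It compares a selection method that is blind to student characteristics with one that selects on a trait that covaries with the outcome.

```python
import random

# Hypothetical school of 400 students (all counts are illustrative assumptions):
# 100 baseball players, 80 of whom would visit a ball-toss booth,
# and 300 other students, 90 of whom would visit.
population = (
    [("player", True)] * 80 + [("player", False)] * 20
    + [("other", True)] * 90 + [("other", False)] * 210
)

def visit_rate(students):
    """Proportion of the given students who would visit the booth."""
    return sum(visits for _, visits in students) / len(students)

def random_sample(rng, size=50):
    """Select students blind to their characteristics."""
    return rng.sample(population, size)

def likely_to_come_sample(rng, size=50):
    """Select only students who have the inference-relevant trait."""
    players = [s for s in population if s[0] == "player"]
    return rng.sample(players, size)

rng = random.Random(0)
true_rate = visit_rate(population)  # 170 / 400

# Average each method over many repeated surveys to expose systematic bias.
trials = 2000
mean_random = sum(visit_rate(random_sample(rng)) for _ in range(trials)) / trials
mean_likely = sum(visit_rate(likely_to_come_sample(rng)) for _ in range(trials)) / trials

print(f"true rate:              {true_rate:.3f}")
print(f"random selection:       {mean_random:.3f}")
print(f"likely-to-come method:  {mean_likely:.3f}")
```

Under these assumed numbers, the random method centres on the true participation rate, whereas the trait-based method centres on the rate among baseball players alone. In that sense the second method causes, rather than measures, the distribution.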
In the following study, we develop evidence that the covariance assumption leads children to think of sampling in the context of a survey in a fashion that makes their sample methods cause a distribution, even though they can think of sampling in random terms in contexts that do not lend themselves to causal attribution. The idea here is not that children necessarily think of a survey as a way to find covariances, although they often do. Rather, the availability of covariances in a statistical situation leads children to design sampling methods that ultimately cause the distribution of outcomes.

To further see how the covariance assumption may lead to confusions, we decompose the structure of a statistical inference. Consider the three representations of a statistical inference shown in Figure 1. The top panel represents a typical chance setup such as pulling marbles from an urn. The left circle represents the population of marbles from which the sample is drawn. The heavy arrow in the left of the panel represents the sample procedure; in this case, the procedure consists of reaching into the urn and blindly pulling marbles. The rounded box represents the resulting sample. The dashed arrow represents the inference from the sample back to an estimate of the population. The inferred population distribution is represented in the circle on the right.

[Figure 1 panels, each showing Population, Sample and Inferred Population: chance setup; SURVEY SETUP (people and their opinions); SURVEY in CHANCE SETUP]

Figure 1. Why a survey is psychologically unlike drawing marbles from an urn. In a survey, there is a potential covariant relationship between people’s characteristics and their opinions (x, y and z). In a chance setup, such as pulling coloured marbles from an urn, one would not normally say that a red marble covaries with a red marble.
Next consider a statistical inference in the context of a survey. The middle panel captures a common sense interpretation of a survey. In the left circle, there is a population of people. As before, the heavy arrow indicates the selection of the sample. In this case, however, one samples individuals from a population who then generate a sample of opinions. This is represented in the centre box with the small arrows indicating a relationship between the chosen individuals and their opinions, x, y and z. The sample of opinions may then be used to infer the distribution of opinions within the original population. When viewing a survey this way, one takes two measurements: a sample of population characteristics and a measurement of how individuals in the sample respond to the survey questions. Panel three shows a less intuitive way to view a survey. Here, similar to the chance setup, one samples from a population of opinions to make an inference about a population of opinions.

In the chance setup, the sampled population, the sample, and the inferred population involve the same types of entities: red marble → red marble → red marble. There is no explanation for the fact that a red marble ‘leads to’ a red marble. In contrast, in the survey setup, there are two distinct entities, people and opinions. The categorical difference between people and opinions may affect one’s interpretation of sampling. One can easily wonder whether something about a person ‘leads to’ opinion x. More generally, one might view the people in panel 2 as reflecting identifiable traits within the population such as age, gender or political affiliation. Consequently, one might ask whether a particular trait (e.g., gender) is associated with opinion x. Or, one might ask whether all the classes of people who have different opinions are represented in the sample (e.g., the figure with the baseball cap is left out).
We hypothesize that the covariance assumption leads people to interpret a survey in terms of panel 2. This draws people to focus on the association between particular traits within a population and particular opinions. This is a natural tendency; people try to associate traits with outcomes. However, for people unschooled in sampling, this focus on population traits may cause problems when they think about taking a sample. Their sampling procedure, for example, may select people on the basis of particular traits, and without satisfactory knowledge of stratification techniques, this can bias or cause the distribution of outcomes (or opinions).

A TEST OF THE EFFECTS OF CONTEXT ON THE COVARIANCE ASSUMPTION

In the following study we examined whether context influences children’s statistical reasoning. This study differs from prior work on children’s statistical reasoning. That work primarily used materials in which the sampling procedure has a limited causal interpretation (e.g., Dean, 1987; Yost, Siegel and Andrews, 1962). Activities like rolling a die, twirling a spinner, or drawing a marble from an urn offer limited opportunities for a causal explanation of the distribution of outcomes. It would be highly unlikely, for example, for someone to reason that a red marble causes its redness. These chance setups have provided important information about children’s probabilistic reasoning (e.g., Fischbein, 1975; Gal, Rothschild and Wagner, 1990; Rubin, Bruce and Tenney, 1990). This information, however, does not explain how children’s understanding of probabilistic inference evolves to handle more everyday statistical situations. An understanding of the relationship between sample and population in chance setups may not naturally lead to an understanding of more everyday situations like taking a survey.
Unlike the case of drawing marbles from an urn, for example, the opinions individuals hold are often attributed to them because of a personal characteristic such as age. If our hypothesis of a covariance assumption is right, the context of an opinion survey should lead children, and perhaps adults as well, to take these characteristics into account when developing sampling methods, thereby causing a distribution of outcomes.

To examine whether adolescents bring a covariance assumption to their statistical reasoning, we asked them to design samples for two scenarios. The fun booth scenario examined sampling ideas in the context of potential covariant relations. The children designed sampling methods to estimate the participation rate for a fun booth at a school fair. The gender scenario examined the children’s ideas in the context of non-covariant relations. In this scenario the children designed sampling methods to estimate how many boys and girls were at their school. If there is a covariance assumption, the fun booth scenario should yield sample selection methods based on population attributes that are relevant to participation in the fun booth. They might, for example, sample students on a baseball team if the fun booth involves a baseball toss. In terms of Figure 1, they might want to sample the figure with the baseball hat. For the gender scenario, however, we expected random selection methods because the sampled entity and the property of interest are the same (e.g., male → male). Accordingly, we coded the children’s sampling methods as to whether they were based on attributes of the population that were inference relevant, or whether they were random methods that selected individuals blind to their characteristics.

Methods

Subjects

Fifteen children were randomly selected from two 6th-grade classrooms to participate in the study. The children had previously studied probability in the context of dice.
Design

The design was within-subject; each child proposed sampling methods for both scenarios. Eight children completed the gender scenario first, and seven children completed the fun booth scenario first. There was no effect of problem order so it will not be considered further. The dependent measures were the number of types of inference-relevant and random selection methods they generated for each scenario. Table 1 shows the coding scheme more fully.

Table 1. Classes of sampling methods proposed by each student for each scenario

Sampling method         Example response
Random
  Explicit device       Draw 50 names from a hat
  Class teacher         Give it to two teachers to hand to their classes
  Just pick 'em         You just have to pick 'em without looking
  First to come by      Just stand in the hall and count the first 50 kids
Inference-relevant
  Self-selection        Put up a sign, and whoever wants to come can fill it out
  Likely to come        Ask all the baseball players
  Friends               I'd ask my friends, 'cause they would come
  Fair split            I'd check 25 boys and 25 girls
Other
  Republic              Ask 50 teachers to guess how many boys and girls
  Democracy             I'd ask everyone

+ = fun booth method; o = gender method; x = method for both.
[Columns for students 1-15, sorted by prediction fit, mark each method with these symbols; the individual cell entries are not recoverable here.]

Procedure

The children met with an experimenter individually. The session was videotaped for purposes of later review. For the fun booth scenario, the children were told to imagine that they were going to have a booth of their choosing at a school fair. They were asked what booth they would have. (Answers ranged from ball toss to balloon popping to fishing-for-prizes booths.) They were told that everyone who went to their booth would get a prize. Therefore, they needed an estimate of how many children would come to their booth so they could estimate how many prizes to prepurchase.
They were told that they were not allowed to ask everybody at the school. Instead, they could only survey 50 of the total 400 students who would go to the fair. To ensure the students understood the purpose of a sample, the experimenter explained how the results of the sample could be extrapolated to estimate how many students from their school would come to the booth.

For the gender scenario, the children were told that their task was to estimate how many boys and girls were at their school. As with the booth scenario, the experimenter explained that they would have to take a survey of 50 students and then extrapolate to the population of 400 students at their school.

After hearing a scenario, the children were prompted to generate as many sampling methods as they could think of. The experimenter moved on to the second scenario (or finished the experiment) when the children offered no new sampling methods after two prompts to continue.

Results and discussion

The leading hypothesis of this study was that the presence of potential covariant relations in the fun booth scenario (e.g., baseball players would like a ball toss booth) would lead children to generate more inference-relevant selection methods compared to the gender scenario. The results supported this hypothesis.

A primary coder separated the children's sampling methods into inference-relevant, random and other sampling methods. The unifying characteristic of the inference-relevant selection methods was that individuals would be selected on the basis of a hypothesis about how they would respond to the survey. The unifying characteristic of the random selection methods was that individuals would not be chosen on the basis of inference-relevant attributes. To provide a finer-grained analysis, the coder also developed the sub-categorizations described in Table 1. The table includes two other, non-sampling methods that were not focal for the current hypothesis, but will be discussed below.
After developing the coding scheme, the primary coder categorized each child's sampling methods for each scenario. A second coder, who was informed about the categorization scheme but who was blind to the hypothesis, had 100% agreement on the primary classification of inference-relevant, random or other sampling method and had 94% agreement on the sub-categorizations. Disagreements were resolved through negotiation.

The number of types of inference-relevant selection methods and the number of types of random selection methods served as dependent measures in a multivariate analysis of variance with type of scenario serving as the independent variable. [Footnote 1: Number of types rather than total number was used because we were interested in different ways students thought about sampling in the two problem contexts. Being able to provide several examples of the same way of thinking about sampling is not informative to the hypotheses of the study.] The scenario by sampling method interaction was significant, F(1,14) = 8.26, MSe = 0.52, p < 0.01. Students provided more random selection methods for the gender scenario (M = 1.2, SD = 0.7) than the fun booth scenario (M = 0.5, SD = 0.7) but more inference-relevant methods for the fun booth scenario (M = 1.0, SD = 0.5) than the gender scenario (M = 0.6, SD = 0.7).

Thirteen children mentioned at least one random and one inference-relevant selection method in their responses to the two scenarios. If the students understood the importance of drawing a random sample in all contexts, then they should have offered at least one random method for each context, regardless of all other types of responses. Only 40% of the children offered random selection methods for both scenarios. Demonstrating the powerful effect of context, the children tended to use random methods for the gender scenario exclusively and inference-relevant methods for the fun booth scenario exclusively.
Table 1 shows that 47% of the students used inference-relevant methods exclusively for the fun booth context, χ²(1) = 4.5, p < 0.05, whereas 53% of the students used random methods exclusively for the gender context, χ²(1) = 8.0, p < 0.01. Although many children knew enough about sampling to use a random selection method for the gender scenario, they did not think to apply this technique to the fun booth scenario. Our interpretation of this finding is that the potential covariant relationships in the fun booth scenario (e.g., a baseball player is more likely to come to a ball toss booth) led to selection methods based on population attributes relevant to whether children would come to the booth. In contrast, the gender scenario does not have potential covariant relations. As a consequence, it elicited random selection methods that were blind to the characteristics of the individuals chosen to be in the sample.

In addition to the differences in sampling methods across the scenarios, there was a high level of within-subject variability within a scenario. For the gender scenario, for example, 26% of the children suggested both a random and a fair split selection method (i.e., ask 25 boys and 25 girls). At one moment, many of the children seemed to grasp the assumptions and purposes of sampling, and at the next, they seemed to lose them, even in the non-covariant, gender scenario. In the context of the fun booth survey, children easily lost sight of a sample as a way to find information about the population at large. The variability within individuals indicates that the children did not have stable heuristics or principles underlying their selection of sampling methods. Below, we develop a model of the children’s understanding that may explain the apparent inconsistencies in their reasoning.
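The contrast between a random method and the fair split method in the gender scenario can be made concrete with a short sketch. It uses the sample size (50) and school size (400) from the procedure; the true numbers of girls, the function names, and the roster representation are hypothetical and serve only to illustrate the extrapolation step.

```python
import random

SCHOOL_SIZE = 400   # population size given in the procedure
SAMPLE_SIZE = 50    # survey size given in the procedure

def make_school(num_girls):
    """Hypothetical roster; the true number of girls is an assumption."""
    return ["girl"] * num_girls + ["boy"] * (SCHOOL_SIZE - num_girls)

def extrapolate(sample):
    """Scale the sample proportion of girls up to the whole school."""
    return sample.count("girl") / len(sample) * SCHOOL_SIZE

def random_method(school, rng):
    """'Draw 50 names from a hat': selection is blind to gender."""
    return rng.sample(school, SAMPLE_SIZE)

def fair_split_method(school, rng):
    """'Check 25 boys and 25 girls': selection is fixed by gender."""
    girls = [s for s in school if s == "girl"]
    boys = [s for s in school if s == "boy"]
    return rng.sample(girls, 25) + rng.sample(boys, 25)

rng = random.Random(0)
for num_girls in (120, 200, 280):
    school = make_school(num_girls)
    random_estimate = extrapolate(random_method(school, rng))
    fair_estimate = extrapolate(fair_split_method(school, rng))
    print(f"true girls: {num_girls}  random: {random_estimate:.0f}  "
          f"fair split: {fair_estimate:.0f}")
```

Whatever the true composition of the school, the fair split method extrapolates to 200 girls, because the selection rule fixes the sample proportion in advance. The random estimate varies from draw to draw but tracks the assumed true count.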
One alternative interpretation of the current results is that the children did not have any prior experience for understanding the inference from a sample to a population in surveys. They may have ‘misheard’ the fun booth scenario and thought that their task was to make sure that the booth had a high level of participation. As we develop below, this interpretation is in part correct in that children of this age use more familiar schemas to make sense of a survey. They, for example, will often construe a survey as a way to advertise. However, we do not think the current results are simply due to the fact that these children may have had limited exposure to surveys. We have found that even when children have had exposure to surveys and have solved several problems that required extrapolating from a survey to a population, they still tend to bring a covariance assumption to their understanding of sampling methods (Schwartz, Goldman, Vye, Barron and Cognition and Technology Group at Vanderbilt, in press).

Moreover, we conducted a small follow-up study with 24 college sophomores who presumably would have had more exposure to surveys. The students designed a survey to investigate the hypothesis that fast food is correlated with being overweight. Out of the 24 students, 19 designed surveys that ignored baseline by proposing that the surveys be given out exclusively at a fast-food restaurant (i.e., they neglected the distribution of weights among people who do not eat fast food). This tentatively suggests a covariance assumption in that the sophomores chose to sample the population on the basis of an inference-relevant attribute, namely, the individuals’ presence at a fast-food restaurant. Much like the 6th-graders who chose to sample people who they thought would come to the fun booth, the college students chose to sample people who they thought ate fast food and therefore would be overweight.
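Why a survey given out exclusively at a fast-food restaurant ignores baseline can be sketched as follows. This is an illustration only: the population counts are hypothetical, the two groups are constructed to have the same overweight rate, and the helper name is our own. The point is that a restaurant-only sample yields no rate for people who do not eat fast food, so no comparison, and hence no estimate of association, is possible.

```python
import random

# Hypothetical population (all counts assumed): 30% of people are overweight
# in BOTH groups, so fast food and weight are unrelated by construction.
# Each person is a pair (eats_fast_food, overweight).
population = (
    [(True, True)] * 1500 + [(True, False)] * 3500      # eat fast food
    + [(False, True)] * 1500 + [(False, False)] * 3500  # do not eat fast food
)

def overweight_rate_by_group(sample):
    """Return the overweight rate for eaters (True) and non-eaters (False).

    A group that is absent from the sample gets None: no rate can be computed.
    """
    rates = {}
    for eats in (True, False):
        group = [overweight for e, overweight in sample if e == eats]
        rates[eats] = sum(group) / len(group) if group else None
    return rates

rng = random.Random(0)
restaurant_only = rng.sample([p for p in population if p[0]], 200)
random_survey = rng.sample(population, 200)

restaurant_rates = overweight_rate_by_group(restaurant_only)
random_rates = overweight_rate_by_group(random_survey)

print("restaurant-only survey:", restaurant_rates)
print("random survey:         ", random_rates)
```

In the restaurant-only survey the baseline entry is missing, so any overweight rate observed there could be mistaken for an effect of fast food. A random survey supplies both rates, which under these assumed numbers should be similar.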
A MODEL OF THE PIECEMEAL NATURE OF EVERYDAY UNDERSTANDING OF STATISTICAL INFERENCE

Our results indicate that the context of a problem has a strong influence on children’s understanding of sampling. Although children designed random selection methods in the context of sampling gender, they designed samples that tended to cause (bias) the distribution of outcomes in the context of sampling opinions about a fun booth. Whether or not subsequent studies generalize the results to other sampling contexts, the ability to understand opinion polls is an important mathematical literacy skill. Our results indicate that children do not naturally see the statistical continuity between sampling in non-covariant situations (e.g., games of chance) and sampling in situations with potential covariances (e.g., surveys).

Given that children proposed distinct selection methods, even within the same context, we may conclude that children do not have a singular heuristic or abstract schema that they use to understand all statistical situations. Instead, it appears that children’s intuitive statistical understanding is a collection of overlapping schemas that are differentially brought to bear depending on the particular problem context (cf. Konold, Pollatsek, Well, Lohmeier and Lipson, 1993; Mokros and Russell, 1995). In the case of a fun booth survey, the fact that individuals (or their attributes) cause opinions invites a covariance assumption. This elicits a particular subset of the schemas children might rely on to understand a statistical situation. For example, with the fun booth scenario, 80% of the children chose to survey friends or people who were likely to come. They thought of the survey as an invitation or advertisement, and wanted to sample those people most likely to attend. Even in the gender scenario, we see hints of something like a fairness schema in that 40% of the students thought of surveying an even split of boys and girls.
They did not want to favour one gender or another, and evidently, they did not notice the undesirable side-effect that their sampling method predetermines the results.

The structure of a statistical inference can be complicated. The middle panel of Figure 1, for example, shows three links and three different entities to keep track of. In other research we found that 6th-grade children understand a statistical situation by noting a contextually salient aspect of the overall inferential structure and applying a schema that makes sense for this aspect (Schwartz et al., in press). Figure 2 portrays four characteristics of a survey and various schemas that bear a family resemblance to each characteristic. Some children, who focused on the size of the sample, wanted a large sample to ensure that it was inclusive enough to find all the associations between people’s traits and their opinions. Other children, who understood the role of sample size in terms of their knowledge of advertisements, wanted a large sample so they could reach plenty of people. Children who focused on the fact that a sample is a subset of a population drew upon a schema of party invitations. They evaluated a sample according to whether it was fair to the people who did not get sampled. And, as in one case in the preceding study, children who focused on the idea of collecting the opinions of the population imported voting schemas so that a good sample included everybody in the population.

Figure 2. Characteristics of a survey and possible interpretations. The rounded boxes represent properties of a survey. The clouds represent some everyday schemas that may be used to understand these properties.

The picture that develops from this research is that children’s early understanding of statistics is piecemeal and borrowed from more familiar concepts.
DiSessa’s (1983, 1993) construct of ‘knowledge in pieces’ in the domain of physics provides an excellent example of the position we wish to emphasize. He argued that people’s understanding of the physical world comes in discrete pieces of intuitive understanding whose elicitation is contingent upon the problem context. ‘Scientific explanation begins with common sense observation, a principal characteristic of which is its appearance as disparate and isolated special cases’ (p. 16, DiSessa, 1983). Although experts may have well-developed, coherent sets of principles, novices do not (Chi, Feltovich and Glaser, 1981; Larkin, McDermott, Simon and Simon, 1980). Under this model, conceptual growth does not begin with first principles, such as the laws of thermodynamics, that are subsequently mapped into specific cases. Rather, the growth of understanding is characterized as a process of sifting through and reconciling the cases, ‘finding successively the more and more general and fundamental ones which serve as principles, explaining the more special cases’ (p. 16, DiSessa, 1983). In the case of statistical reasoning, we suggest that learning can best be facilitated by helping students integrate pieces of contextualized knowledge, or diverse schemas, into fuller understanding.

INSTRUCTIONAL IMPLICATIONS

The educational challenge is to find a way to provide instruction that helps students align their schemas into more articulated understandings. If we think of an expert theory of sampling, less articulated forms may be thought of as prototheories. A prototheory of fairness, for example, may be a precursor for a theory of stratification (e.g., choose 25 boys and 25 girls). An important instructional goal is to facilitate the movement of prototheories in the direction of normative theories of sampling.
There is no reason to expect abrupt and rapid evolution of prototheories into normative ones, in part because the covariance assumption is a particularly useful assumption in many everyday contexts. Indeed, we have conducted some instructional studies in which we have been able to move most students in the direction of a normative theory. But we have by no means overcome their covariance assumption and the schemas that fall within this assumption. Children moved forward from invitation and advertisement notions of sampling, but they still relied on schemas common in their everyday worlds. For example, they moved to an idea that surveys need to be fair and give everyone a chance to participate. Although, compared to an invitation, this is a more general prototheory for understanding the role of sampling in statistical inference, it is still incomplete. For example, under the ‘fairness’ prototheory, students embrace survey methods that place response forms in a location available to anyone who wants to respond (Jacobs, 1996). The fairness prototheory does not permit an evaluation of the effects of self-selection on a statistical inference (e.g., maybe only the students who want to go to the fun booth would bother to answer the survey). Space does not permit further description of this work, but it is reported in Schwartz et al. (in press). Instead we summarize some of the essential features of instruction that are designed to facilitate the evolution of prototheories into normative theories of sampling. These features are derived from the empirical work reported here plus initial instructional work (Schwartz et al., in press). The general principles should apply to other domains in which everyday and formal systems of knowledge bump up against one another. 
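The self-selection concern raised above can be made concrete with a short simulation. The sketch below is only an illustration, not part of the reported studies: the population size, the assumed share of students who favour the booth, and the assumed response rates are all hypothetical values, and the function names are our own. It compares an estimate from a simple random sample with an estimate from an open response box where students who favour the booth are assumed to be more likely to respond.

```python
import random


def make_population(size, share_in_favour, rng):
    """Build a hypothetical population; True means the student favours the booth."""
    return [rng.random() < share_in_favour for _ in range(size)]


def random_sample_estimate(population, sample_size, rng):
    """Estimate the share in favour from a simple random sample (no replacement)."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


def self_selected_estimate(population, p_respond_if_favour, p_respond_if_not, rng):
    """Estimate the share in favour when each student decides whether to respond."""
    responses = [
        opinion
        for opinion in population
        if rng.random() < (p_respond_if_favour if opinion else p_respond_if_not)
    ]
    if not responses:
        raise ValueError("nobody responded; no estimate is possible")
    return sum(responses) / len(responses)


if __name__ == "__main__":
    rng = random.Random(1996)  # fixed seed so repeated runs give the same numbers
    population = make_population(10_000, 0.30, rng)  # assumed: about 30% in favour
    print("true share:          ", sum(population) / len(population))
    print("random sample of 500:", random_sample_estimate(population, 500, rng))
    # Assumed response rates: 60% of students in favour respond, 10% of the rest.
    print("open response box:   ", self_selected_estimate(population, 0.60, 0.10, rng))
```

Under these assumed rates, the expected share in favour among respondents is 0.18 / (0.18 + 0.07) = 0.72, far above the assumed population share of 0.30, even though every student had the same opportunity to respond. The random sample, by contrast, should land close to the population share. This is the kind of discrepancy that a ‘fairness’ prototheory does not bring into view.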
Essential features of such instruction are that it must provide: (1) situations in which everyday interpretations or prototheories can be elicited; (2) situations where inconsistencies or discrepancies among students’ ‘pieces of knowledge’ become apparent; (3) opportunities for students to discover ways to reconcile those inconsistencies or discrepancies; (4) new situations in which to ‘test’ emerging prototheories and receive feedback; and (5) additional possibilities for revising. The process of exposing pieces of understanding for purposes of testing their alignment and consistency, and finding ways to bring them into alignment will necessarily be iterative (Barron, Vye, Zech, Schwartz, Bransford, Goldman, Pellegrino, Morris, Garrison and Kantnor, 1995). Not only do children enter situations with different understandings, but as we have just demonstrated, different contexts can be expected to bring forth different prototheories. Precisely because context-of-use has powerful effects on what is brought to mind, it is important to use complex contexts as instructional anchors (Cognition and Technology Group at Vanderbilt, 1992). They allow multiple prototheories to come to mind and be put in explicit juxtaposition. If students only work on problems that call forth a single prototheory, they can continue to maintain different prototheories in different contexts. They never need to discover organizing principles that would align the prototheories and improve progress towards normative theories. A good example of the negative effects of using narrow contexts comes from the ‘end of the chapter’ test phenomenon. In many college statistics courses, students study statistical ideas only in the context of the current text chapter. 
The students may be able to do all the problems at the end of each chapter in the text but are hopelessly confused on a final exam where the chapter cues are gone and they must figure out which ideas apply to which problems.

We conclude by foregrounding one of the dilemmas of current theories of learning and the acquisition of knowledge: If it takes knowledge to make knowledge, how does anyone ever learn anything? Researchers usually invoke principles of similarity, reasoning by analogy, the importance of finding ways to access what the person knows and making it relevant, and so forth. The idea of learning new ideas by relying on old knowledge is surely important. However, eliciting prior knowledge is only part of the story. Much of our intuitive knowledge is in pieces rather than in well-organized, formal systems. Consequently, using prior knowledge productively depends on tandem processes of detecting and reconciling conflicting interpretations that arise from different pieces of prior knowledge. Certainly, that is the case for statistical reasoning about sampling. Contextualized instruction, complemented with assessment activities that highlight and bridge specific relationships, seems to be one way to accomplish the juxtaposition and reconciliation of conflicting interpretations into a more coherent and normative body of knowledge, a body of knowledge that will be in a form that can be applied to numerous contexts.

REFERENCES

Agnoli, F. and Krantz, D. H. (1989). Suppressing natural heuristics by formal instruction: the case of the conjunction fallacy. Cognitive Psychology, 21, 515-550.

Bar-Hillel, M. (1980). What features make samples seem representative? Journal of Experimental Psychology: Human Perception and Performance, 6, 578-589.

Barron, B., Vye, N. J., Zech, L., Schwartz, D., Bransford, J. D., Goldman, S. R., Pellegrino, J. W., Morris, J., Garrison, S. and Kantnor, R. (1995).
Creating contexts for community based problem solving: the Jasper challenge series. In C. Hedley, P. Antonacci and M. Rabinowitz (Eds.), Thinking and literacy: the mind at work (pp. 47-71). Hillsdale, NJ: Lawrence Erlbaum.

Cheng, P. W., Holyoak, K. J., Nisbett, R. E. and Oliver, L. M. (1986). Pragmatic versus syntactic approaches to training deductive reasoning. Cognitive Psychology, 18, 293-328.

Chi, M., Feltovich, P. and Glaser, R. (1981). Categorization and representations of physics problems by experts and novices. Cognitive Science, 5, 121-152.

Cognition and Technology Group at Vanderbilt (1992). The Jasper series as an example of anchored instruction: theory, program description and assessment data. Educational Psychologist, 27, 291-315.

Cole, M. and Scribner, S. (1974). Culture and thought: A psychological introduction. New York: Wiley.

Cummins, D. D. (1995). Naive theories and causal deduction. Memory & Cognition, 23, 646-658.

Dean, A. L. (1987). Rules versus cognitive structure as bases for children’s performance on probability problems. Journal of Applied Developmental Psychology, 8, 463-479.

DiSessa, A. A. (1983). Phenomenology and the evolution of intuition. In D. Gentner and A. L. Stevens (Eds.), Mental models (pp. 15-33). Hillsdale, NJ: Lawrence Erlbaum.

DiSessa, A. A. (1993). Toward an epistemology of physics. Cognition & Instruction, 10, 105-225.

Donaldson, M. (1978). Children’s minds. New York: W. W. Norton.

Fischbein, E. (1975). The intuitive sources of probabilistic thinking in children. Boston: Reidel.

Gal, I., Rothschild, K. and Wagner, D. A. (1990, April). Statistical concepts and statistical reasoning in school children: convergence or divergence? Paper presented at the annual meeting of the American Educational Research Association, Boston, MA.

Garfield, J. and Ahlgren, A. (1988). Difficulties in learning basic concepts in probability and statistics: implications for research. Journal for Research in Mathematics Education, 19, 44-63.

Goldman, S. R. (1985). Inferential reasoning in and about narrative texts. In A. Graesser and J. Black (Eds.), The psychology of questions (pp. 247-276). Hillsdale, NJ: Lawrence Erlbaum.

Jacobs, V. (1996). Children’s informal interpretation and evaluation of statistical sampling in surveys. Unpublished doctoral dissertation, University of Wisconsin, Madison.

Johnson-Laird, P. N., Legrenzi, P. and Legrenzi, M. (1972). Reasoning and a sense of reality. British Journal of Psychology, 63, 395-400.

Kahneman, D., Slovic, P. and Tversky, A. (Eds.) (1982). Judgment under uncertainty: heuristics and biases. New York: Cambridge University Press.

Kelley, H. (1973). The processes of causal attribution. American Psychologist, 28, 107-128.

Konold, C. (1989). Informal conceptions of probability. Cognition & Instruction, 6, 59-98.

Konold, C., Pollatsek, A., Well, A., Lohmeier, J. and Lipson, A. (1993). Inconsistencies in students’ reasoning about probability. Journal for Research in Mathematics Education, 24, 392-414.

Kuzmak, S. D. and Gelman, R. (1986). Young children’s understanding of random phenomena. Child Development, 57, 559-566.

Larkin, J. H., McDermott, J., Simon, D. P. and Simon, H. A. (1980). Expert and novice performance in solving physics problems. Science, 208, 1335-1342.

Mokros, J. and Russell, S. J. (1995). Children’s concepts of average and representativeness. Journal for Research in Mathematics Education, 26, 20-39.

National Council of Teachers of Mathematics (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: National Council of Teachers of Mathematics.

Nisbett, R. E., Krantz, D., Jepson, C. and Kunda, Z. (1983). The use of statistical heuristics in everyday inductive reasoning. Psychological Review, 90, 339-363.

Owens, D. (1992). Causes and coincidences. New York: Cambridge University Press.

Piaget, J. and Inhelder, B. (1975). The origin of the idea of chance in children. (L. Leake Jr., P. Burrell and H. D. Fischbein, Trans.). New York: Norton.

Rubin, A., Bruce, B. and Tenney, Y. (1990, April). Learning about sampling: trouble at the core of statistics. Paper presented at the annual meeting of the American Educational Research Association, Boston, MA.

Schwartz, D. L., Goldman, S. R., Vye, N., Barron, B. and the Cognition and Technology Group at Vanderbilt (in press). Using anchored instruction to align everyday and mathematical reasoning: The case of sampling instructions. To appear in S. Lajoie (Ed.), Reflections on statistics: agendas for learning, teaching and assessment in K-12. Hillsdale, NJ: Lawrence Erlbaum.

Shaughnessy, J. M. (1992). Research in probability and statistics: reflections and directions. In D. A. Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 465-494). New York: Macmillan.

Tversky, A. and Kahneman, D. (1983). Extensional vs. intuitive reasoning: the conjunction fallacy in probability judgment. Psychological Review, 90, 293-315.

Yost, P. A., Siegel, A. E. and Andrews, J. M. (1962). Nonverbal probability judgments by young children. Child Development, 33, 769-780.