Why People are Not Like Marbles in an Urn

APPLIED COGNITIVE PSYCHOLOGY, VOL. 10, S99-S112 (1996)
Why People are Not Like Marbles in an Urn:
An Effect of Context on Statistical Reasoning
DANIEL L. SCHWARTZ AND SUSAN R. GOLDMAN
The Learning Technology Center, Vanderbilt University
SUMMARY
A large body of research has examined the effect of contextual knowledge on deductive
reasoning. Relatively little work, however, has examined context effects on statistical
reasoning. In this paper, we document that in a context such as drawing marbles from an urn,
children correctly think of sampling as a way to measure the distribution of marbles. However,
in other contexts, such as taking a survey of people’s opinions, children design samples that
have the effect of causing a distribution. For example, they sample members of the population
most likely to have positive opinions. We interpret these results by proposing that knowledge
of statistics comes in discrete pieces of intuitive understanding whose elicitation is contingent
upon the problem context. We describe a model of instruction that acknowledges the effects of
context on statistical reasoning.
The ability to reason about statistical data has become an important component of
numerical literacy. Statistical information, once primarily in the purview of analysts,
appears regularly in the mass media in the form of polls and surveys. Reflecting the
increased importance of statistical reasoning, the National Council of Teachers of
Mathematics (1989) proposed that a corridor of statistical instruction begin as early
as upper elementary school. Statistics, however, presents a challenging domain in
which to develop mathematical reasoning. Ample literature documents people’s
misconceptions about statistical principles (e.g., Bar-Hillel, 1980; Kahneman, Slovic
and Tversky, 1982; Konold, 1989). Even when statistical principles are understood,
people often fail to apply them (Nisbett, Krantz, Jepson and Kunda, 1983). We
suggest that one source of people’s fallibility is that the context of a statistical
problem has a powerful and identifiable influence on the assumptions people make
about problem-relevant reasoning principles. We report an experiment that examines
this claim by testing adolescents’ understanding of sampling in two different
contexts.
Address correspondence to Daniel L. Schwartz, Learning Technology Center, Box 45, GPC, Vanderbilt
University, Nashville, TN 37203, USA.
We thank Nancy Vye, John Bransford, Joyce Moore and Taylor Martin for their conceptual, enabling,
and editorial contributions to this work. This research was supported by a grant from the National Science
Foundation (NSF MDR-9252908).
CCC 0888-4080/96/S10099-14
© 1996 by John Wiley & Sons, Ltd.
Accepted 2 July 1996
CONTEXT EFFECTS IN STATISTICAL REASONING
The effect of context on people’s ability to reason deductively has been demonstrated
numerous times (e.g., Cheng, Holyoak, Nisbett and Oliver, 1986; Cole and Scribner, 1974;
Cummins, 1995; Donaldson, 1978; Johnson-Laird, Legrenzi and Legrenzi, 1972), but less
so for statistical reasoning (Garfield and Ahlgren, 1988; Shaughnessy, 1992). A brief
thought experiment, however, can demonstrate the power of context on statistical
reasoning. In a study by Tversky and Kahneman (1983), subjects received the profile of
31-year-old Linda, who was described as single, outspoken, bright, philosophical, and
deeply concerned about issues of social equity. The subjects had to decide which of two
alternatives was more probable: (a) Linda is a bank teller; or (b) Linda is a bank teller and
is active in the feminist movement. Most subjects incorrectly chose option (b). To see why
this is the incorrect answer, consider another scenario that was not included in the Tversky
and Kahneman study. There is an urn filled with poker chips and marbles of various
colours. If one item is pulled from the urn, which alternative is more probable: (a) the item
would be a marble; or (b) the item would be a blue marble. Unlike the Linda problem, we
suspect that most people would correctly choose option (a) in this scenario. They would
not make the mistake of viewing a conjoint event (i.e., blue and marble) as more likely than
one of the events alone (i.e., a marble). Yet, this is exactly the mistake that the subjects
made by choosing the ‘banker and feminist’ option in the context of the Linda study.
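The urn version of the conjunction problem can be checked by direct counting. The sketch below is illustrative only: the urn contents and the helper name `probability` are assumptions made for the example and do not come from the Tversky and Kahneman study or from the present one.

```python
from fractions import Fraction

# Hypothetical urn contents (assumed for illustration only).
urn = (
    [("marble", "blue")] * 3
    + [("marble", "red")] * 5
    + [("chip", "blue")] * 4
    + [("chip", "white")] * 8
)

def probability(items, predicate):
    """Exact probability that one blindly drawn item satisfies the predicate."""
    favourable = sum(1 for item in items if predicate(item))
    return Fraction(favourable, len(items))

p_marble = probability(urn, lambda item: item[0] == "marble")
p_blue_marble = probability(urn, lambda item: item == ("marble", "blue"))

# Every blue marble is also a marble, so the conjoint event
# can never be more probable than the single event.
assert p_blue_marble <= p_marble
print(p_marble, p_blue_marble)  # 2/5 3/20
```

Whatever contents are assumed for the urn, the inequality holds, because the blue marbles are a subset of the marbles. The same subset relation holds between feminist bank tellers and bank tellers.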
Why would these two contexts have different effects on people’s inclination to reason
statistically? There are two answers to this question. The first is that people do not
normally receive any instruction that helps them learn to apply statistical reasoning to
everyday contexts like the Linda scenario. Probability instruction usually relies on
explicit chance devices (e.g., the urn context). Students do not have an opportunity to
consider probability in other, less obviously chance-based situations. Similarly,
statistics instruction often emphasizes properties of numerical distributions (e.g.,
mean and mode), but takes for granted that students know how these distributions were
generated. Students have little occasion to think about how events, behaviours, or
opinions are randomly sampled from a context to create a distribution for statistical
analysis. Therefore, one reason that people may not apply statistical reasoning when
they should is that they have not learned how to turn an everyday situation into a
statistical one (cf. Agnoli and Krantz, 1989; Nisbett et al., 1983).
The second answer to this question, the one that suggests that instruction may be
necessary, is that people are predisposed to treat contexts involving people, such as the
Linda example, differently than contexts involving marbles in an urn. This predisposition
exists because of the intuitively-based understanding that certain properties of people
‘cause’ their actions, opinions and decisions. This leads to a tendency to think about
sampling people in causal terms rather than in chance or random terms. No such ‘causal’
understanding exists for marbles in an urn so people reason about sampling marbles in
chance terms. The following section explains this idea more fully.
CONFUSING THE ROLES OF CAUSE AND CHANCE
IN PRODUCING DISTRIBUTIONS
To examine the effects of context on statistical reasoning, we begin by considering
where contexts and statistical reasoning meet and why it might be difficult for people
to know how to turn an everyday situation into a statistical one. Measurement is the
bridge between the features of an everyday context and the numbers of a
mathematical analysis. For a statistical analysis involving people it is necessary to
convert people and/or their properties into a distribution of numerical frequencies.
This is accomplished by measuring the distribution of people and/or properties.
Sampling is a method for measuring a distribution of people, events, properties,
opinions, and so forth. In a psychology experiment on confidence, for example,
people report a confidence rating, and the researcher measures the distribution of
people’s confidences through a sample. Notice that in this situation there are two
measurements, and this may create a space for confusion: people should think of a
confidence score as being caused by some property of the person or the experiment
(e.g., studying causes high confidence), but at the same time they should think of the
sample that is measured in the experiment (e.g., the distribution) as resulting from a
random selection process. To successfully turn a people situation into a statistical
one, people must navigate through the complexity that people generate or ‘cause’ the
outcomes of interest, but that the way these outcomes are collected must not, in and
of itself, produce the distribution of outcomes. That is, the sample must be randomly
drawn to avoid a biased distribution. For a sample to yield valid inferences about the
distribution of the population as a whole, the researcher must be able to say that
chance governed the selection of the sample. This means that to reason statistically,
cause and chance must often be considered simultaneously.
Some researchers propose that people view chance as a factor that prevents perfect
prediction (e.g., Bar-Hillel, 1980; Konold, 1989; Kuzmak and Gelman, 1986).
Similarly, others propose that people learn of chance as the antithesis of causality
(e.g., Owens, 1992; Piaget and Inhelder, 1975). We additionally propose that people
may have trouble assigning complementary roles to the causal and chance elements
within a statistical inference, as would be necessary when using a random sample to
test for causal relations. In particular people may tend to view a sampling method in
an everyday context as a way to cause a distribution, rather than as a way to measure
the distribution of the total population. This tendency may be particularly strong in
opinion polling contexts because of intuitively-based beliefs that particular
properties of people ‘cause’ their opinions.
In everyday situations people have a tendency to over-attribute causality
(Cummins, 1995; Kelley, 1973). People often assume causal relations solely on the
basis of co-occurrence or temporal association (Goldman, 1985). It is as though
people bring an assumption that events should be explained causally, and they search
for covariances between events and/or properties that can support this form of
causal explanation. We will call this focus on causal association the covariance
assumption. In the following study, we develop evidence that the covariance
assumption leads children to think of sampling in the context of a survey in a fashion
that makes their sample methods cause a distribution, even though they can think of
sampling in random terms in contexts that do not lend themselves to causal
attribution. The idea here is not that children necessarily think of a survey as a way
to find covariances, although they often do. Rather, the availability of covariances in
a statistical situation leads children to design sampling methods that ultimately cause
the distribution of outcomes.
To further see how the covariance assumption may lead to confusions, we
decompose the structure of a statistical inference. Consider the three representations
of a statistical inference shown in Figure 1. The top panel represents a typical chance
setup such as pulling marbles from an urn. The left circle represents the population
of marbles from which the sample is drawn. The heavy arrow in the left of the panel
represents the sample procedure; in this case, the procedure consists of reaching into
the urn and blindly pulling marbles. The rounded box represents the resulting
sample. The dashed arrow represents the inference from the sample back to an
[Figure 1 graphic not reproduced. Legible labels: Population, Sample, Inferred Population; panel titles SURVEY SETUP (people and their opinions) and SURVEY in CHANCE SETUP; opinions shown as x, y and z.]
Figure 1. Why a survey is psychologically unlike drawing marbles from an urn. In a survey,
there is a potential covariant relationship between people’s characteristics and their opinions
(x, y and z). In a chance setup, such as pulling coloured marbles from an urn, one would not
normally say that a red marble covaries with a red marble.
estimate of the population. The inferred population distribution is represented in the
circle on the right.
Next consider a statistical inference in the context of a survey. The middle panel
captures a common sense interpretation of a survey. In the left circle, there is a
population of people. As before, the heavy arrow indicates the selection of the
sample. In this case, however, one samples individuals from a population who then
generate a sample of opinions. This is represented in the centre box with the small
arrows indicating a relationship between the chosen individuals and their opinions,
x, y and z. The sample of opinions may then be used to infer the distribution of
opinions within the original population. When viewing a survey this way, one takes
two measurements: a sample of population characteristics and a measurement of
how individuals in the sample respond to the survey questions. Panel three shows a
less intuitive way to view a survey. Here, similar to the chance setup, one samples
from a population of opinions to make an inference about a population of opinions.
In the chance setup, the sampled population, the sample, and the inferred
population involve the same types of entities: red marble → red marble → red marble.
There is no explanation for the fact that a red marble ‘leads to’ a red marble. In
contrast, in the survey setup, there are two distinct entities, people and opinions. The
categorical difference between people and opinions may affect one’s interpretation of
sampling. One can easily wonder whether something about a person ‘leads to’
opinion x. More generally, one might view the people in panel 2 as reflecting
identifiable traits within the population such as age, gender or political affiliation.
Consequently, one might ask whether a particular trait (e.g., gender) is associated
with opinion x. Or, one might ask whether all the classes of people who have
different opinions are represented in the sample (e.g., the figure with the baseball cap
is left out). We hypothesize that the covariance assumption leads people to interpret
a survey in terms of panel 2. This draws people to focus on the association between
particular traits within a population and particular opinions. This is a natural
tendency; people try to associate traits with outcomes. However, for people
unschooled in sampling, this focus on population traits may cause problems when
they think about taking a sample. Their sampling procedure, for example, may select
people on the basis of particular traits, and without satisfactory knowledge of
stratification techniques, this can bias or cause the distribution of outcomes (or
opinions).
A TEST OF THE EFFECTS OF CONTEXT ON
THE COVARIANCE ASSUMPTION
In the following study we examined whether context influences children’s statistical
reasoning. This study differs from prior work on children’s statistical reasoning.
That work primarily used materials in which the sampling procedure has a limited
causal interpretation (e.g., Dean, 1987; Yost, Siegel and Andrews, 1962). Activities
like rolling a die, twirling a spinner, or drawing a marble from an urn offer limited
opportunities for a causal explanation of the distribution of outcomes. It would be
highly unlikely, for example, for someone to reason that a red marble causes its
redness. These chance setups have provided important information about children’s
probabilistic reasoning (e.g., Fischbein, 1975; Gal, Rothschild and Wagner, 1990;
Rubin, Bruce and Tenney, 1990). This information, however, does not explain how
children’s understanding of probabilistic inference evolves to handle more everyday
statistical situations. An understanding of the relationship between sample and
population in chance setups may not naturally lead to an understanding of more
everyday situations like taking a survey. Unlike the case of drawing marbles from an
urn, for example, the opinions individuals hold are often attributed to them because
of a personal characteristic such as age. If our hypothesis of a covariance assumption
is right, the context of an opinion survey should lead children, and perhaps adults as
well, to take these characteristics into account when developing sampling methods,
thereby causing a distribution of outcomes.
To examine whether adolescents bring a covariance assumption to their statistical
reasoning, we asked them to design samples for two scenarios. The fun booth
scenario examined sampling ideas in the context of potential covariant relations. The
children designed sampling methods to estimate the participation rate for a fun
booth at a school fair. The gender scenario examined the children’s ideas in the
context of non-covariant relations. In this scenario the children designed sampling
methods to estimate how many boys and girls were at their school.
If there is a covariance assumption, the fun booth scenario should yield sample
selection methods based on population attributes that are relevant to participation in
the fun booth. They might, for example, sample students on a baseball team if the
fun booth involves a baseball toss. In terms of Figure 1, they might want to sample
the figure with the baseball hat. For the gender scenario, however, we expected
random selection methods because the sampled entity and the property of interest
are the same (e.g., male → male). Accordingly, we coded the children’s sampling
methods as to whether they were based on attributes of the population that were
inference relevant, or whether they were random methods that selected individuals
blind to their characteristics.
Methods
Subjects
Fifteen children were randomly selected from two 6th-grade classrooms to
participate in the study. The children had previously studied probability in the
context of dice.
Design
The design was within-subject; each child proposed sampling methods for both
scenarios. Eight children completed the gender scenario first, and seven children
completed the fun booth scenario first. There was no effect of problem order so it
will not be considered further. The dependent measures were the number of types of
inference-relevant and random selection methods they generated for each scenario.
Table 1 shows the coding scheme more fully.

Table 1. Classes of sampling methods proposed by each student for each scenario

Sampling method         Example response
Random
  Explicit device       Draw 50 names from a hat
  Class teacher         Give it to two teachers to hand to their classes
  Just pick 'em         You just have to pick 'em without looking
  First to come by      Just stand in the hall and count the first 50 kids
Inference-relevant
  Self-selection        Put up a sign, and whoever wants to come can fill it out
  Likely to come        Ask all the baseball players
  Friends               I'd ask my friends, 'cause they would come
  Fair split            I'd check 25 boys and 25 girls
Other
  Republic              Ask 50 teachers to guess how many boys and girls
  Democracy             I'd ask everyone

Columns for students 1-15, sorted by prediction fit: + = fun booth method; o = gender method; x = method for both. [Per-student marks not legible in this copy.]

Procedure
The children met with an experimenter individually. The session was videotaped for
purposes of later review. For the fun booth scenario, the children were told to
imagine that they were going to have a booth of their choosing at a school fair. They
were asked what booth they would have. (Answers ranged from ball toss to balloon
popping to fishing-for-prizes booths.) They were told that everyone who went to
their booth would get a prize. Therefore, they needed an estimate of how many
children would come to their booth so they could estimate how many prizes to prepurchase. They were told that they were not allowed to ask everybody at the school.
Instead, they could only survey 50 of the total 400 students who would go to the fair.
To ensure the students understood the purpose of a sample, the experimenter
explained how the results of the sample could be extrapolated to estimate how many
students from their school would come to the booth. For the gender scenario, the
children were told that their task was to estimate how many boys and girls were at
their school. As with the booth scenario, the experimenter explained that they would
have to take a survey of 50 students and then extrapolate to the population of 400
students at their school. After hearing a scenario, the children were prompted to
generate as many sampling methods as they could think of. The experimenter moved
on to the second scenario (or finished the experiment) when the children offered no
new sampling methods after two prompts to continue.
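The extrapolation step in the fun booth scenario, and the way an inference-relevant sample distorts it, can be sketched in a short simulation. The school size (400) and sample size (50) come from the scenario; the attendance rates, the number of baseball players, and all function names are hypothetical values chosen for illustration, not data from the study.

```python
import random

POPULATION_SIZE = 400  # students at the fair (from the scenario)
SAMPLE_SIZE = 50       # students who may be surveyed (from the scenario)

# Hypothetical population of (is_baseball_player, would_attend_booth) pairs:
# 80 players, 80% of whom would attend; 320 others, 20% of whom would attend.
population = (
    [(True, True)] * 64 + [(True, False)] * 16
    + [(False, True)] * 64 + [(False, False)] * 256
)
true_attendance = sum(attends for _, attends in population)

def extrapolate(sample):
    """Scale the number of attendees in the sample up to the whole school."""
    attendees = sum(attends for _, attends in sample)
    return attendees * POPULATION_SIZE / len(sample)

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers

def random_estimate():
    """Survey 50 students chosen blind to their characteristics."""
    return extrapolate(rng.sample(population, SAMPLE_SIZE))

def likely_to_come_estimate():
    """Survey 50 students chosen because they are baseball players."""
    players = [student for student in population if student[0]]
    return extrapolate(rng.sample(players, SAMPLE_SIZE))

REPETITIONS = 2000
random_mean = sum(random_estimate() for _ in range(REPETITIONS)) / REPETITIONS
biased_mean = sum(likely_to_come_estimate() for _ in range(REPETITIONS)) / REPETITIONS
print(true_attendance, round(random_mean), round(biased_mean))
```

Under these assumed numbers the random method averages close to the true count of 128, whereas the 'likely to come' method can never estimate fewer than 272 attendees. In that case the sampling method, rather than the population, produces the result.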
Results and discussion
The leading hypothesis of this study was that the presence of potential covariant
relations in the fun booth scenario (e.g., baseball players would like a ball toss
booth) would lead children to generate more inference-relevant selection methods
compared to the gender scenario. The results supported this hypothesis. A primary
coder separated the children's sampling methods into inference-relevant, random and
other sampling methods. The unifying characteristic of the inference-relevant
selection methods was that individuals would be selected on the basis of a
hypothesis about how they would respond to the survey. The unifying characteristic
of the random selection methods was that individuals would not be chosen on the
basis of inference-relevant attributes. To provide a finer-grained analysis, the coder
also developed the sub-categorizations described in Table 1. The table includes two
other, non-sampling methods that were not focal for the current hypothesis, but will
be discussed below. After developing the coding scheme, the primary coder
categorized each child's sampling methods for each scenario. A second coder, who
was informed about the categorization scheme but who was blind to the hypothesis,
had 100% agreement on the primary classification of inference-relevant, random or
other sampling method and had 94% agreement on the sub-categorizations.
Disagreements were resolved through negotiation.
The number of types of inference-relevant selection methods and the number of
types of random selection methods served as dependent measures in a multivariate
analysis of variance with type of scenario serving as the independent variable.¹ The
scenario by sampling method interaction was significant, F(1,14) = 8.26, MSe = 0.52,
p < 0.01. Students provided more random selection methods for the gender scenario
(M = 1.2, SD = 0.7) than the fun booth scenario (M = 0.5, SD = 0.7) but more
inference-relevant methods for the fun booth scenario (M = 1.0, SD = 0.5) than the
gender scenario (M = 0.6, SD = 0.7). Thirteen children mentioned at least one
random and one inference-relevant selection method in their responses to the two
scenarios. If the students understood the importance of drawing a random sample in
all contexts, then they should have offered at least one random method for each
context, regardless of all other types of responses. Only 40% of the children offered
random selection methods for both scenarios. Demonstrating the powerful effect of
context, the children tended to use random methods for the gender scenario
exclusively and inference-relevant methods for the fun booth scenario exclusively.
Table 1 shows that 47% of the students used inference-relevant methods exclusively
for the fun booth context, χ²(1) = 4.5, p < 0.05, whereas 53% of the students used
random methods exclusively for the gender context, χ²(1) = 8.0, p < 0.01.
¹Number of types rather than total number was used because we were interested in different ways students
thought about sampling in the two problem contexts. Being able to provide several examples of the same
way of thinking about sampling is not informative to the hypotheses of the study.
Although many children knew enough about sampling to use a random selection
method for the gender scenario, they did not think to apply this technique to the fun
booth scenario. Our interpretation of this finding is that the potential covariant
relationships in the fun booth scenario (e.g., a baseball player is more likely to come
to a ball toss booth) led to selection methods based on population attributes relevant
to whether children would come to the booth. In contrast, the gender scenario does
not have potential covariant relations. As a consequence, it elicited random selection
methods that were blind to the characteristics of the individuals chosen to be in the
sample.
In addition to the differences in sampling methods across the scenarios, there was
a high level of within-subject variability within a scenario. For the gender scenario,
for example, 26% of the children suggested both a random and a fair split selection
method (i.e., ask 25 boys and 25 girls). At one moment, many of the children seemed
to grasp the assumptions and purposes of sampling, and at the next, they seemed to
lose them, even in the non-covariant, gender scenario. In the context of the fun
booth survey, children easily lost sight of a sample as a way to find information
about the population at large. The variability within individuals indicates that the
children did not have stable heuristics or principles underlying their selection of
sampling methods. Below, we develop a model of the children’s understanding that
may explain the apparent inconsistencies in their reasoning.
One alternative interpretation of the current results is that the children did not
have any prior experience for understanding the inference from a sample to a
population in surveys. They may have ‘misheard’ the fun booth scenario and
thought that their task was to make sure that the booth had a high level of
participation. As we develop below, this interpretation is in part correct in that
children of this age use more familiar schemas to make sense of a survey. They, for
example, will often construe a survey as a way to advertise. However, we do not
think the current results are simply due to the fact that these children may have had
limited exposure to surveys. We have found that even when children have had
exposure to surveys and have solved several problems that required extrapolating
from a survey to a population, they still tend to bring a covariance assumption to
their understanding of sampling methods (Schwartz, Goldman, Vye, Barron and
Cognition and Technology Group at Vanderbilt, in press). Moreover, we conducted
a small follow-up study with 24 college sophomores who presumably would have
had more exposure to surveys. The students designed a survey to investigate the
hypothesis that fast food is correlated with being overweight. Out of the 24 students,
19 designed surveys that ignored baseline by proposing that the surveys be given out
exclusively at a fast-food restaurant (i.e., they neglected the distribution of weights
among people who do not eat fast food). This tentatively suggests a covariance
assumption in that the sophomores chose to sample the population on the basis of an
inference-relevant attribute, namely, the individuals’ presence at a fast-food
restaurant. Much like the 6th-graders who chose to sample people who they
thought would come to the fun booth, the college students chose to sample people
who they thought ate fast food and therefore would be overweight.
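The baseline problem in the sophomores' designs can be made concrete with a small sketch. The respondent counts below are invented for illustration and are not data from the follow-up study; `overweight_rate` and `compare` are hypothetical helper names.

```python
def overweight_rate(people):
    """Proportion of the group that is overweight, or None if the group is empty."""
    if not people:
        return None
    return sum(overweight for _, overweight in people) / len(people)

def compare(sample):
    """Return (rate among fast-food eaters, rate among non-eaters)."""
    eaters = [person for person in sample if person[0]]
    non_eaters = [person for person in sample if not person[0]]
    return overweight_rate(eaters), overweight_rate(non_eaters)

# Hypothetical respondents as (eats_fast_food, is_overweight) pairs.
# A survey handed out only at a fast-food restaurant reaches eaters alone.
restaurant_survey = [(True, True)] * 12 + [(True, False)] * 18
# A survey of the wider population also reaches people who do not eat fast food.
wider_survey = restaurant_survey + [(False, True)] * 12 + [(False, False)] * 18

print(compare(restaurant_survey))  # (0.4, None)
print(compare(wider_survey))       # (0.4, 0.4)
```

With the restaurant-only sample there is no rate for non-eaters, so no association can be assessed at all. In the made-up wider sample the two rates happen to be equal, which illustrates that a high rate among fast-food eaters is uninformative until it is compared against the baseline.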
A MODEL OF THE PIECEMEAL NATURE OF EVERYDAY
UNDERSTANDING OF STATISTICAL INFERENCE
Our results indicate that the context of a problem has a strong influence on children’s
understanding of sampling. Although children designed random selection methods
in the context of sampling gender, they designed samples that tended to cause (bias)
the distribution of outcomes in the context of sampling opinions about a fun booth.
Whether or not subsequent studies generalize the results to other sampling contexts,
the ability to understand opinion polls is an important mathematical literacy skill.
Our results indicate that children do not naturally see the statistical continuity
between sampling in non-covariant situations (e.g., games of chance) and sampling
in situations with potential covariances (e.g., surveys).
Given that children proposed distinct selection methods, even within the same
context, we may conclude that children do not have a singular heuristic or abstract
schema that they use to understand all statistical situations. Instead, it appears that
children’s intuitive statistical understanding is a collection of overlapping schemas
that are differentially brought to bear depending on the particular problem context
(cf. Konold, Pollatsek, Well, Lohmeier and Lipson, 1993; Mokros and Russell,
1995). In the case of a fun booth survey, the fact that individuals (or their attributes)
cause opinions invites a covariance assumption. This elicits a particular subset of the
schemas children might rely on to understand a statistical situation. For example,
with the fun booth scenario, 80% of the children chose to survey friends or people
who were likely to come. They thought of the survey as an invitation or
advertisement, and wanted to sample those people most likely to attend. Even in
the gender scenario, we see hints of something like a fairness schema in that 40% of
the students thought of surveying an even split of boys and girls. They did not want
to favour one gender or another, and evidently, they did not notice the undesirable
side-effect that their sampling method predetermines the results.
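A brief simulation shows why the fair split predetermines the result in the gender scenario. The true split of 240 girls and 160 boys is an assumption made for the example, as are the function names; only the school size of 400 and the sample size of 50 come from the scenario.

```python
import random

# Hypothetical school of 400 students; the true split is assumed for illustration.
students = ["girl"] * 240 + ["boy"] * 160

def estimate_girls(sample, population_size=400):
    """Extrapolate the number of girls in the school from the sample."""
    return sample.count("girl") * population_size / len(sample)

rng = random.Random(1)  # fixed seed so repeated runs give the same numbers

def fair_split_sample():
    """The children's 'fair split': check 25 boys and 25 girls."""
    girls = [s for s in students if s == "girl"]
    boys = [s for s in students if s == "boy"]
    return rng.sample(girls, 25) + rng.sample(boys, 25)

def random_sample():
    """Pick 50 students blind to their gender."""
    return rng.sample(students, 50)

# Collect every distinct estimate the fair split produces over many surveys.
fair_estimates = {estimate_girls(fair_split_sample()) for _ in range(1000)}
random_mean = sum(estimate_girls(random_sample()) for _ in range(1000)) / 1000
print(fair_estimates, round(random_mean))
```

The fair split yields an estimate of 200 girls on every run, regardless of the school's actual composition, whereas the random samples average close to the assumed true value of 240.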
The structure of a statistical inference can be complicated. The middle panel of
Figure 1, for example, shows three links and three different entities to keep track of.
In other research we found that 6th-grade children understand a statistical situation
by noting a contextually salient aspect of the overall inferential structure and
applying a schema that makes sense for this aspect (Schwartz et al., in press). Figure
2 portrays four characteristics of a survey and various schemas that bear a family
resemblance to each characteristic. Some children, who focused on the size of the
sample, wanted a large sample to ensure that it was inclusive enough to find all the
associations between people’s traits and their opinions. Other children, who
understood the role of sample size in terms of their knowledge of advertisements,
wanted a large sample so they could reach plenty of people. Children who focused on
the fact that a sample is a subset of a population drew upon a schema of party
invitations. They evaluated a sample according to whether it was fair to the people
Figure 2. Characteristics of a survey and possible interpretations. The rounded boxes
represent properties of a survey. The clouds represent some everyday schemas that may be
used to understand these properties.
who did not get sampled. And, as in one case in the preceding study, children who
focused on the idea of collecting the opinions of the population imported voting
schemas so that a good sample included everybody in the population.
The picture that develops from this research is that children’s early understanding
of statistics is piecemeal and borrowed from more familiar concepts. DiSessa’s (1983,
1993) construct of ‘knowledge in pieces’ in the domain of physics provides an
excellent example of the position we wish to emphasize. He argued that people’s
understanding of the physical world comes in discrete pieces of intuitive
understanding whose elicitation is contingent upon the problem context. ‘Scientific
explanation begins with common sense observation, a principal characteristic of
which is its appearance as disparate and isolated special cases’ (p. 16, DiSessa, 1983).
Although experts may have well-developed, coherent sets of principles, novices do
not (Chi, Feltovich and Glaser, 1981; Larkin, McDermott, Simon and Simon, 1980).
Under this model, conceptual growth does not begin with first principles, such as the
laws of thermodynamics, that are subsequently mapped into specific cases. Rather,
the growth of understanding is characterized as a process of sifting through and
reconciling the cases, ‘finding successively the more and more general and
fundamental ones which serve as principles, explaining the more special cases’
(p. 16, DiSessa, 1983). In the case of statistical reasoning, we suggest that learning
can best be facilitated by helping students integrate pieces of contextualized
knowledge, or diverse schemas, into fuller understanding.
INSTRUCTIONAL IMPLICATIONS
The educational challenge is to find a way to provide instruction that helps students
align their schemas into more articulated understandings. If we think of an expert
theory of sampling, less articulated forms may be thought of as prototheories. A
prototheory of fairness, for example, may be a precursor for a theory of stratification
(e.g., choose 25 boys and 25 girls). An important instructional goal is to facilitate the
movement of prototheories in the direction of normative theories of sampling. There
is no reason to expect abrupt and rapid evolution of prototheories into normative
ones, in part because the covariance assumption is a particularly useful assumption
in many everyday contexts. Indeed, we have conducted some instructional studies in
which we have been able to move most students in the direction of a normative
theory. But we have by no means overcome their covariance assumption and the
schemas that fall within this assumption. Children moved forward from invitation
and advertisement notions of sampling, but they still relied on schemas common in
their everyday worlds. For example, they moved to an idea that surveys need to be
fair and give everyone a chance to participate. Although, compared to an invitation,
this is a more general prototheory for understanding the role of sampling in
statistical inference, it is still incomplete. For example, under the ‘fairness’
prototheory, students embrace survey methods that place response forms in a
location available to anyone who wants to respond (Jacobs, 1996). The fairness
prototheory does not permit an evaluation of the effects of self-selection on a
statistical inference (e.g., maybe only the students who want to go to the fun booth
would bother to answer the survey). Space does not permit further description of this
work, but it is reported in Schwartz et al. (in press). Instead we summarize some of
the essential features of instruction that are designed to facilitate the evolution of
prototheories into normative theories of sampling. These features are derived from
the empirical work reported here plus initial instructional work (Schwartz et al., in
press). The general principles should apply to other domains in which everyday and
formal systems of knowledge bump up against one another.
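The limitation of the 'fairness' prototheory can be made concrete with a small simulation. The following is only an illustrative sketch and not part of the studies reported here: the population size, the share of students in favor, and the response probabilities are all hypothetical values chosen for the example. It compares a random sample with a survey in which response forms are left where anyone may fill one in, under the assumption that students who want the fun booth are more likely to bother responding.

```python
import random


def simulate_surveys(population_size=1000, share_in_favor=0.30,
                     sample_size=100, seed=0):
    """Compare a random sample with a self-selected survey.

    All numbers are hypothetical and chosen only for illustration.
    Returns (true_share, random_estimate, self_selected_estimate).
    """
    rng = random.Random(seed)

    # Hypothetical population: True means the student wants the fun booth.
    n_favor = round(population_size * share_in_favor)
    population = [True] * n_favor + [False] * (population_size - n_favor)

    # Random sampling: every student is equally likely to be asked.
    random_sample = rng.sample(population, sample_size)
    random_estimate = sum(random_sample) / sample_size

    # Self-selection: forms are open to everyone, but we ASSUME that
    # students in favor respond far more often than the others.
    respond_prob = {True: 0.60, False: 0.10}
    volunteers = [wants for wants in population
                  if rng.random() < respond_prob[wants]]
    if volunteers:
        self_selected_estimate = sum(volunteers) / len(volunteers)
    else:
        self_selected_estimate = float("nan")

    return share_in_favor, random_estimate, self_selected_estimate


if __name__ == "__main__":
    true_share, random_est, self_est = simulate_surveys()
    print(f"true share in favor:      {true_share:.2f}")
    print(f"random-sample estimate:   {random_est:.2f}")
    print(f"self-selected estimate:   {self_est:.2f}")
```

Under these assumed response rates, the open survey is 'fair' in the sense that everyone had a chance to participate, yet its estimate is pulled toward the opinion of those most motivated to answer, whereas the random sample tends to stay close to the true share.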
Essential features of such instruction are that it must provide: (1) situations in
which everyday interpretations or prototheories can be elicited; (2) situations where
inconsistencies or discrepancies among students’ ‘pieces of knowledge’ become
apparent; (3) opportunities for students to discover ways to reconcile those
inconsistencies or discrepancies; (4) new situations in which to ‘test’ emerging
prototheories and receive feedback; and (5) additional possibilities for revising. The
process of exposing pieces of understanding for purposes of testing their alignment
and consistency, and finding ways to bring them into alignment will necessarily be
iterative (Barron, Vye, Zech, Schwartz, Bransford, Goldman, Pellegrino, Morris,
Garrison and Kantnor, 1995). Not only do children enter situations with different
understandings, but as we have just demonstrated, different contexts can be expected
to bring forth different prototheories. Precisely because context-of-use has powerful
effects on what is brought to mind, it is important to use complex contexts as
instructional anchors (Cognition and Technology Group at Vanderbilt, 1992). They
allow multiple prototheories to come to mind and be put in explicit juxtaposition. If
students only work on problems that call forth a single prototheory, they can
continue to maintain different prototheories in different contexts. They never need to
discover organizing principles that would align the prototheories and improve
progress towards normative theories. A good example of the negative effects of using
narrow contexts comes from the ‘end of the chapter’ test phenomenon. In many
college statistics courses, students study statistical ideas only in the context of the
current text chapter. The students may be able to do all the problems at the end of
each chapter in the text but are hopelessly confused on a final exam where the
chapter cues are gone and they must figure out which ideas apply to which problems.
We conclude by foregrounding one of the dilemmas of current theories of learning
and the acquisition of knowledge: If it takes knowledge to make knowledge, how
does anyone ever learn anything? Researchers usually invoke principles of similarity,
reasoning by analogy, the importance of finding ways to access what the person
knows and making it relevant, and so forth. The idea of learning new ideas by relying
on old knowledge is surely important. However, eliciting prior knowledge is only
part of the story. Much of our intuitive knowledge is in pieces rather than in well-organized, formal systems. Consequently, using prior knowledge productively
depends on tandem processes of detecting and reconciling conflicting interpretations
that arise from different pieces of prior knowledge. Certainly, that is the case for
statistical reasoning about sampling. Contextualized instruction, complemented with
assessment activities that highlight and bridge specific relationships, seems to be one
way to accomplish the juxtaposition and reconciliation of conflicting interpretations
into a more coherent and normative body of knowledge, a body of knowledge that
will be in a form that can be applied to numerous contexts.
REFERENCES
Agnoli, F. and Krantz, D. H. (1989). Suppressing natural heuristics by formal instruction: the
case of the conjunction fallacy. Cognitive Psychology, 21, 515-550.
Bar-Hillel, M. (1980). What features make samples seem representative? Journal of
Experimental Psychology: Human Perception and Performance, 6, 578-589.
Barron, B., Vye, N. J., Zech, L., Schwartz, D., Bransford, J. D., Goldman, S. R., Pellegrino,
J. W., Morris, J., Garrison, S. and Kantnor, R. (1995). Creating contexts for community-based
problem solving: the Jasper challenge series. In C. Hedley, P. Antonacci and M.
Rabinowitz (Eds.), Thinking and literacy: the mind at work (pp. 47-71). Hillsdale, NJ:
Lawrence Erlbaum.
Cheng, P. W., Holyoak, K. J., Nisbett, R. E. and Oliver, L. M. (1986). Pragmatic versus
syntactic approaches to training deductive reasoning. Cognitive Psychology, 18, 293-328.
Chi, M., Feltovich, P. and Glaser, R. (1981). Categorization and representations of physics
problems by experts and novices. Cognitive Science, 5, 121-152.
Cognition and Technology Group at Vanderbilt (1992). The Jasper series as an example of
anchored instruction: theory, program description and assessment data. Educational
Psychologist, 27, 291-315.
Cole, M. and Scribner, S. (1974). Culture and thought: A psychological introduction. New York:
Wiley.
Cummins, D. D. (1995). Naive theories and causal deduction. Memory & Cognition, 23,
646-658.
Dean, A. L. (1987). Rules versus cognitive structure as bases for children’s performance on
probability problems. Journal of Applied Developmental Psychology, 8, 463-479.
DiSessa, A. A. (1983). Phenomenology and the evolution of intuition. In D. Gentner and A. L.
Stevens (Eds.), Mental models (pp. 15-33). Hillsdale, NJ: Lawrence Erlbaum.
DiSessa, A. A. (1993). Toward an epistemology of physics. Cognition & Instruction, 10,
105-225.
Donaldson, M. (1978). Children’s minds. New York: W. W. Norton.
Fischbein, E. (1975). The intuitive sources of probabilistic thinking in children. Boston: Reidel.
Gal, I., Rothschild, K. and Wagner, D. A. (1990, April). Statistical concepts and statistical
reasoning in school children: convergence or divergence? Paper presented at the annual
meeting of the American Educational Research Association, Boston, MA.
Garfield, J. and Ahlgren, A. (1988). Difficulties in learning basic concepts in probability
and statistics: implications for research. Journal for Research in Mathematics Education,
19, 44-63.
Goldman, S. R. (1985). Inferential reasoning in and about narrative texts. In A. Graesser and
J. Black (Eds.), The psychology of questions (pp. 247-276). Hillsdale, NJ: Lawrence
Erlbaum.
Jacobs, V. (1996). Children’s informal interpretation and evaluation of statistical sampling in
surveys. Unpublished doctoral dissertation, University of Wisconsin, Madison.
Johnson-Laird, P. N., Legrenzi, P. and Legrenzi, M. (1972). Reasoning and a sense of reality.
British Journal of Psychology, 63, 395-400.
Kahneman, D., Slovic, P. and Tversky, A. (Eds.) (1982). Judgment under uncertainty: heuristics
and biases. NY: Cambridge University Press.
Kelley, H. (1973). The processes of causal attribution. American Psychologist, 28, 107-128.
Konold, C. (1989). Informal conceptions of probability. Cognition & Instruction, 6, 59-98.
Konold, C., Pollatsek, A., Well, A., Lohmeier, J. and Lipson, A. (1993). Inconsistencies in
students’ reasoning about probability. Journal for Research in Mathematics Education, 24,
392-414.
Kuzmak, S. D. and Gelman, R. (1986). Young children’s understanding of random
phenomena. Child Development, 57, 559-566.
Larkin, J. H., McDermott, J., Simon, D. P. and Simon, H. A. (1980). Expert and novice
performance in solving physics problems. Science, 208, 1335-1342.
Mokros, J. and Russell, S. J. (1995). Children’s concepts of average and representativeness.
Journal for Research in Mathematics Education, 26, 20-39.
National Council of Teachers of Mathematics (1989). Curriculum and evaluation standards for
school mathematics. Reston, VA: National Council of Teachers of Mathematics.
Nisbett, R. E., Krantz, D., Jepson, C. and Kunda, Z. (1983). The use of statistical heuristics in
everyday inductive reasoning. Psychological Review, 90, 339-363.
Owens, D. (1992). Causes and coincidences. NY: Cambridge University Press.
Piaget, J. and Inhelder, B. (1975). The origin of the idea of chance in children. (L. Leake JR., P.
Burrell and H. D. Fischbein, Trans.). NY: Norton.
Rubin, A., Bruce, B. and Tenney, Y. (1990, April). Learning about sampling: trouble at the core
of statistics. Paper presented at the annual meeting of the American Educational Research
Association, Boston, MA.
Schwartz, D. L., Goldman, S. R., Vye, N., Barron, B. and the Cognition and Technology
Group at Vanderbilt (in press). Using anchored instruction to align everyday and
mathematical reasoning: The case of sampling instructions. To appear in S. Lajoie (Ed.),
Reflections on statistics: agendas for learning, teaching and assessment in K-12. Hillsdale, NJ:
Lawrence Erlbaum.
Shaughnessy, J. M. (1992). Research in probability and statistics: reflections and directions.
In D. A. Grouws (Ed.), Handbook of research on mathematics teaching and learning
(pp. 465-494). New York: Macmillan.
Tversky, A. and Kahneman, D. (1983). Extensional vs. intuitive reasoning: the conjunction
fallacy in probability judgment. Psychological Review, 90, 293-315.
Yost, P. A., Siegel, A. E. and Andrews, J. M. (1962). Nonverbal probability judgments by
young children. Child Development, 33, 769-780.