Running head: Causal-Based Categorization: A Review
Causal-Based Categorization: A Review
Bob Rehder
Department of Psychology
New York University
Send all correspondence to:
Bob Rehder
Department of Psychology
6 Washington Place
New York, NY 10003
Phone: (212) 992-9586
Email: bob.rehder@nyu.edu
Abstract
This chapter reviews the last decade’s work on causal-based classification, the effect of
interfeature causal relations on how objects are categorized. Evidence for and against the numerous
effects discussed in the literature is evaluated: the causal status effect, the relational centrality effect, the
multiple cause effect, and the coherence effect. Evidence for explicit causal reasoning in classification and
the work conducted on children’s causal-based classification is also presented. The chapter evaluates the
implications these findings have for two models of causal-based classification—the dependency model
(Sloman, Love, & Ahn, 1998) and the generative model (Rehder, 2003b; Rehder & Kim, 2006)—and
discusses methodological issues such as the testing of natural versus novel (artificial) categories and the
interpretation of classification tests. Directions for future research are identified.
1. Introduction
Since the beginning of investigations into the mind/brain, philosophers and psychologists have
asked how people learn categories and classify objects. Interest in this question should be unsurprising
given that categories are a central means by which old experiences guide our responses to new ones.
Regardless of whether it is a new event or temporally extended object, a political development or a new
social group, a new biological species or type of widget on a computer screen, every stimulus we
experience is novel in at least some regards, and so the types into which stimuli are grouped become the
repositories of new knowledge. Thus I learn that credit default swaps are risky, that elections have
consequences, and that the funny symbol on my cell phone means I have voice mail.
That the stimuli we classify span such a wide range means that the act of categorization is
surprisingly complex and varied. Some categories seem to have a (relatively) simple structure. As a child
I learn to identify some parts of my toys as wheels and some letters in words as “t.” Accordingly, much
of the field has devoted itself to the categorization of stimuli with a small number of perceptual
dimensions, testing in the lab subjects’ ability to learn to classify stimuli such as Gabor patches or
rectangles that vary in height and width (see Ashby & Maddox, 2005, for a review). This research has
shown that the learning of even these supposedly simple categories can be quite involved, as subtle
differences in learning procedures and materials result in large differences in what sorts of representations
are formed, how attention is allocated to different stimulus dimensions, how feedback is processed, and
which brain regions underlie learning.
Other categories, in contrast, have an internal structure that is much more integrated with other
sorts of knowledge. For example, as compared to wheels or “t”s, a notion such as elections is related to
many other concepts: that people organize themselves into large groups known as countries, that
countries are led by governments, that in democratic countries governments are chosen by people voting,
and so on. Ever since this point was made in Murphy and Medin's (1985) seminal article, a substantial
literature has emerged documenting how the knowledge structures in which categories are embedded
have a large effect on how categories are learned, how objects are classified, how new properties are
generalized to a category, and how missing features in an object are predicted on the basis of its category
membership (see Murphy, 2002, for a review). Because of its ubiquity in our conceptual structures (Ahn,
Marsh, Luhmann, & Lee, 2002), one particular type of knowledge—the causal relations that obtain
between features of categories—has received special attention. For example, even if you know little about
cars, you probably have at least some vague notion that cars not only have gasoline, spark plugs,
radiators, and fans, and emit carbon monoxide, but also that these features causally interact—that the spark
plugs are somehow involved in the burning of the gasoline, that that burning produces carbon monoxide
and heat, and that the radiator and fan somehow work to dissipate the latter. Here I focus on the rich
database of empirical results showing how this sort of knowledge affects the key category-based
judgment, namely, classification itself.
This chapter has the following structure. Section 2 addresses important methodological
issues regarding the measurement of various effects of causal knowledge on categorization. Section 3
presents models that have been proposed to account for the key empirical phenomena, and those
phenomena are then described in Sections 4, 5, 6, and 7. I close with a discussion of developmental issues
(Section 8) and directions for future research (Section 9).
2. Assessing Causal-Based Classification Effects
The central question addressed by this literature is: What evidence does an object’s features
provide for membership in a category as a function of the category’s network of interfeature causal
relations? This section discusses two issues regarding the measurement of such effects, namely, the
testing of natural versus novel categories and the type (and interpretation) of the classification tests
administered.
2.1. Assessing Causal-Based Effects in Natural versus Novel Categories
Studies have assessed causal-based effects on classification for both natural (real-world)
categories and novel ones (made-up categories that are taught to subjects as part of the experimental
session). When natural categories are tested, researchers first assess the theories that subjects hold for
these categories and then test how those theories affect classification. For example, one common method
is the theory drawing task in which subjects are presented with a category’s features and asked to draw
the causal relations between those features and to estimate the strengths of those relations. Using this
method, Sloman et al. (1998) measured theories for everyday objects (e.g., apples and guitars); Kim
and Ahn (2002a; b) and Ahn, Levin, and Marsh (2005) did so for psychiatric disorders such as depression
and schizophrenia.
In contrast, for novel categories subjects are explicitly instructed on interfeature causal links. For
example, in a seminal study by Ahn, Kim, Lassaline, and Dennis (2000a), participants were instructed on
a novel type of bird named roobans with three features: eats fruit (X), has sticky feet (Y), and builds nests
on trees (Z). In addition, participants were told that features were related in a causal chain (Figure 1) in
which X causes Y ("Eating fruit tends to cause roobans to have sticky feet because sugar in fruits is
secreted through pores under their feet.") and Y causes Z ("Sticky feet tends to allow roobans to build
nests on trees because they can climb up the trees easily with sticky feet."). Similarly, Rehder and Hastie
(2001) instructed subjects on a novel type of star named myastars and how some features of myastars
(e.g., high density) caused others (e.g., a large number of planets). Usually studies also provide some
detail regarding the causal mechanism by which one feature produces another.
There are a number of advantages to testing novel rather than natural categories. One is that novel
categories provide greater control over the causal relations that are used. For example, when classifying
into natural categories, it is possible that limited cognitive resources (e.g., working memory) prevent
subjects from using the dozens of causal relations they usually identify in a theory-drawing task. In
contrast, experiments using novel categories usually teach subjects 2–4 causal links and the experimental
context itself makes it clear that those causal links are the relevant ones (especially so, when the causal
links are presented on the computer screen as part of the classification test). Of course, this does not rule
out the use of additional causal links that subjects might assume are associated with particular ontological
kinds by default (see Section 4.3 for one possibility in this regard).
Another advantage of novel categories is that they can control for the numerous other factors
besides causal knowledge that are known to influence category membership. For example, features that
are more salient will have greater influence than less salient ones (e.g., Lamberts, 1995; 1998). And,
feature importance is influenced by what I call empirical-statistical information, that is, how often
features or exemplars are observed as occurring as category members and nonmembers (Rosch & Mervis,
1975). Patterns of features that are observed to occur within category members (e.g., a feature’s category
validity, the probability that it occurs given the category) may be especially problematic, because this
information is likely to covary with features' causal roles. For example, a feature with many causes is likely to
appear in more category members than one with few causes; two features that are causally related are also
likely to be correlated in observed category members. Thus, any purported effect of causal knowledge on
classification in natural categories might be due to the statistical patterns of features that causal links
generate (and classifiers then observe) rather than the links themselves.1 In contrast, when novel
categories are used, counterbalancing the assignment of features to causal roles or the use of multiple novel
categories averages over effects of feature salience and contrast categories. And, that subjects have not
seen examples of these made-up categories eliminates effects of empirical-statistical information.
For these reasons, this chapter focuses on studies testing novel experimental categories. Of
course, this is not to say that studies testing natural categories have not furthered our understanding of
causal-based classification in critical ways, as such studies have reported numerous interesting and
important findings (e.g., Ahn, 1998; Ahn, Flanagan, Marsh, & Sanislow, 2006; Kim & Ahn, 2002a; b;
Sloman et al., 1998). As always, research is advanced most rapidly by an interplay between studies testing
natural materials (that afford ecological validity) and novel ones (that afford experimental control).
However, when rigorous tests of computational models are the goal (as they are here), the more tightly
controlled studies are to be emphasized.
2.2. Interpreting Classification Tests
After subjects learn a novel category, they are presented with a series of objects and asked to
render a category membership judgment. The question of how causal knowledge affects classification can
be divided into two subquestions. The first is how causal knowledge affects the influence of individual
features on classification. The second concerns how certain combinations of features make for better
category members. In categorization research there is precedent for considering these two different types
of effects. For example, Medin and Schaffer (1978) distinguished independent cue models, in which each
feature provides an independent source of evidence for category membership (prototype models are an
example of independent cue models), from interactive cue models, in which a feature’s influence depends
on what other features are present (exemplar models are an example of interactive cue models). Of
course, whereas most existing categorization models are concerned with how features directly observed in
category members influence (independently or interactively) subsequent classification decisions, the
current chapter is concerned with how classification is affected by interfeature causal relations.
I will refer to one method for assessing the importance of individual features as the missing
feature method. As mentioned, in the study by Ahn et al. (2000a), participants were instructed on novel
categories (e.g., roobans) with features related in a causal chain (X→Y→Z). Participants were then
presented with three items missing exactly one feature (one missing only X, one missing only Y, one
missing only Z) and asked to rate how likely that item was a category member. Differences among the
ratings of these items were interpreted as indicating how the relative importance of features varies as a
function of their causal role. For example, that the missing-X item was rated a worse category member
than the missing-Y item was taken to mean that X was more important than Y for establishing category
membership. (The result that features are more important than those they cause is referred to as the causal
status effect and will be discussed in detail in Section 4). The missing feature method has been used in
numerous other studies (Ahn, 1998; Kim & Ahn, 2002a; b; Kim, Luhmann, Pierce, & Ryan, 2009;
Luhmann, Ahn, & Palmeri, 2006; Sloman, Love, & Ahn, 1998).
A different method for assessing feature weights was used by Rehder and Hastie (2001). They
also instructed subjects on novel categories (e.g., myastars) with four features that were causally related in
various topologies. However, rather than just presenting test items missing one feature, Rehder and Hastie
presented all 16 items that can be formed on four binary dimensions. Linear regression analyses were then
performed on those ratings in which there was one predictor for each feature coding whether a feature
was present or absent in a test item. The regression weight on each predictor was interpreted as the
importance of that feature. Importantly, the regression equation also included two-way and higher-order
interaction terms to allow an assessment of how important certain combinations of features are for
forming good category members. For example, a predictor representing the two-way interaction between
features X and Y encodes whether features X and Y are both present or both absent versus one present
and the other absent, and the resulting regression weight on that predictor represents the importance to
participants' categorization rating of dimensions X and Y having the same value (present or absent) or not
in that test item. In fact, Rehder and Hastie found that subjects exhibited sensitivity to two-way and
higher-order feature interactions, producing, for example, higher ratings when a cause and effect feature
were both present or both absent and lower ratings when one was present and the other absent. (This
phenomenon, known as the coherence effect, will be discussed in detail in Section 6). The regression
method has been used in numerous studies (Rehder, 2003a; 2003b; 2007; Rehder & Kim, 2006; 2009b).
Which of these methods for assessing causal-based classification effects should be preferred? One
obvious advantage of the regression method is that it, unlike the missing feature method, provides a
measure of feature interactions, an important consideration given the presence of large coherence effects
described later. In addition however, there are also several reasons to prefer the regression method for
assessing feature weights, which I now present in ascending order of importance.
The first advantage is that regression is a generalization of a statistical analysis method that is
already very familiar to empirical researchers, namely, analysis of variance (Judd & McClelland, 1989).
For example, imagine an experiment in which subjects are taught a category with three features X, Y, and
Z and then rate the category membership of the eight distinct test items that can be formed on three binary
dimensions. This experiment can be construed as a 2 x 2 x 2 within-subjects design in which the three
factors are whether the feature is present or absent on dimension X, on dimension Y, and on dimension Z.
The question of whether the regression weight on, say, feature X is significantly different than zero is
identical to asking whether there is a “main effect” of dimension X. The question of whether the two-way
interaction weight between X and Y is different than zero is identical to asking whether there is an
interaction between dimensions X and Y. That is, one can ask whether the “main effect” of feature X is
“moderated” by the presence or absence of feature Y (as one might expect if X and Y are causally
related).
The second reason to prefer regression is that it provides a more powerful method of statistical
analysis. The regression weight on, say, dimension X amounts to a difference score between the ratings of
the test items that have feature X and those that don’t. 2 Supposing again that the category has three
features, the weight on X is the difference between the ratings on test items 111, 110, 101, and 100 versus
011, 010, 001, and 000 (where “1” means that a feature is present and “0” that it is absent, e.g., 101 means
that X and Z are present and Y absent). As a consequence, use of regression means that an assessment of
X’s weight involves all eight test items, whereas the missing feature method bases it solely on the rating
of one item (namely, 011). By averaging over the noise associated with multiple measures, the regression
method produces a more statistically powerful assessment of the importance of features.
Third, and most importantly, the missing feature method produces, as compared to regression, a
potentially different and (as I shall show) incorrect assessment of a feature’s importance. This is the case
because any single test item manifests both the independent and interactive effects of its features; in
particular the rating of a test item missing a single feature is not a pure measure of that feature's weight
(i.e., it does not correspond to the “main effect” associated with the feature). For example, suppose that a
category has three features in which X causes both Y and Z (that is, X, Y, and Z form a common cause
network) and that subjects are asked to rate test items on a 1-10 scale. Assume that subjects produce a
baseline classification rating of 5, that ratings are 1 point higher for each feature present in a test item and
1 point lower for each feature that is absent (and unchanged if the presence or absence of the feature is
unknown). That is, features X, Y, and Z are all weighed equally. In addition, assume there exist
interactive effects such that the rating goes 1 point higher whenever a cause and effect feature are both
present or both absent and 1 point lower whenever one of those is present and the other absent. The
classification ratings for this hypothetical experiment are presented in Example 1 in Table 1 for the eight
test items that can be formed on the three dimensions and the three test items that have one feature present
and two unknown (again, “1” = feature present and “0” = absent; “x” means the state of the feature is
unknown). For instance, test item 110 (X and Y present, Z absent) has a rating of 6 because, as compared
to an average rating of 5, it gains two points due to the presence of X and Y, loses one because of the
absence of Z, gains one because the causally related X and Y are both present, and loses one because X is
present but its effect Z is absent.
It is informative to compare the different conclusions reached by the missing feature method and
linear regression regarding feature weights in this example. Importantly, the rating of 4 received by the
item missing only feature X (011) is lower than the rating of 6 given to the items missing only Y (101) or
Z (110). This result seems to imply (according to the missing feature method), that X is more important
than Y and Z. However, this conclusion is at odds with the conditions that were stipulated in the example,
namely, that all three features were weighed equally. In fact, item 011 is rated lower not because feature
X is more important, but rather because it includes two violations of causal relations (X is absent even
though both of its effects are present) whereas 101 and 110 have only one (in each, the cause X is present
and one effect is absent). This example demonstrates how the missing feature method can mischaracterize
an interactive effect of features as a main effect of a single feature.
In contrast, a regression analysis applied to the eight test items in Example 1 correctly recovers
the fact that all features are weighed equally and, moreover, the interactive effect between the causally
related features X and Y, and X and Z. Specifically, for Example 1 the regression equation would be,
ratingi = β0 + βXfX + βYfY + βZfZ + βXYfXY + βXZfXZ + βYZfYZ + βXYZfXYZ
where ratingi is the rating for test item i, fj = +1 when feature j is present in test item i and –1 when it is
absent, fjk = fjfk, and fXYZ = fXfYfZ. This regression analysis yields β0 = 5, βX = βY = βZ = 1, βXY = βXZ = 1,
and βYZ = βXYZ = 0. These beta weights are of course just those that were stipulated in the example.
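To make the regression method concrete, the following sketch (Python, written for this exposition and not taken from the original studies) generates the eight ratings stipulated in Example 1 and recovers the feature and interaction weights by least squares; the helper names and the use of numpy are my own choices.

```python
# Illustrative sketch of the regression method applied to Example 1 (common
# cause structure: X causes Y and X causes Z). Not the original analysis code.
import itertools
import numpy as np

def example1_rating(x, y, z):
    rating = 5                                   # baseline rating
    for f in (x, y, z):                          # independent feature effects
        rating += 1 if f else -1
    for cause, effect in ((x, y), (x, z)):       # interactive (causal link) effects
        rating += 1 if cause == effect else -1
    return rating

items = list(itertools.product([1, 0], repeat=3))          # the 8 test items
ratings = np.array([example1_rating(*item) for item in items], dtype=float)

def predictors(x, y, z):
    # +1/-1 coding for each feature plus all two-way and three-way interactions
    fx, fy, fz = (1 if x else -1), (1 if y else -1), (1 if z else -1)
    return [1, fx, fy, fz, fx * fy, fx * fz, fy * fz, fx * fy * fz]

design = np.array([predictors(*item) for item in items], dtype=float)
betas, *_ = np.linalg.lstsq(design, ratings, rcond=None)
names = ["b0", "bX", "bY", "bZ", "bXY", "bXZ", "bYZ", "bXYZ"]
print(dict(zip(names, np.round(betas, 2))))
# Expected: b0 = 5, bX = bY = bZ = 1, bXY = bXZ = 1, bYZ = bXYZ = 0

# Sanity check: with +1/-1 coding, bX equals half the difference between the
# mean rating of items containing X and the mean rating of items lacking X.
half_diff = (ratings[design[:, 1] == 1].mean() - ratings[design[:, 1] == -1].mean()) / 2
print(half_diff)    # 1.0
```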
It is important to recognize that the alternative conclusions reached by the two methods are not
merely two different but equally plausible senses of what we mean by “feature weights” in classification.
The critical test of whether the weight assigned to a feature is correct is whether it generalizes sensibly to
other potential test items—the value of knowing the importance of individual features is that it allows one
to estimate the category membership of any potential item, not just those presented on a classification test.
For example, if for Example 1 you were to conclude (on the basis of the missing feature method) that X is
the most important feature, you would predict that the item with only feature X (100) should be rated
higher than those with only Y (010) or only Z (001). And, for items that have only one known feature,
you would predict that 1xx should be rated higher than x1x or xx1. Table 1 reveals that these predictions
would be incorrect, however: Item 100 is rated lower than 010 and 001 (because 100 violates two causal
links whereas the others violate only one) and 1xx, x1x, and xx1 all have the same rating (6). This
example illustrates how the missing feature method can yield a measure of feature importance that fails to
generalize to other items.
These conclusions do not depend on all features being equally weighed as they are in Example 1.
Example 2 in Table 1 differs from Example 1 in that the weights on features Y and Z have been reduced
to .5 (so that X now is the most heavily weighed feature). The missing feature method again assigns
(correctly, in this case) a greater weight to feature X (because 011’s rating of 3 is lower than the 6 given
to 101 or 110). But whereas it then predicts that item 100 should be rated higher than 010 or 001, in fact
that item is rated lower (3 vs. 4) because the two violations of causal relations in item 100 outweigh the
presence of the more heavily weighed X. Nor are these sorts of effects limited to a common cause
network. In Examples 3 and 4, X, Y, and Z are arranged in a causal chain. Even though features are
weighed either equally (Example 3) or X > Y = Z (Example 4), in both examples the item missing feature
Y (101) is rated lower than 011 or 110, and thus the missing feature method would incorrectly identify Y
as the most important feature. Together, Examples 1–4 demonstrate how the missing feature method can
systematically mischaracterize the true effect of individual features on classification. In contrast, a
regression analysis recovers the correct feature weights and the interactive effects in all four examples.
The three advantages associated with the regression method mean that it is a superior method for
assessing the effect of causal knowledge on classification. Are there any potential drawbacks to use of
this method? Three issues are worth mentioning. First, to allow an assessment of both feature weights and
interactions, the regression method requires that subjects rate a larger number of test items and thus it is
important to consider what negative impact this might have. A longer test increases the probability that
fatigue might set in and, in designs in which subjects must remember the causal links on which they are
instructed, that subjects might start to forget those links. To test whether this in fact occurs, I reanalyzed
data from a study reported by Rehder and Kim (2006) in which subjects were taught categories with five
features and up to four causal links and then asked to rate a large number of test items, namely, 32. Even
though subjects had to remember the causal links (because they were not displayed during the
classification test), they exhibited significant effects of causal knowledge on both feature weights and
feature interactions, indicating that they made use of that knowledge. Moreover, the reanalysis revealed
that the magnitude of these effects for the last 16 test items was the same as the first 16. Thus, although
fatigue and memory loss associated with a larger number of test items are valid concerns, at present there
is no empirical evidence that this occurs in the sort of studies reviewed here. Of course, fatigue and
memory loss may become problems if an even larger number of causal links and test items than in Rehder
and Kim (2006) are used.
Another potential consequence of the number of items presented on the classification test is that,
because it is well known that an item’s rating will change depending on the presence of other items
(Poulton, 1989), the missing feature or regression methods may yield different ratings for exactly the
same item. For instance, in Example 1 the ratings difference between the missing-X (011) and the
missing-Y and –Z items (101 and 110) is likely to be larger when those are the only items being rated (the
missing feature method) as compared to when both very likely (111) and unlikely (100) category
members are included (the regression method). This is so because the absence of these latter items will
result in the response scale expanding as subjects attempt to make full use of the scale; their presence will
result in the scale contracting because 111 and 100 “anchor” the ends of the scale. (An example of this
sort of scale contraction and expansion is presented in Section 4.5.1.) For this reason, it is ill advised to
compare test item ratings across conditions and studies that differ in the number of different types of test
items presented.
A third potential issue is whether the classification rating scale used in these experiments can be
interpreted as an interval scale. As mentioned, the weights produced by the regression method are a series
of difference scores. Because in general those differences involve different parts of the response scale,
comparing different weights requires the assumption that the scale is being used uniformly. But of course
issues such as the contraction that occurs at the ends of scales are well known.3 Transformations (e.g.,
arc-sine) are available of course; in addition, more recent studies in my lab have begun to use forced-choice judgments (and logistic regression) to avoid these issues (see Sections 7 and 8 for examples).
In summary, regression is superior to the missing feature method because it (a) assesses feature
interactions, (b) is closely related to ANOVA, (c) yields greater statistical power, and (d) yields feature
weights that generalize correctly to other items. At the same time, care concerning the number of test
items and issues of scale usage must be exercised; other sorts of test (e.g., forced-choice) might be
appropriate in some circumstances. Of course, although assessing “effects” properly is important, the
central goal of research is not to merely catalog effects but also to propose theoretical explanations of
those effects. Still, like other fields, this one rests on empirical studies that describe how experimental
manipulations influence the presence and size of various effects, and false claims and controversies can
arise in the absence of a sound method for assessing those effects. These concerns are not merely
hypothetical. Section 8 describes experimental results demonstrating that previous conclusions reached on
the basis of a study using the missing feature method were likely due to an interactive effect of features
rather than a main effect of feature weights.
2.3. Terminology
A final note on terminology is in order. Sloman et al. (1998) used the terms
mutability and conceptual centrality, but these refer to properties of individual features. However, I have noted
how causal knowledge may also affect how combinations of features can influence classification
decisions. Accordingly, I will simply use the term classification weight to refer to the weight that features
and certain combinations of features have for membership in a particular category.
3. Computational Models
I now present two computational models that have been offered as accounts of the effects of
causal knowledge on categorization. Both models specify a rule that assigns to an object a measure of its
membership in a category on the basis of that category’s network of interfeature causal relations. Nothing
is assumed about the nature of those causal links other than their strength. Neither model denies the
existence of other effects on classification, such as the presence of contrast categories, the salience of
particular features, or the empirical/statistical information that people observe firsthand. Rather, the claim
is that causal relations will have the predicted effects when these factors are controlled.
3.1. The Dependency Model
One model is Sloman et al.'s (1998) dependency model. The dependency model is based on the
intuition that features are more important to category membership (i.e., are more conceptually central) to
the extent they have more dependents, that is, features that depend on them (directly or indirectly). A
causal relation is an example of a dependency relation in which the effect depends on its cause. For
example, DNA is more important than the color of an animal's fur because so much depends on DNA;
hormones are more important than the size of its eyes for the same reason.
According to the dependency model, feature i's weight or centrality, ci, can be computed from the
iterative equation,
ci,t+1 = Σj dij cj,t     (1)
where ci,t is i's weight at iteration t and dij is the strength of the causal link between i and its dependent j.
For example, if a category has three category features X, Y, and Z, and X causes Y which causes Z (as in
Figure 1), then when cZ,1 is initialized to 1 and each causal link has a strength of 2, after two iterations the
centralities for X, Y, and Z are 4, 2, and 1. That is, feature X is more important to category membership
than Y which in turn is more important than Z. Stated qualitatively, the dependency model predicts a
difference in feature weights because X, Y, and Z vary in the number of dependents they have: X has two
(Y and Z), Y has one (Z), and Z has none. Table 2 also presents how the feature weights predicted by the
dependency model vary as a function of the causal strength parameters (the ds). These predictions have been
tested in experiments described in Section 4.
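As an illustration of Equation 1, the following sketch iterates the centrality computation for the chain in Figure 1. It assumes, consistent with the 4, 2, 1 example above, that every centrality starts at 1 and that a feature with no dependents retains its current value; these are assumptions of the sketch rather than claims about Sloman et al.'s implementation.

```python
# Illustrative sketch of the dependency model's iterative equation
# c_{i,t+1} = sum_j d_ij * c_{j,t} for the chain X -> Y -> Z.
def dependency_centralities(features, links, iterations=2):
    """links maps (feature, dependent) pairs to link strengths d_ij."""
    c = {f: 1.0 for f in features}               # assumed initialization
    for _ in range(iterations):
        new_c = {}
        for i in features:
            deps = [(j, d) for (cause, j), d in links.items() if cause == i]
            # Features with no dependents are assumed to keep their value.
            new_c[i] = sum(d * c[j] for j, d in deps) if deps else c[i]
        c = new_c
    return c

print(dependency_centralities("XYZ", {("X", "Y"): 2.0, ("Y", "Z"): 2.0}))
# {'X': 4.0, 'Y': 2.0, 'Z': 1.0}, i.e., the 4, 2, 1 weights described in the text
print(dependency_centralities("XYZ", {("X", "Y"): 3.0, ("Y", "Z"): 3.0}))
# {'X': 9.0, 'Y': 3.0, 'Z': 1.0}
```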
Although the dependency model was successfully applied to natural categories in Sloman et al.,
its original formulation makes it technically inapplicable to many causal networks of theoretical interest.
However, Kim et al. (2009) have recently proposed new variants of the dependency model that address
these issues, allowing it to be applied to any network topology. Still, these variants inherit the same
qualitative properties as their predecessor, namely, features grow in importance as a function of their
number of dependents and the strengths of the causal links with those dependents.4
Note that while the dependency model and its variants specify how feature weights vary as a
function of a category's causal network, they make no predictions regarding how combinations of features make for better
or worse category members (i.e., they predict the absence of interactive effects). This is one important
property distinguishing them from the next model.
3.2. The Generative Model
The second model is the generative model (Rehder, 2003a; b; Rehder & Kim, 2006). Building on
causal-model theory (Waldmann & Holyoak, 1992; Sloman, 2005), the generative model assumes that
interfeature causal relations are represented as probabilistic causal mechanisms and that classifiers
consider whether an object is likely to have been produced or generated by those mechanisms. Objects
that are likely to have been generated by a category's causal model are considered to be good category
members and those unlikely to be generated are poor category members.
Quantitative predictions for the generative model can be generated assuming a particular
representation of causal relations first introduced by Cheng (1997) and later applied to a variety of
category-based tasks (Rehder & Hastie, 2001; Rehder, 2003a; b; Rehder, 2009a; Rehder & Kim, 2006;
2009a; 2009b; Rehder & Burnett, 2005). Assume that category k’s causal mechanism relating feature j
and its parent i operates (i.e., produces j) with probability mij when i is present and that any other potential
background causes of j collectively operate with probability bj. Given other reasonable assumptions (e.g.,
the independence of causal mechanisms, see Cheng & Novick, 2005), then j’s parents and the background
causes form a "fuzzy-or" network that together produce j in members of category k conditional on the
state of j’s parents with probability,
pk(j | parents(j)) = 1 - (1 - bj) ∏i∈parents(j) (1 - mij)^ind(i)     (2)
where ind(i) is an indicator variable that evaluates to 1 when i is present and 0 otherwise. The probability
of a root cause r is a free parameter cr.
For example, for the simple chain network in Figure 1 in which nodes have at most one parent,
the probability of j when its parent i is present is
pk(j | i) = 1 - (1 - bj)(1 - mij) = mij + bj - mijbj     (3)
That is, the probability of j is the probability that it is brought about by its parent or by its background
causes.
When i is absent, the causal mechanism mij has no effect on j and thus the probability of j is simply
pk(j | ¬i) = bj     (4)
By applying Equation 2 iteratively, one can derive the equations representing the likelihood of any
possible combination of the presence or absence of features in any causal network. For example,
Table 3 presents the likelihood equations for the chain network in Figure 1 for any combination of
features X, Y, and Z. Table 3 also presents the probability of each item for a number of different
parameter values. The strengths of the causal links between X and Y (mXY) and between Y and Z (mYZ)
are varied over the values 0, .33, .75, .90, and 1.0 while bY and bZ are held fixed at .10. In addition, bY and
bZ are varied over the values 0, .25, and .50 while mXY and mYZ are held fixed at .75. Parameter cX (the
probability that the root cause feature X appears in members of category k) is fixed at .75, consistent with
the assumption that X is a typical feature of category k.
Table 3 indicates how causal relations determine a category’s statistical distribution of features
among category members. The assumption of course is that these item probabilities are related to their
category membership: Items with a high probability of being generated are likely category members
and those with a low probability of generation are unlikely ones. That is, according to the generative
model, the effects of causal relations on classification are mediated by the statistical distribution of
features that those relations are expected to produce.
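To illustrate how Equation 2 generates a distribution over items, the following sketch computes the likelihood of each of the eight possible items for the chain network of Figure 1 under one of the parameter settings described above (cX = .75, mXY = mYZ = .90, bY = bZ = .10); any small discrepancies with Table 3 would reflect rounding.

```python
# Illustrative sketch: item likelihoods for the chain X -> Y -> Z under the
# fuzzy-or parameterization of Equation 2.
import itertools

c_X, m_XY, m_YZ, b_Y, b_Z = 0.75, 0.90, 0.90, 0.10, 0.10

def p_effect(parent_present, m, b):
    # Effect is produced by its parent (when present) or by background causes.
    return 1 - (1 - b) * (1 - m) if parent_present else b

def item_likelihood(x, y, z):
    p = c_X if x else 1 - c_X
    p_y = p_effect(x, m_XY, b_Y)
    p *= p_y if y else 1 - p_y
    p_z = p_effect(y, m_YZ, b_Z)
    p *= p_z if z else 1 - p_z
    return p

for item in itertools.product([1, 0], repeat=3):
    print(item, round(item_likelihood(*item), 4))
# The eight likelihoods sum to 1: they are the category's expected distribution
# of feature combinations, with coherent items (e.g., 111 and 000) most probable.
```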
Although the generative model’s main predictions concern whole items, from these probability
distributions one can derive statistics corresponding to the two sorts of empirical effects I have described,
namely, feature weights and feature interactions. For example, from Equations 3 and 4 the probability of j
appearing in a member of category k is
pk(j) = pk(j | i) pk(i) + pk(j | ¬i) pk(¬i)
pk(j) = (mij + bj - mijbj) pk(i) + bj pk(¬i)
pk(j) = mij pk(i) + bj - mijbj pk(i)     (5)
where i is the parent of j. Table 3 presents the probability of each feature for each set of parameter values.
For example, when cX = .75, mXY = mYZ = .90, and bY = bZ = .10, then pk(X) = .750, pk(Y) = .708, and
pk(Z) = .673. That is, feature X has a larger “weight” than Y which in turn is larger than Z’s. Table 3
presents how the feature weights predicted by the generative model vary as a function of parameters. In
Section 4, I will show how the generative and dependency models make qualitatively different
predictions regarding how feature weights vary across parameter settings and present results of
experiments testing those predictions.
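A brief sketch of Equation 5 shows how the marginal feature probabilities quoted above follow from the same parameters (cX = .75, mXY = mYZ = .90, bY = bZ = .10):

```python
# Illustrative sketch of Equation 5: each feature's marginal probability
# ("weight") in the chain X -> Y -> Z.
def marginal(p_parent, m, b):
    # p_k(j) = m_ij * p_k(i) + b_j - m_ij * b_j * p_k(i), where i is j's parent
    return m * p_parent + b - m * b * p_parent

p_X = 0.75
p_Y = marginal(p_X, 0.90, 0.10)
p_Z = marginal(p_Y, 0.90, 0.10)
print(p_X, round(p_Y, 4), round(p_Z, 4))
# 0.75, ~0.7075, ~0.6731, i.e., the .750, .708, and .673 quoted in the text
```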
Importantly, the generative model also predicts that causally related features should be correlated
within category members. A quantity that reflects a dependency—and hence a correlation—between two
variables is the probabilistic contrast. The probabilistic contrast between a cause i and an effect j is
defined as the difference between the probability of j given the presence or absence of i:
Δpk(i, j) = pk(j | i) - pk(j | ¬i)     (6)
For the causal network in Figure 1, Table 3 shows how the contrasts between the directly causally
related features, Δpk(X, Y) and Δpk(Y, Z), are greater than 0, indicating how those pairs of features
should be correlated (a relation that holds so long as the ms > 0 and the bs < 1). Moreover, the contrast
between the two indirectly related features, Δpk(X, Z), is greater than zero but less than the direct
contrasts, indicating that X and Z should also be correlated, albeit more weakly.
generative model predicts interactive effects: Objects will be considered good category members to the
extent they maintain expected correlations and worse ones to the extent they break those correlations.
That the generative model predicts interactive effects between features is a key property distinguishing it
from the dependency model. Table 3 presents how the pairwise feature contrasts predicted by the
generative model vary as a function of parameters, predictions that have been tested in experiments
described in Section 6.
The generative model also makes predictions regarding the patterns of higher-order interactions
one expects for a causal network. For example, a higher order contrast that defines how the contrast
between i and j is itself moderated by h is given by Equation 7.
Δpk(h, i, j) = [pk(j | i, h) - pk(j | ¬i, h)] - [pk(j | i, ¬h) - pk(j | ¬i, ¬h)]     (7)
Table 3 indicates that for a chain network Δpk(X, Y, Z) = 0, indicating that the contrast between Y and Z
is itself unaffected by the state of X. This corresponds to the well-known causal Markov condition in
which Y “screens off” Z from X; likewise Δpk(Z, X, Y) = 0 means that Y screens off X from Z (Pearl,
1988). In Section 6 I will demonstrate how these sorts of higher order contrasts manifest themselves in
classification judgments.
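The following sketch computes the contrasts of Equations 6 and 7 for the chain network under illustrative parameters (m = .90 and b = .10 for both links). It shows that the direct contrasts exceed the indirect contrast between X and Z, and that the higher-order contrast Δpk(X, Y, Z) is zero, as the Markov condition requires.

```python
# Illustrative sketch of the probabilistic contrasts (Equations 6 and 7) for
# the chain X -> Y -> Z with m = .90 and b = .10 on both links.
m, b = 0.90, 0.10

def p_effect(parent_present):
    return 1 - (1 - b) * (1 - m) if parent_present else b

def p_Z_given_X(x):
    # Marginalize over Y: p(Z | X) = p(Z | Y) p(Y | X) + p(Z | not-Y) p(not-Y | X)
    p_y = p_effect(x)
    return p_effect(True) * p_y + p_effect(False) * (1 - p_y)

contrast_XY = p_effect(True) - p_effect(False)          # direct contrast
contrast_YZ = p_effect(True) - p_effect(False)          # direct contrast
contrast_XZ = p_Z_given_X(True) - p_Z_given_X(False)    # indirect, weaker

# Equation 7: the Y-Z contrast is the same whether X is present or absent,
# because in this model p(Z | Y, X) = p(Z | Y) (the causal Markov condition).
contrast_XYZ = ((p_effect(True) - p_effect(False))
                - (p_effect(True) - p_effect(False)))
print(round(contrast_XY, 3), round(contrast_XZ, 3), round(contrast_XYZ, 3))
# 0.81 0.656 0.0
```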
4. The Causal Status Effect
This section begins the review of the major phenomena that have been discovered in the causal-based categorization literature. In each of the following four sections I define the phenomenon, discuss
the major theoretical variables that have been shown to influence that phenomenon, and consider the
implications these results have for the two computational models just described. I also briefly discuss
other variables (e.g., experimental details of secondary theoretical importance) that have also been shown
to have an influence.
As mentioned, this review focuses on studies testing novel categories, that is, ones with which
subjects have no prior experience because they are learned as part of the experiment. There are two
reasons for this. The first is that there are already good reviews of work testing the effects of causal
knowledge on real-world categories (e.g., Ahn & Kim, 2001). The second is the presence of confounds
associated with natural materials (e.g., the presence of contrast categories, the different salience of
features, the effects of empirical-statistical information, etc.) already noted in Section 2.
The first empirical phenomenon I discuss is the causal status effect, an effect on feature weights
in which features that appear earlier in a category's causal network (and thus are "more causal") carry
greater weight in categorization decisions. For example, in Figure 1, X is the most causal feature, Z is the
least causal, and Y is intermediate. As a consequence, all else being equal, X should be weighed more
heavily than Y which should be weighed more heavily than Z. In fact, numerous studies have
demonstrated situations in which features are more important than those they cause (Ahn, 1998; Ahn et al.,
2000a; Kim et al., 2009; Luhmann et al., 2006; Rehder, 2003b; Rehder & Kim, 2006; Sloman et al., 1998).
Nevertheless, that the size of the causal status effect can vary dramatically across studies—in
many it is absent entirely—raises questions about the conditions under which it appears. For example, in
the Ahn et al. (2000a) study participants learned novel categories with three features X→Y→Z and then
rated the category membership of items missing exactly one feature on a 0 to 100 scale. The item missing
X was rated lower (27) than one missing Y (40) which in turn was lower than the one missing Z (62),
suggesting that X is more important than Y which is more important than Z. A large causal status effect
was also found in Ahn (1998, Experiments 3 and 4) and Sloman et al. (1998, Study 3).5
In contrast, in Rehder and Kim (2006) participants learned categories in which 3 out of 5 features
were connected in a causal chain and then rated the category membership of a number of test items. To
assess the importance of features, regression analyses were performed on those ratings. Unlike Ahn et al.,
Rehder and Kim found only a modest (albeit significant) difference in the regression weights of X and Y
(7.6 and 6.4), a difference that reflected the nearly equal ratings on the missing-X and missing-Y items
(43 and 47, respectively). In contrast, the regression weight on Z (6.2) and the rating of the missing-Z
item (48) indicated no difference in importance between features Y and Z. Similarly, testing categories
with four features, Rehder (2003b) found a partial causal status effect (a larger weight on the chain's first
feature and smaller but equal weights on the remaining ones) in one experiment and no causal status
effect at all in another.
What factors are responsible for these disparate results? Based on the contrasting predictions of
the dependency and generative models presented earlier, I now review recent experiments testing a
number of variables potentially responsible for the causal status effect.
4.1. Causal Link Strength
One factor that may influence the causal status effect is the strength of the causal links. For
example, whereas Ahn (1998) and Ahn et al. (2000a) described the causal relationships in probabilistic
terms by use of the phrase "tends to" (e.g., "Sticky feet tends to allow roobans to build nests on trees."),
Rehder and Kim (2006) omitted any information about the strength of the causal links. This omission may
have invited participants to interpret the causal links as deterministic (i.e., the cause always produces the
effect), and this difference in the perceived strength of the causal links may be responsible for the
different results.6
To test this hypothesis, Rehder and Kim (2009b, Experiment 1) directly manipulated the strength
of the causal links. All participants were taught three category features and two causal relationships
linking X, Y, and Z into a causal chain (as in Figure 1). For example, participants who learned myastars
were told that the typical features of myastars were a hot temperature, high density, and a large number of
planets and that hot temperature causes high density which in turn causes a large number of planets. Each
feature included information about the other value on the same stimulus dimension (e.g., “Most myastars
have high density whereas some have low density.”). Each causal link was accompanied with information
about the causal mechanism (e.g., “High density causes the star to have a large number of planets.
Helium, which cannot be compressed into a small area, is spun off the star, and serves as the raw material
for many planets.”). In addition, participants were given explicit information about the strengths of those
causal links. For example, participants in the Chain-100 condition were told that each causal link had a
strength of 100%: "Whenever a myastar has high density, it will cause that star to have a large number of
planets with probability 100%." Participants in the Chain-75 condition were told that the causal links
operated with probability 75% instead. Participants then rated the category membership of all eight items
that could be formed on the three binary dimensions. A Control condition in which no causal links were
presented was also tested.
The dependency and generative models make distinct predictions for this experiment. Table 2
shows that the dependency model predicts that the size of the causal status effect is an increasing
monotonic function of causal strength. For example, after two iterations feature weights are 4, 2, and 1
when cZ,1 = 1 and dXY = dYZ = 2 (yielding a difference of 3 between the weights of X and Z) versus 9, 3,
and 1 when dXY = dYZ = 3 (a difference of 8). Intuitively, it makes this prediction because stronger causal
relations mean that Y is more dependent on X and Z is more dependent on Y. As a consequence, the
dependency model predicts a stronger causal status effect in the Chain-100 condition versus the Chain-75
condition.
In contrast, Table 3 shows that the generative model predicts that the size of the causal status effect
should decrease as the strength of the causal links increases; indeed, the causal status effect can even
reverse at high levels of causal strength. For example, when bY = bZ = .10, Table 3 shows that the
difference between pk(X) and pk(Z) (a measure of the causal status effect) is .553, .241, .077 and –.048 for
causal strengths of .33, .75, .90 and 1.0, respectively. Intuitively, a causal status effect is more likely for
probabilistic links because X generates Y, and Y generates Z, with decreasing probability. For example, if
cX = .75, mXY = mYZ = .75, and there are no background causes (bs = 0), then pk(X) = .750, pk(Y) = .750²
= .563, and pk(Z) = .750³ = .422. Thus, so long as the b parameters (which work to increase the
probability of Y and Z) are modest, the result will be that pk(X) will be larger than pk(Z). In contrast, a
causal status effect is absent for deterministic links because X always generates Y, and Y always
generates Z. For example, if cX = .75, ms = 1, and bs = 0, pk(X) = pk(Y) = pk(Z) = .750, and the causal
status effect grows increasingly negative (i.e., pk(Z) becomes greater than pk(X)) as the bs increase. Note
that because one also expects features to be weighed equally in the absence of any causal links between
features, the generative model predicts that the causal status effect should vary nonmonotonically with
causal strength: It should be zero when mXY = mYZ = 0, large when the ms are intermediate, and zero (or
even negative) when the ms = 1. Thus the generative model predicts a stronger causal status effect in the
Chain-75 condition versus the Chain-100 condition and the Control conditions.
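This nonmonotonic prediction can be verified with a short calculation from Equation 5, using cX = .75 and bY = bZ = .10 as above; the resulting differences match the values quoted earlier up to rounding.

```python
# Illustrative sketch: the causal status effect, measured as p_k(X) - p_k(Z)
# for the chain X -> Y -> Z, shrinks and eventually reverses as causal
# strength m increases (c_X = .75, b_Y = b_Z = .10).
c_X, b = 0.75, 0.10

def marginal(p_parent, m, b):
    return m * p_parent + b - m * b * p_parent      # Equation 5

for m in (0.33, 0.75, 0.90, 1.0):
    p_Y = marginal(c_X, m, b)
    p_Z = marginal(p_Y, m, b)
    print(m, round(c_X - p_Z, 3))
# Compare with the .553, .241, .077, and -.048 quoted in the text (values may
# differ slightly due to rounding).
```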
Following Rehder and Kim (2006), regression analyses were performed on each subject's
classification ratings with predictors for each feature and each two-way interaction between features. The
regression weights averaged over subjects for features X, Y, and Z are presented in the left panel of
Figure 2A. (The right panel, which presents the two-way interactions weights, will be discussed in
Section 6.) In fact, a larger causal status effect obtained in the Chain-75 condition in which the causal
links were probabilistic as compared to the Chain-100 condition in which they were deterministic; indeed
in the Chain-100 condition the causal status effect was absent entirely (the small quadratic effect of
features suggested by Figure 2A was not significant). Of course, a stronger causal status effect with
weaker causal links is consistent with the predictions of the generative model and inconsistent with those
of the dependency model. As expected, in the Control condition (not shown in Figure 2A), all feature
weights were equal.
As a further test of the generative model, after the classification test we also asked subjects to
estimate how frequently each feature appeared in category members. For example, subjects who learned
myastars were asked how many myastars out of 100 would have high density. Recall that the
generative model predicts that causal knowledge changes people's beliefs regarding how often
features appear in category members, and, if this is correct, the effects uncovered in the classification test
should be reflected in the feature likelihood ratings. The results of the feature likelihood ratings, presented
in Figure 3A, support this conjecture: Likelihood ratings decreased significantly from feature X to Y to Z
in the Chain-75 condition whereas those ratings were flat in the Chain-100 condition, mirroring the
classification results in Figure 2A. This finding supports the generative model’s claim that causal
relations change classifiers’ subjective beliefs about the category’s statistical distribution of features (also
see Sloman et al., 1998). Clearly, causal link strength is one key variable influencing the causal status
effect. Additional evidence for this conclusion is presented in Section 4.5.
4.2. Background Causes
Experiment 2 from Rehder and Kim (2009b) conducted another test of the generative and
dependency models by manipulating the strength of alternative causes of the category features, that is, the
generative model’s b parameters. All participants were instructed on categories with three features related
in a causal chain in which each causal link had a strength of 75%. However, in the Background-0
condition, they were also told that there were no other causes of the features. For example, participants
who learned about myastars learned not only that high density causes a large number of planets with
probability 75%, but also that “There are no other causes of a large number of planets. Because of this,
when its known cause (high density) is absent, a large number of planets occurs in 0% of all myastars.” In
contrast, in the Background-50 condition these participants were told that “There are also one or more
other features of myastars that cause a large number of planets. Because of this, even when its known
cause (high density) is absent, a large number of planets occurs in 50% of all myastars.”
Table 3 shows how the generative model's predictions vary with the b parameters and indicates
that the causal status effect should become weaker as features’ potential background causes get stronger;
indeed, it should reverse as b grows much larger than .50. Specifically, when the ms = .75 the difference
between pk(X) and pk(Z) is .328, .122, and –.043 for values of the bY and bZ of 0, .25, and .50, respectively
(Table 3). Intuitively, this occurs because as bY and bZ increase they make the features Y and Z more
likely; indeed, as the bs approach 1, Y and Z will be present in all category members. As a consequence,
the generative model predicts a larger causal status effect in the Background-0 condition as compared to
the Background-50 condition.
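A parallel calculation from Equation 5, now varying the background-cause parameter while the causal strengths are held at .75, reproduces the differences just quoted:

```python
# Illustrative sketch: stronger background causes shrink, and eventually
# reverse, the causal status effect (c_X = .75, m_XY = m_YZ = .75).
c_X, m = 0.75, 0.75

def marginal(p_parent, m, b):
    return m * p_parent + b - m * b * p_parent      # Equation 5

for b in (0.0, 0.25, 0.50):
    p_Y = marginal(c_X, m, b)
    p_Z = marginal(p_Y, m, b)
    print(b, round(c_X - p_Z, 3))   # 0.328, 0.122, -0.043
```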
The dependency model, in contrast, makes a different prediction for this experiment. Because it
specifies that a feature's centrality is a sole function of its dependents, supplying a feature with additional
causes (in the form of background causes) should have no effect on its centrality. Thus, because centrality
should be unaffected by the background cause manipulation, the dependency model predicts an identical
causal status effect in the Background-0 and Background-50 conditions.7
Regression weights derived from subjects’ classification ratings are shown in Figure 2B. The
results were clear-cut: A larger causal status effect obtained in the Background-0 condition in which
background causes were absent as compared to the Background-50 condition in which they weren’t;
indeed in the Background-50 condition the causal status effect was absent entirely. Moreover, these
regression weights were mirrored in subjects’ explicit feature likelihood ratings (Figure 3B). These results
confirm the predictions of the generative model and disconfirm those of the dependency model. The
strength of background causes is a second key variable affecting the causal status effect.
4.3. Unobserved “Essential” Features
The preceding two experiments tested the effect of varying the m and b parameters on the causal
status effect. However, there are reasons to expect that categorizers sometimes reason with a causal model
that is more elaborate than one that includes only observable features. For example, numerous researchers
have suggested that people view many kinds as being defined by underlying properties or characteristics
(an essence) that is shared by all category members and by members of no other categories (Gelman,
2003; Keil, 1989; Medin & Ortony, 1989; Rehder & Kim, 2009a; Rips, 1989) and that are presumed to
generate, or cause, perceptual features. Although many artifacts do not appear to have internal causal
mechanisms (e.g., pencils and wastebaskets), it has been suggested that the essential properties of artifacts
may be the intentions of their designers (Bloom, 1998; Keil, 1995; Matan & Carey, 2001; Rips, 1989;
cf. Malt, 1994; Malt & Johnson, 1992). Thus, the causal model that people reason with during
categorization may include the underlying causes they assume produce a category’s observable features.
Rehder and Kim (2009b, Experiment 3) tested the importance of the category being essentialized
by comparing the causal structures shown in Figure 4. As in the two preceding experiments, each
category consisted of three observable features related in a causal chain. However, the categories were
now “essentialized” by endowing them with an additional feature that exhibits an important characteristic
of an essence, namely, it appears in all members of the category and in members of no other category. For
example, for myastars the essential property was "ionized helium," and participants were told that all
myastars possess ionized helium and that no other kind of star does.8 In addition, in the Essentialized-Chain-80 condition (Figure 4A) but not the Unconnected-Chain-80 condition (Figure 4B) participants
were also instructed on a third causal relationship linking feature X to the essential feature (e.g., in
myastars, that ionized helium causes high temperature, where high temperature played the role of X). All
causal links were presented as probabilistic by describing them as possessing a strength of 80%. A
Control condition in which no causal links were provided was also tested.
After learning the categories, participants performed a classification test that was identical to the
previous two experiments. (In particular, the state of the essential property was not displayed in any test
item.) Linking X, Y, and Z to an essential feature should have two effects on classification ratings. First,
because the link between E and X has a strength of mEX = .80, the probability of feature X within
category members should be at least .80. This is greater than the value expected in the Unconnected-Chain-80 condition on the basis of the first two experiments (in which subjects estimated pk(X) to be a
little over .75; see Figures 3A and 3B). Second, the larger value of pk(X) should produce an enhanced
causal status effect, because the larger value of pk(X) results in a greater drop between it and pk(Y) (and
the larger value of pk(Y) results in a greater drop between it and pk(Z)). These effects are apparent in
Table 4, which presents the generative model's quantitative predictions for the case when the b parameters
equal .10. (The table also includes predictions for a case, discussed below, where the m parameters are 1.0
instead of .80.) Table 4 confirms that the size of the causal status effect should be larger in the
Essentialized-Chain-80 condition (a difference between pk(X) and pk(Z) of .223) than the Unconnected-Chain-80 condition (.189) when the bs = .10; this prediction holds for any value of the bs < 1. In contrast,
the dependency model predicts no difference between the two conditions. Because that model claims that
a feature's centrality is determined by its dependents rather than its causes, providing feature X with an
additional cause in the Essentialized-Chain-80 condition should have no influence on its centrality.
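These predictions are easy to check numerically. The following sketch (mine, not the authors' code) computes the chain probabilities, assuming that a cause and a background cause combine as independent generative sources (a noisy-OR) and that the root of the unconnected chain occurs with probability cX = .75, a value chosen to match the reported .189 difference.

```python
# Sketch (not the authors' code): feature probabilities for a chain X -> Y -> Z
# under the generative model, assuming a cause and a background cause of
# strength b combine as independent generative sources (noisy-OR).
# m = .80 and b = .10 follow Table 4; cX = .75 for the unconnected chain is an
# assumption chosen to match the reported .189 difference.

def p_effect(p_cause, m=0.80, b=0.10):
    """P(effect present) when its cause is present with probability p_cause."""
    p_given_cause = 1 - (1 - m) * (1 - b)     # cause or background produces it
    return p_cause * p_given_cause + (1 - p_cause) * b

def chain_probs(p_root):
    pX = p_root
    pY = p_effect(pX)
    pZ = p_effect(pY)
    return pX, pY, pZ

# Essentialized chain: E occurs in every category member, so X is the effect
# of a cause that is present with probability 1.
ess = chain_probs(p_effect(1.0))   # (.820, .690, .597)
unc = chain_probs(0.75)            # (.750, .640, .561)
print("essentialized  pk(X) - pk(Z) =", round(ess[0] - ess[2], 3))   # ~.223
print("unconnected    pk(X) - pk(Z) =", round(unc[0] - unc[2], 3))   # ~.189
```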
The feature weights derived from subjects’ classification ratings are presented in Figure 2C.
Consistent with the predictions of the generative model, a larger causal status effect obtained in the
Essentialized-Chain-80 condition as compared to the Unconnected-Chain-80 condition; indeed, although
feature weights decreased in the Unconnected-Chain-80 condition, this decrease did not reach
significance. This same pattern was also reflected in feature likelihood ratings (Figure 3C): decreasing
feature likelihoods in the Essentialized-Chain-80 but not the Unconnected-Chain-80 condition.9 As
expected, all feature weights and likelihood ratings were equal in the Control condition.
Other studies have found that essentialized categories lead to an enhanced causal status effect.
Using the same materials, Rehder (2003b, Experiment 3) found a larger causal status effect with
essentialized categories even when the strength of the causal link was unspecified. And, Ahn and
colleagues have found that expert clinicians both view mental disorders as less essentialized than
laypersons (Ahn et al., 2006) and exhibit only a weak causal status effect (Ahn et al., 2005). These results
show that an essentialized category is a third key variable determining the size of the causal status effect.
However, note that the generative model's prediction regarding essentialized categories itself
interacts with causal link strength: When links are deterministic, essentializing a category should yield
feature weights that are larger but equal to one another (because each feature in the chain is produced with
the same probability as its parent, namely, 1.0)—that is, no causal status effect should obtain (Table 4).
These predictions were confirmed by Rehder and Kim’s (2009b) Experiment 4, which was identical to
Experiment 3 except that the strengths of the causal links were 100% rather than 80%.
4.4. Number of Dependents
Yet another potential influence on the causal status effect is a feature’s number of dependents.
Rehder and Kim (2006, Experiment 3) assessed this variable by testing the two network topologies in
Figure 5. Participants in both conditions were instructed on categories with five features, but whereas
feature Y had three dependents in the 1-1-3 condition (1 root cause, 1 intermediate cause, 3 effects), it had
only one in the 1-1-1 condition. In this experiment, no information about the causal strengths or
background causes was provided. In the 1-1-1 condition, which feature played the role of Y‘s effect was
balanced between Z1, Z2, and Z3. After learning these causal category structures, subjects were asked to
rate the category membership of all 32 items that could be formed on the five binary dimensions.
The dependency and generative models again make distinct predictions for this experiment.
According to the dependency model, its greater number of dependents in the 1-1-3 condition means that
Y is more central relative to the 1-1-1 condition. Likewise, its greater number of indirect dependents in
the 1-1-3 condition means that X is relatively more central as well. As a result, the dependency model
predicts a larger causal status effect in the 1-1-3 condition than in the 1-1-1 condition. For example,
according to Equation 1, whereas in the 1-1-3 condition feature centralities are 12, 6, and 1 for X, Y, and
the Zs, respectively, after two iterations when cz,1 = 1 and the ds = 2, they are 4, 2, and 1 in the 1-1-1
condition. The generative model, in contrast, predicts no difference between conditions, because Y having
more effects doesn’t change the chance that it will be generated by its category.
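As a concrete illustration of the dependency model's calculation, the sketch below iterates Equation 1 for the two networks, assuming synchronous updates, initial centralities of 1, dependency strengths d = 2, and the convention that features with no dependents retain their initial centrality (the convention needed to reproduce the 12/6/1 and 4/2/1 values quoted above).

```python
# Sketch of the dependency model's centrality iteration (Equation 1):
# c_i(t+1) = sum over dependents j of d_ij * c_j(t). Assumptions of this
# sketch: synchronous updates, all centralities start at 1, every d_ij = 2,
# and features with no dependents keep their initial centrality (the
# convention that reproduces the 12/6/1 and 4/2/1 values quoted above).

def iterate_centrality(dependents, d=2.0, steps=2):
    c = {f: 1.0 for f in dependents}                  # c_i(1) = 1 for all i
    for _ in range(steps):
        c = {f: (sum(d * c[j] for j in deps) if deps else c[f])
             for f, deps in dependents.items()}       # synchronous update
    return c

net_113 = {"X": ["Y"], "Y": ["Z1", "Z2", "Z3"], "Z1": [], "Z2": [], "Z3": []}
net_111 = {"X": ["Y"], "Y": ["Z1"], "Z1": [], "Z2": [], "Z3": []}

print(iterate_centrality(net_113))   # X: 12, Y: 6, Zs: 1
print(iterate_centrality(net_111))   # X: 4,  Y: 2, Zs: 1
```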
The results of this experiment are presented in Figure 6. (In the figure, the weight for “Z (effect)”
is averaged over Z1, Z2, and Z3 in the 1-1-3 condition and is for the single causally related Z in the 1-1-1
condition. The weights for the “Z (isolated)” features will be discussed later in Section 5.1.) The figure
confirms the predictions of the generative model and disconfirms those of the dependency model. First, as
described earlier, tests of a three-element causal chain in this study (the 1-1-1 condition) produced a
relatively small and partial causal status effect (X was weighed more than Y which was weighed the same
as Z). But more importantly, the size of the causal status effect was not larger in the 1-1-3 condition than
the 1-1-1 condition. (Although weights were larger overall in the 1-1-1 condition, this difference was only
marginally significant.) These results show that features’ number of dependents does not increase the size
of the causal status effect.
4.5. Other Factors
In this section I present other factors that have been shown to influence the size of the causal
status effect, factors not directly relevant to the predictions of either the dependency or generative models.
4.5.1. Number of test items: Rehder (2009b). Recall that whereas Ahn et al. (2000a) observed a
difference of 35 points in the rating of the item missing only X versus the one missing only Z, that
difference was an order of magnitude smaller in Rehder and Kim (2006). Besides the difference in the
implied strength of the causal links, these studies also differed in the number of classification test items
presented (3 vs. 32, respectively). As argued in Section 2.2, it is likely that the presence of very likely and
very unlikely category members that anchor the high and low ends of the response scale will decrease the
differences between intermediate items such as those missing one feature (scale contraction) whereas the
absence of extreme items will increase that difference (scale expansion) (Poulton, 1989). In addition,
rating items that differed only on which feature was missing may have triggered a comparison of the
relative importance of those features that wouldn't have occurred otherwise. In other words, a large causal
status effect may have arisen partly because of task demands.
To test these conjectures, Rehder (2009b) replicated the original Ahn et al. (2000a) study but
manipulated the total number of test items between 3 (those missing just one feature) and 8 (all test items
that can be formed on three binary dimensions). In addition, as a further test of the role of causal
strength, I compared a condition with the original wording implying a probabilistic relation (e.g., "Sticky
feet tends to allow roobans to build nests on trees.") with one that implied a deterministic relation (e.g.,
"Sticky feet always allow roobans to build nests on trees."). The procedure was identical to that in Ahn et
al. except that subjects previewed all the test items before rating them.
Figure 7 presents the size of the causal status effect measured by the difference between the
missing-Z and missing-X test items. First note that the causal status effect was larger in the probabilistic
versus the deterministic condition, replicating the findings described earlier in Section 4.1 in which a
stronger causal status effect obtains with weaker causal links (Rehder & Kim, 2009b, Experiment 1). But
the causal status effect was also influenced by the number of test items. For example, whereas the
probabilistic condition replicated the large causal status effect found in Ahn et al. when subjects rated
only 3 test items (a difference of 24 points between missing-Z and missing-X items), it was reduced when
8 items were rated (11 points); in the deterministic condition it was reduced from 5.5 to –3.4. Overall, the
causal status effect reached significance in the probabilistic/3 test item condition only. These results
confirm that scale expansion and/or task demands can magnify the causal status effect when only a small
number of test items are rated.
4.5.2. "Functional" features: Lombrozo (2009). Lombrozo (2009) tested how a feature's importance
varies depending on whether it is “functional,” that is, whether for a biological species it is an adaptation
that is useful for survival or whether for an artifact it affords some useful purpose. For example,
participants were first told about a type of flower called a holing, that holings have broom compounds in
their stems (feature X) that cause them to bend over as they grow (feature Y). Moreover, they were told
that the bending is useful because it allows pollen to brush onto the fur of field mice. In a Mechanistic
condition, participants were then asked to explain why holings bend over, a question which invites either
a mechanistic response (because broom compounds cause them to) or a functional response (because
bending over is useful for spreading pollen). In contrast, in the Functional condition participants were
then asked what purpose bending over served, a question that invites only a functional response. All
subjects were then shown two items, one missing X and one missing Y, and asked which was more likely
to be a holing. Whereas subjects chose the missing-Y item 71% of the time in the Mechanistic condition
(i.e., they exhibited a causal status effect), this effect was eliminated (55%) in the Functional condition.
Although the effect size in this study was small (reaching significance at the .05 level only by testing 192
subjects), the potential functions that features afford, a factor closely related to their place in a category's
causal network, may be an important new variable determining their importance to classification.
4.6. Theoretical Implications: Discussion
Together, the reviewed studies paint a reasonably clear picture of the conditions under which a
causal status effect does and does not occur. Generally, what appears to be going on is this. When
confronted with a causal network of features, classifiers will often adopt a "generative" perspective, that
is, they will think about the likelihood of each successive event in the chain. This process may be
equivalent to a kind of mental simulation in which they repeatedly "run" (consciously or unconsciously) a
causal chain. Feature probabilities are then estimated by averaging over runs. Of course, in a run the
likelihood of each subsequent feature in the chain increases as a function of the strength of the chain's causal
links. When links are deterministic then all features will be present whenever the chain's root cause is; in
such cases, no causal status effect appears. However, when causal links are probabilistic each feature is
generated with less certainty at every step in the causal chain, and thus a causal status effect arises. But
working against this effect are classifiers' beliefs about the strength of alternative "background" causes.
Background causes will raise the probability of each feature in the causal chain in each simulation run,
and, if sufficiently strong, will cancel out (and possibly even reverse) the causal status effect.
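One way to make this simulation idea concrete is a small Monte Carlo sketch. The parameter values below are purely illustrative, and the noisy-OR combination of causes and background causes is an assumption of the sketch rather than a claim about any particular experiment.

```python
# A minimal Monte Carlo rendering of the "mental simulation" idea for a chain
# X -> Y -> Z: on each run the root occurs with probability c, and each effect
# occurs if its cause produces it (probability m) or a background cause does
# (probability b). The parameter values are illustrative only.
import random

def run_chain(c=0.9, m=0.75, b=0.10):
    x = random.random() < c
    y = (x and random.random() < m) or random.random() < b
    z = (y and random.random() < m) or random.random() < b
    return x, y, z

N = 100_000
runs = [run_chain() for _ in range(N)]
for name, i in (("X", 0), ("Y", 1), ("Z", 2)):
    print(name, round(sum(r[i] for r in runs) / N, 3))
# With probabilistic links (m < 1) the estimated probabilities fall off along
# the chain -- a causal status effect; with m = 1 and b = 0 they are all equal.
```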
The dependency model, in contrast, is based on a competing intuition, that features are important
in people's conceptual representations to the extent they are responsible for other features (e.g., DNA is
more important than the color of an animal's fur because so much depends on DNA). But despite the
plausibility of this intuition, it does not conform to subjects' category membership judgments. Whereas
the dependency model predicts that the causal status effect should be stronger with stronger causal links
and more dependents, it was either weaker or unchanged. And, whereas the dependency model predicts
that features’ weights should be unaffected by the introduction of additional causes, we found instead a
weaker causal status effect when background causes were present.
It is interesting to consider how classifiers’ default assumptions regarding the causal strengths
and background causes might influence the causal status effect. Recently, Lu, Yuille, Liljeholm, Cheng,
and Holyoak (2008) have proposed a model that explains certain causal learning results by assuming that
people initially assume causal relationships to be sufficient (“strong” in their terms, i.e., the cause always
produces the effect) and necessary (“sparse,” i.e., there are no other causes of the effect) (also see
Lombrozo, 2007; Rehder & Milovanovic, 2007). On one hand, because we have seen how features are
weighed equally when causal links are deterministic, a default assumption of strong causal relationships
works against the causal status effect. On the other hand, that people apparently believe in many
probabilistic causal relations (e.g., smoking only sometimes causes lung cancer) means they can override
this default. When they do, Lu et al.’s second assumption—the presumed absence of background
causes—will work to enhance the causal status effect.
The generative perspective also explains why essentialized categories lead to an enhanced causal
status effect: The presence of an essential feature means that observed features should be generated with
greater probability and, so long as causal links are probabilistic, near features (X) should be generated
with relatively greater certainty than far ones (Z). Although Rehder and Kim (2009b) tested the power of
an essential feature to generate a larger causal status effect, note that an underlying feature would produce
that effect even if it was only highly diagnostic of, but not truly essential to, category membership so long
as it was sufficient to increase the probability of the observed features. This prediction is important,
because the question of whether real-world categories are essentialized is a controversial one. Although
good evidence exists for the importance of underlying properties to category membership (Gelman, 2003;
Keil, 1989; Rips, 1989), Hampton (1995) has demonstrated that even when biological categories' so-called essential properties are unambiguously present (or absent), characteristic features continue to exert
an influence on judgments of category membership (also see Braisby et al., 1996; Kalish, 1995; Malt,
1994; Malt & Johnson, 1992). My own suspicion is that although the unobserved properties of many
categories are distinctly important to category membership, few may be truly essential (see Rehder, 2007,
for discussion). But according to the generative model, all that is required is that the unobserved property
increase the probability of the observed features.
The causal status effect may be related to essentialism in two other ways. First, whereas I have
described the present results in terms of an essential feature increasing the probability of observed
features, it may also be that subjects engaged in a more explicit form of causal inference in which they
reasoned from the presence of observable features X, Y, and Z to the presence of the unobserved essential
feature (and from the essential feature to category membership). I consider this possibility further in
Section 7. Second, Ahn (1998) and Ahn and Kim (2001) proposed that the causal status effect is itself a
sort of incomplete or weakened version of essentialism. On this account, the root cause X in the causal
chain in Figure 1 becomes more important because it is viewed as essence-like (a “proto-essence” if you
will), although without an essence’s more extreme properties (i.e., always unobservable, a defining
feature that appears in all category members, etc.). Of course, standing as evidence against this principle
are the numerous conditions reviewed above in which a causal status effect failed to obtain. Moreover, the
need for the principle would seem to be obviated by the finding that the causal status effect is fully
explicable in terms of the properties of the category’s causal model, including (a) the strengths of the
causal links and (b) the presence of unobserved (perhaps essential) features which are causally related to
the observed ones.
Another important empirical finding concerns how the changes to features' categorization
importance brought about by causal knowledge are mediated by their subjective category validity (i.e.,
likelihood among category members). In every experimental condition in which classification ratings
revealed a full causal status effect, participants also rated feature X as more likely than Y and Y as more
likely than Z; whenever a causal status effect was absent, features’ likelihood ratings were not
significantly different. Apparently, causal knowledge changes the perceived likelihood with which a
feature is generated by a category's causal model and any feature that occurs with greater probability
among category members (i.e., has greater category validity) should provide greater evidence in favor of
category membership (Rosch & Mervis, 1975). Other studies have shown that a feature’s influence on
categorization judgments correlates with its subjective category validity. For example, although Study 5
of Sloman et al. (1998) found that features’ judged mutability dissociated from their objective category
validity, those mutability judgments did track participants' subjective judgments of category validity.10
In summary, what should be concluded about the causal status effect? On one hand, the causal
status effect does not rise to the level of an unconditional general law, that is, one that holds regardless of
the causal facts involved. Even controlling for the effects of contrast categories, empirical/statistical
information, and the salience of individual features, in the 56 experimental conditions in the 15 studies
reviewed in this chapter testing novel categories, a full causal status effect obtained in 26 of them, a
partial effect (a higher weight on the root cause only, e.g., X > Y = Z) obtained in 7, and there was no
effect (or the effect was reversed) in 23.11 On the other hand, these experiments also suggest that a causal
status effect occurs under conditions that are not especially exotic—specifically, it is promoted by (a)
probabilistic causal links, (b) the absence of alternative causes, (c) essentialized categories, and (d) non-functional effect features. Because these conditions are likely to often arise in real-world categories, the
causal status effect is one of the main phenomena in causal-based classification.
5. Relational Centrality and Multiple Cause Effects
The causal status effect is one important way that causal knowledge changes the importance of
individual features to category membership. However, there are many documented cases in which effect
features are more important than their causes rather than the other way around. For example, Rehder and
Hastie (2001) instructed subjects on the two networks in Figure 8. In the common-cause network, one
category feature (X) is described as causing the three other features (Y1, Y2, and Y3). In the common-effect network, one feature (Y) is described as being caused by each of the three others (X1, X2, and X3).
(The causes were described as independent, noninteracting causes of Y.) No information about the
strength of the causes or background causes was provided. After learning these causal structures, subjects
were asked to rate the category membership of all 16 items that could be formed on the four binary
dimensions. The regression weights on features averaged over Experiments 1–3 from Rehder and Hastie
(2001) are presented in Figure 9. In the common cause condition, the common cause feature X was
weighed much more heavily than the effects. That is, a strong causal status effect occurred. However, in
the common effect condition the effect feature Y was weighed much more heavily than any of its causes.
That is, a reverse causal status effect occurred. This pattern of feature weights for common cause and
common effect networks has been found in other experiments (Ahn, 1999; Rehder, 2003a; Rehder &
Kim, 2006; Ahn & Kim, 2001 12).
Two explanations of this effect have been offered. The first, which I refer to as the relational
centrality effect, states that a feature's importance to category membership is a function of the number of
causal relations in which it is involved (Rehder & Hastie, 2001). On this account, Y is most important in a
common effect network because it is involved in three causal relations whereas its causes are involved in
only one. The second explanation, the multiple cause effect, states that a feature’s importance increases as
a function of its number of causes (Ahn & Kim, 2001; Rehder & Kim, 2006). On this account, Y is most
important because it has three causes whereas the causes themselves have zero. Note that because neither
of these accounts alone explains the causal status effect (e.g., feature X in Figure 1 has neither the greatest
number of causes nor relations), the claim is that these principles operate in addition to the causal status
effect rather than serving as alternatives to it. I first review evidence for and against each of these effects
and then discuss their implications for the dependency and generative models.
5.1. Evidence Against a Relational Centrality Effect and For a Multiple Cause Effect
A study already reviewed in Section 4.4 provides evidence against the relational centrality effect.
Recall that Rehder and Kim (2006, Experiment 3) tested the two causal networks shown in Figure 5.
Feature Y has three effects in the 1-1-3 network but only one in the 1-1-1 network. The results showed
that Y was not relatively more important in the 1-1-3 condition than in the 1-1-1 condition (Figure 6).
These results were interpreted as evidence against the dependency model’s claim that features’
importance increases with their number of effects but they also speak against the claim that importance
increases with their number of relations: Feature Y is involved in four causal relations in the 1-1-3
network but only two in the 1-1-1 network. Feature importance does not appear to generally increase with
the number of relations.
This result implies that the elevated weight on the common effect feature in Figure 9 must be due
instead to it having multiple causes. Accordingly, Rehder and Kim (2006, Experiment 2) tested the
multiple-cause effect by teaching subjects the two causal structures in Figure 10. Participants in both
conditions were instructed on categories with five features, but whereas feature Y had three causes in the
3-1-1 condition (3 root causes, 1 intermediate cause, 1 effect), it had only one in the 1-1-1 condition. In
the 1-1-1 condition, which feature played the role of Y's cause was balanced between X1, X2, and X3.
After learning these causal category structures, subjects were asked to rate the category membership of all
32 items that could be formed on the five binary dimensions.
The results of this experiment are presented in Figure 11. (In the figure, the weight for “X
(cause)" is averaged over X1, X2, and X3 in the 3-1-1 condition and is for the single causally related X in
the 1-1-1 condition. The weight for “X (isolated)” is for the isolated Xs in the 1-1-1 condition.) Figure 11
confirms the presence of a multiple-cause effect: Feature Y was weighed relatively more in the 3-1-1
condition when it had three causes versus the 1-1-1 condition when it only had one. These results show
that a feature's number of causes influences the weight it has on category membership judgments.13
5.2. Evidence For an Isolated Feature Effect
Although a feature’s weight does not generally increase with its number of relations, there is
substantial evidence showing that features involved in at least one causal relation are more important than
those involved in zero (so-called isolated features). For example, in Rehder and Kim's (2006)
Experiment 2 just discussed, weights on the isolated features in the 1-1-1 condition (X1 and X3 in Figure
10) were lower than on any of the causally related features (Figure 11). Likewise, in Rehder and Kim’s
Experiment 3, weights on the isolated features in the 1-1-1 condition (Z1 and Z3 in Figure 5) were lower
than on any of the causally related features (Figure 6) (also see Kim & Ahn, 2002a). Of course, one might
attribute this result to the fact that causally related features were mentioned more often during initial
category learning and this repetition may have resulted in those features being treated as more important.
However, even when Kim and Ahn (2002b) controlled for this by relating the isolated features to each
other via non-causal relations, they were still less important than the causally linked features. That
features involved in at least one causal relation are more important than isolated features will be referred
to as the isolated feature effect.
5.3. Theoretical Implications: Discussion
What implications do these findings have for the dependency and generative models? First, the
multiple-cause effect provides additional support for the generative model and against the dependency
model. Because the dependency model predicts that feature importance varies with the number of
dependents, it predicts no difference between the 3-1-1 and 1-1-1 conditions of Rehder and Kim (2006).
In contrast, it is straightforward to show that the generative model generally predicts a multiple cause
effect. Because demonstrating this quantitatively for the networks in Figures 8 and 10 is cumbersome (it
requires specifying likelihood equations for 16 and 32 separate items, respectively), I do so for a simpler
three-feature common effect network, one in which features X1 and X2 each independently cause Y. The
likelihood equations generated for each item for this simplified network are presented in Table 5 by
iteratively applying Equation 2. For comparison the table also specifies the likelihood equations for a
simplified three-feature common cause structure (X causes Y1 and Y2). The table also presents the
probability of each item for a sample set of parameter values, namely, cX =.75, mXY1 = mXY2 = .75, and bY1
= bY2 =.10 for the common cause network and cX1 = cX2 =.75, mX1Y = mX2Y = .75, and bY = .10 for the
common effect network. From these item distributions, the probability of individual features
can be computed. (The predicted feature interactions for these networks, also presented in the table, will
be discussed in Section 6.) First note that, for the common cause network, the generative model predicts a
larger weight on the common cause than the effect features. Second, for the common effect network, it
predicts a weight on the common effect feature, pK(Y) = .828, which is greater than the weight on its causes (pK(Xi) =
.750) or on the Ys in the common cause network, which each have only one cause (pK(Yi) = .606). This
prediction of the generative model corresponds to the simple intuition that an event will be more likely to
the extent that it has many versus few causes.14 These predictions for higher weights on a common cause
and a common effect reproduce the empirical results in Figure 9.
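For readers who want to check these numbers, the following sketch (not taken from the original studies) enumerates the joint feature distributions for the two simplified networks under the Table 5 parameter values, again assuming a noisy-OR combination of causes and background causes; it reproduces the quoted probabilities of roughly .828, .750, and .606.

```python
# Sketch: within-category feature probabilities for the simplified three-
# feature networks of Table 5, assuming independent (noisy-OR) combination of
# causes and background causes, with c = .75, m = .75, and b = .10 as in the
# text. Reproduces the quoted values of about .828, .750, and .606.
from itertools import product

c, m, b = 0.75, 0.75, 0.10

def p_or(*strengths):
    """Probability that at least one independent source fires."""
    miss = 1.0
    for s in strengths:
        miss *= 1 - s
    return 1 - miss

def common_effect_joint(x1, x2, y):                # X1 -> Y <- X2
    p = (c if x1 else 1 - c) * (c if x2 else 1 - c)
    py = p_or(m * x1, m * x2, b)
    return p * (py if y else 1 - py)

def common_cause_joint(x, y1, y2):                 # Y1 <- X -> Y2
    p = c if x else 1 - c
    py = p_or(m * x, b)
    return p * (py if y1 else 1 - py) * (py if y2 else 1 - py)

def marginal(joint, index):
    return sum(joint(*s) for s in product([0, 1], repeat=3) if s[index])

print("common effect: P(Y)  =", round(marginal(common_effect_joint, 2), 3))  # .828
print("common effect: P(Xi) =", round(marginal(common_effect_joint, 0), 3))  # .750
print("common cause:  P(Yi) =", round(marginal(common_cause_joint, 1), 3))   # .606
```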
Other research has also shown that intuitive judgments of an event's probability will be greater
when multiple potential causes of that event are known. For example, according to Tversky and Koehler’s
(1994) support theory, the subjective probability of an event increases when supporting evidence is
enumerated (death due to cancer, heart disease, or some other natural cause) rather than summarized
(death due to natural causes) (also see Fischoff, Slovic, & Lichtenstein, 1978). And, Rehder and
Milovanovic (2007) found that an event was rated as more probable as its number of causes increased
(from 1 to 2 to 3). Note that Ahn and Kim (2001) also proposed that the multiple cause effect obtained
with common effect networks was due to the greater subjective category validity associated with common
effect features.15
However, whereas the multiple cause effect provides additional support for the generative model,
the isolated feature effect is problematic for both the generative and the dependency models. For example,
for the 1-1-1 network in Figure 10, the generative model stipulates one free c parameter for each X and, in
the absence of any other information, those cs should be equal. Thus, it predicts that X1 and X3 should
have the same weight as X2. Because they have the same number of dependents (zero), the dependency
model predicts that X1 and X3 should have the same weight as Z.
Why should features be more important because they are involved in one causal relation? Ahn
and Kim (2001) have proposed that this effect is related to Gentner’s (1989) structure mapping theory in
which statements that relate two or more concepts (e.g., the earth revolves around the sun, represented as
revolves-around (earth, sun)) are more important in analogical mapping than statements involving a
single argument (e.g., hot(sun)). Of course, the primary result to be explained is not the importance of
predicates (e.g., revolves-around and hot) but rather the importance of features (that play the role of
arguments in predicates, e.g., causes(X, Y)). But whatever the reason, the isolated feature effect joins the
causal status and multiple cause effects as an important way that causal knowledge influences
classification judgments.
6. The Coherence Effect
The next phenomenon addressed is the coherence effect. Whereas the causal status and multiple
cause effects (and the isolated feature effect) involve the weights of individual features, the coherence
effect reflects an interaction between features. The claim is that better category members are those whose
features appear in sensible or coherent combinations. For example, if two features are causally related,
then one expects the cause and effect feature to usually be both present or both absent. In fact, large
coherence effects have been found in every study in which they’ve been assessed (Rehder & Hastie,
2001; Rehder 2003a; b; Rehder & Kim, 2006; 2009b; Rehder 2007; 2009b; Marsh & Ahn, 2006).
It is important to recognize that effects of causal knowledge on feature weights and feature
interactions are not mutually exclusive. As reviewed in Section 2.2, weights and interactions represent
two orthogonal effects (corresponding to “main effects” and “interactions” in analysis of variance).
Indeed, some of the studies reviewed below demonstrating coherence effects are the same ones reviewed
in Sections 4 and 5 showing feature weight effects. In other words, causal knowledge changes the
importance of both features and combinations of features to categorization.
In the subsections that follow I review studies testing the generative model’s predictions
regarding how coherence effects are influenced by changes in model parameters (e.g., the strengths of
causal links) and the topology of the causal network. The first two studies demonstrate effects manifested
in terms of two-way interactions between features; the third one also demonstrates an effect on higher-order interactions between features. Recall that the dependency model predicts an effect of causal
knowledge on feature weights but not feature interactions, and so is unable to account for coherence
effects.
6.1. Causal Link Strength
Recall that, according to the generative model, when features are causally related one expects
those features to be correlated. For example, for the three-element chain network (Figure 1), one expects
the two directly causally related feature pairs (X and Y, and Y and Z) to be correlated for most parameter
values, and for the indirectly related pair (X and Z) to be more weakly correlated. Moreover, one expects
these correlations to be influenced by the strength of the causal relations. Table 3 shows the generative
model’s predictions for different causal strengths holding the b parameters fixed at .10. Note two things.
First, Table 3 indicates that the magnitude of the probabilistic contrasts between features increases as mXY
and mYZ increase. The contrasts between directly related features are .300, .675, .810, and .900 for causal
strengths of .33, .75, .90, and 1.0 (Table 3); the contrasts between the indirectly related features are .090,
.456, .656, and .810. Intuitively, it makes this prediction because features that are more strongly causally
related should be more strongly correlated.
Second, Table 3 indicates that the difference between the direct and indirect contrasts changes
systematically with strength. It makes this prediction because, although the correlation between directly
related pairs should be stronger than between the indirectly related one for many parameter values, this
difference will decrease as mXY and mYZ approach 1 or 0. For example, when the ms = 1 (and there are no
background causes), features X, Y, and Z are all perfectly correlated (e.g., Y is present if and only if Z is
and vice versa) and thus there is no difference between direct and indirect contrasts. Likewise, when the
ms = 0, features X, Y, and Z should be perfectly uncorrelated and thus there is again no difference
between direct and indirect contrasts. In other words, the generative model predicts that the direct/indirect
difference should be a nonmonotonic function of causal strength.
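The contrast values quoted above can be reproduced by enumerating the chain's joint distribution. The sketch below (mine, not the authors' code) does so, assuming a noisy-OR combination of each cause with its background cause and taking the contrast between two features to be the standard delta-p measure (e.g., P(Y | X present) − P(Y | X absent)); the root probability is set to .75 here, although the contrasts themselves do not depend on it.

```python
# Sketch: the predicted within-category contrasts for the chain X -> Y -> Z,
# enumerating the joint distribution under a noisy-OR combination rule and
# taking the contrast between two features to be the delta-p measure,
# e.g. P(Y | X present) - P(Y | X absent). The root probability c = .75 is an
# arbitrary choice; the contrasts do not depend on it.
from itertools import product

def chain_joint(m, b, c=0.75):
    dist = {}
    for x, y, z in product([0, 1], repeat=3):
        py = 1 - (1 - m * x) * (1 - b)          # P(Y = 1 | X = x)
        pz = 1 - (1 - m * y) * (1 - b)          # P(Z = 1 | Y = y)
        dist[(x, y, z)] = ((c if x else 1 - c) *
                           (py if y else 1 - py) *
                           (pz if z else 1 - pz))
    return dist

def contrast(dist, i, j):
    """Delta-p of feature j with respect to feature i."""
    def p_j_given(i_val):
        num = sum(p for s, p in dist.items() if s[i] == i_val and s[j] == 1)
        den = sum(p for s, p in dist.items() if s[i] == i_val)
        return num / den
    return p_j_given(1) - p_j_given(0)

for m in (1/3, 0.75, 0.90, 1.0):
    d = chain_joint(m, b=0.10)
    print(f"m = {m:.2f}: direct = {contrast(d, 0, 1):.3f}, "
          f"indirect = {contrast(d, 0, 2):.3f}")
# direct:   .300, .675, .810, .900   (contrast between X and Y, or Y and Z)
# indirect: .090, .456, .656, .810   (contrast between X and Z)
```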
These predictions were tested in Experiment 1 of Rehder and Kim (2009b) described earlier in
which the strengths of the causal links were varied between either 100% or 75%. Subjects’ sensitivity to
correlations between features was assessed via regression analyses that included two-way interaction
terms for each pair of features. Regression weights on those interaction terms reflect the sensitivity of
classification ratings to whether the members of each feature pair are both present or both absent versus
one present and the other absent.
The two-way interaction terms are presented in the right panel of Figure 2A. In the figure, the
weights on the two directly related feature pairs (X and Y, and Y and Z) have been collapsed together and are
compared against the single indirectly related pair (X and Z). The first thing to note is that in both the
Chain-100 and the Chain-75 conditions both sorts of interaction terms were significantly greater than zero.
This reflects the fact that subjects granted items higher category membership ratings to the extent they
were coherent in light of a category's causal laws (e.g., whether cause and effect features were both
present or both absent). As expected, in the Control condition both sorts of two-way interaction
terms (not shown in Figure 2A) were close to zero.
Moreover, the generative model also correctly predicts how the magnitude of the interaction
weights varied over condition (Chain-100 and Chain-75) and type (direct and indirect). First, it was
predicted that the magnitude of the interaction terms should be greater in the Chain-100 versus the
Chain-75 condition. Second, it was predicted that the difference between the direct and indirect terms
should be small or absent in the Chain-100 condition and larger in the Chain-75 condition. In fact, this is
exactly the pattern presented in Figure 2A.16 Causal link strength is one important factor that determines
not only the size of coherence effects but also more subtle aspects of that effect (e.g., the difference
between the direct and indirect terms).
The effect of coherence in this experiment can also be observed directly in the classification
ratings of individual test items. Figure 12A presents the test item classification ratings as a function of
their number of characteristic features. As expected, in the Control condition ratings were a simple
monotonic function of the number of features. In contrast, items with 2 or 1 features were rated lower
than those with 3 or 0 (i.e., items 111 and 000) in both causal conditions; moreover, this effect was more
pronounced in the Chain-100 condition than the Chain-75 condition. Intuitively, the explanation for these
differences is simple. When Control participants are told, for example, that "most" myastars are very hot,
have high density, and have a large number of planets, they expect that most myastars will have most of
those features and that the atypical values exhibited by "some" myastars (unusually cool temperature, low
density, and small number of planets) will be spread randomly among category members. That is, they
expect the category to exhibit a normal family resemblance structure in which features are independent
(i.e., are uncorrelated within the category). But when those features are causally related, the prototype 111
and item 000 receive the highest ratings. Apparently, rather than expecting a family resemblance structure
with uncorrelated features, participants expected the "most" dimension values to cluster together (111)
and the "some" values to cluster together (000), because that distribution of features is most sensible in
light of the causal relations that link them. As a result, the rating of test item 000 is an average of 30 points
higher in the causal conditions than in the Control condition. In contrast, items that are incoherent because
they have 1 or 2 characteristic features (and thus have a mixture of "most" and "some" values) are rated
29 points lower than in the Control condition.
6.2. Background Causes
Experiment 2 of Rehder and Kim (2009b) described earlier also tested how coherence effects
vary with the strength of background causes. Table 3 shows the generative model’s predictions for
different background strengths holding causal strengths (mXY and mYZ) fixed at .75. Note that Table 3
indicates that the magnitude of the probabilistic contrasts between features decreases as bY and bZ increase.
The contrasts between directly related features are .75, .56, and .38 for values of bY and bZ of 0, .25, and
.50, respectively; the contrasts between the indirectly related features are .56, .32, and .14. Intuitively, it
makes this prediction because a cause becomes less strongly correlated with its effect to the extent that
the effect has alternative causes. In addition, the generative model once again predicts that the direct
correlations should be stronger than the indirect one.
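Under the same noisy-OR assumption as the earlier sketch, the direct contrast works out to (1 − b)m and the indirect contrast to its square, so this column of Table 3 can be reproduced in a few lines (again my own sketch, not the authors' derivation).

```python
# Under the noisy-OR assumption, the direct contrast for the chain reduces to
# (1 - b) * m and the indirect contrast to its square. Holding m = .75 and
# varying the background strength b reproduces the values quoted above from
# Table 3.
m = 0.75
for b in (0.0, 0.25, 0.50):
    direct = (1 - b) * m
    print(f"b = {b:.2f}: direct = {direct:.2f}, indirect = {direct ** 2:.2f}")
# b = 0:   .75 / .56    b = .25: .56 / .32    b = .50: .38 / .14
```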
Recall that Rehder and Kim (2009b, Experiment 2) directly manipulated background causes by
describing the strength of those causes as either 0% (Background-0 condition) or 50% (Background-50
condition). Once again, the weights on two-way interaction terms derived from regression analyses
performed on the classification ratings represent the magnitude of the coherence effect in this experiment.
The results, presented in the right panel of Figure 2B, confirm the predictions: The interaction terms were
larger in the Background-0 condition than the Background-50 condition and the direct terms were larger
than the indirect terms.
Once again, the strong effect of coherence is apparent in the pattern of test item ratings shown in
Figure 12B. (The figure includes ratings from the Control condition from Rehder and Kim’s Experiment
1 for comparison.) Whereas in the Control condition ratings are a monotonic function of the number of
characteristic features, in the causal conditions incoherent items with 2 or 1 features are rated lower
relative to the Control condition and the coherent item 000 is rated higher. Apparently, participants
expected category members to reflect the correlations that the causal relations generate: The causally linked characteristic features should be more likely to appear together in one category member and
atypical features should be more likely to appear together in another. Moreover, Figure 12B shows that
this effect was more pronounced in the Background-0 condition than the Background-50 condition. The
strength of background causes is a second important factor that determines the size of coherence effects.
6.3. Higher-Order Effects
In this section I demonstrate how the generative model predicts not only two-way but also higher
order interactions among features. Consider again the common cause and common effect networks in
Figure 8. Of course, both networks imply that those feature pairs that are directly related by a causal
relation will be correlated. In addition, as in a chain network, the common-cause network implies that the
indirectly related features (the effects Y1, Y2, and Y3) should be correlated for most parameter values (i.e.,
so long as cX < 1, the ms > 0, and the bs < 1) albeit not as strongly as the directly related features (so long
as the ms < 1). The expected correlations between the effects are easy to understand given the inferential
potency that exists among these features. For example, if in some object you know only about the
presence of Y1, you can reason from Y1 to the presence of X and then from X to the presence of Y2. This
pattern of correlations is exhibited in Table 5 by the simplified three-feature common cause network:
Contrasts of .675 between the directly related features and .358 between the indirectly related ones.
In contrast, the common effect network implies a different pattern of feature interactions. In a
disanalogy with a common cause network, the common effect network does not predict any correlations
among the independent causes of Y because they are just that (independent). If in some object you know
only about the presence of X1, you can reason from X1 to Y but then the (inferred) presence of Y does not
license an inference to X2. However, unlike the common cause network, the common effect network
implies higher order interactions among features, namely, between each pair of causes and the common
effect Y. The following example provides the intuition behind these interactions. When the common
effect feature Y is present in an object, that object will of course be a better category member if a cause
feature (e.g., X1) is also present. However, the presence of X1 will be less important when another cause
(e.g., X2) is already present to “explain” the presence of Y. In other words, a common-effect network’s
higher-order interactions reflect the diminishing marginal returns associated with explaining an effect that
is already explained. This pattern of correlations is exhibited in Table 5 by the simplified three-element
common effect network: Contrasts of .295 between the directly related features, 0 between the indirectly
related causes, and higher-order contrasts between X1, X2, and Y: Δpk(X1, Y, X2) = Δpk(X2, Y, X1) =
–.174. These higher-order contrasts reflect the normative behavior of discounting for the case of multiple
sufficient causation during causal attribution (Morris & Larrick, 1995). In contrast, Table 5 shows that the
analogous higher-order contrasts for the common cause network, Δpk(Y1, X, Y2) and Δpk(Y2, X, Y1), are
each 0, reflecting the absence of discounting for that network.
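The discounting intuition can be illustrated with the same Table 5 parameters. The sketch below computes conditional probabilities for the simplified common-effect network (again assuming noisy-OR combination) rather than the paper's exact three-way contrast, but it shows the same qualitative pattern: the causes are marginally independent, observing Y raises the probability of X1, and additionally observing X2 lowers it again.

```python
# The explaining-away pattern for the simplified common-effect network
# (X1 -> Y <- X2), using the Table 5 parameters c = .75, m = .75, b = .10 and
# assuming noisy-OR combination. This computes conditional probabilities
# rather than the paper's three-way contrast, but shows the same pattern.
from itertools import product

c, m, b = 0.75, 0.75, 0.10

def joint(x1, x2, y):
    p = (c if x1 else 1 - c) * (c if x2 else 1 - c)
    py = 1 - (1 - m * x1) * (1 - m * x2) * (1 - b)
    return p * (py if y else 1 - py)

def p_x1_given(**given):
    """P(X1 = 1 | given), where given maps 'x2'/'y' to 0 or 1."""
    idx = {"x2": 1, "y": 2}
    states = [s for s in product([0, 1], repeat=3)
              if all(s[idx[k]] == v for k, v in given.items())]
    return sum(joint(*s) for s in states if s[0]) / sum(joint(*s) for s in states)

print(round(p_x1_given(), 3))            # .750: marginal probability of X1
print(round(p_x1_given(x2=1), 3))        # .750: the causes are independent
print(round(p_x1_given(y=1), 3))         # .817: Y raises the probability of X1
print(round(p_x1_given(y=1, x2=1), 3))   # .785: X2 "explains" Y, so X1 is discounted
```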
To test whether these expected higher-order interactions would have the predicted effect on
classification judgments, participants in Rehder (2003a) learned categories with four features and three
causal links arranged in either the common cause or common effect networks shown in Figure 8. No
information about the strength of the causes or background causes was provided. To provide a more
sensitive test of the effect of feature interactions, this study differed from Rehder and Hastie (2001) by
omitting any information about which features were individually typical of the category. Subjects then
rated the category membership of all 16 items that can be formed on 4 binary dimensions. A control
condition with no causal links was also tested.
The regression weights from this experiment are presented in Figure 13. Note that the feature
weights shown in Figure 13A replicate the feature weights found in Rehder and Hastie (2001): Higher
weights on the common cause and the common effect (Figure 9). More importantly, the feature
interactions shown in Figure 13B reflect the pattern of interfeature correlations just described. (In Figure
13B, for the common cause network the "direct" two-way interactions are between X and its effects and
the indirect ones are between the effects themselves; for the common effect network, the “direct”
interactions are between Y and its causes and the indirect ones are between the causes.) First, in both
conditions the two-way interaction terms corresponding to directly causally related feature pairs had
positive weights. Second, in the common-cause condition the indirect two-way interactions were greater
than zero and smaller than the direct interactions, consistent with the expectation that the effects will be
correlated with one another but not as strongly as with the cause. Third, in the common effect network the
indirect terms were not significantly greater than zero, consistent with the absence of correlations between
the causes. Finally, that the average of the three three-way interactions involving Y (i.e., fX1X2Y, fX1X3Y,
fX2X3Y) was significantly negative in the common effect condition reflects the higher-order interactions
that that structure is expected to generate (Table 5). This interaction is also depicted in the right panel of
Figure 13C that presents the logarithm of categorization ratings in the common-effect condition for those
exemplars in which the common effect is present as a function of the number of cause features. The figure
shows the predicted nonlinear increase in category membership ratings as the number of cause features
present to “explain” the common effect feature increases. In contrast, for the common cause network
(Figure 13C, left panel), ratings increased linearly (in log units) with the number of additional effects
present. (All two-way and higher-order interactions were close to zero in the control condition.) These
results indicate that, consistent with the predictions of the generative model, subjects expect good
category members to manifest the two-way and higher-order correlations that causal laws generate.
6.4. Other Factors
Finally, just as was the case for the causal status effect, questions have been raised about the
robustness of the coherence effect. Note that in early demonstrations of this effect, one value on each
feature dimension was described as characteristic or typical of the category whereas the other, atypical
value was often described as "normal" (Rehder & Hastie, 2001; Rehder 2003a; b; Rehder and Kim, 2006).
For example, in Rehder and Kim (2006) participants were told that "Most myastars have high temperature
whereas some have a normal temperature," "Most myastars have high density whereas some have a
normal density," and so on. Although the intent was to define myastars with respect to the superordinate
category (all stars), Marsh and Ahn (2006) suggested that this use of "normal" might have inflated
coherence effects because participants might expect all the normal dimension values to appear together
and, because of the resulting emphasis on coherence, might have reduced the causal status effect.
To assess this hypothesis, Marsh and Ahn taught participants categories with four features
connected in a number of different network topologies. For each, they compared an Unambiguous
condition in which the uncharacteristic value on each binary dimension was the opposite of the
characteristic value (e.g., low density vs. high density) with an Ambiguous condition (intended to be
replications of conditions from Rehder 2003a and 2003b) in which uncharacteristic values were described
as "normal" (e.g., normal density). They found that the Unambiguous condition yielded a larger causal
status effect and a smaller coherence effect, a result they interpreted as demonstrating that the "normal"
wording exaggerates coherence effects. However, this conclusion was unwarranted because the two
conditions also differed on another dimension, namely, only the Unambiguous participants were given
information about which features were typical of the category. In the absence of such information it is
unsurprising that ratings in the Ambiguous condition were dominated by coherence.
To determine whether use of “normal” affects classification, Rehder and Kim (2008) tested
categories with four features arranged in a causal chain (W→X→Y→Z) and compared two conditions
that were identical except that one used the "normal" wording and the other used bipolar dimensions (e.g.,
low vs. high density). The results, presented in Figure 14, show a pattern of results exactly the opposite of
the Marsh and Ahn conjecture: The "normal" wording produced a smaller coherence effect and a larger
causal status effect. Note that large coherence effects were also found in each of the four experiments
from Rehder and Kim (2009b) reviewed above that also avoided use of the “normal” wording.
Why might bipolar dimensions lead to stronger coherence effects? One possibility is that
participants might infer the existence of additional causal links. For example, if you are told that myastars
have either high or low temperature and either high or low density, and that high temperature causes high
density, you might take this to mean that low temperature also causes low density. These results suggest
that subtle differences in the wording of a causal relation can have large effects on how those links are
encoded and then used in a subsequent reasoning task. But however one interprets them, these findings
indicate that coherence effects do not depend on use of the “normal” wording.
6.5. Theoretical Implications: Discussion
Causal networks imply the existence of subtle patterns of correlations between variables: Directly
related variables should be correlated, indirectly related variables should be correlated under specific
conditions, and certain networks imply higher-order interactions among variables. The studies just
reviewed show that people’s classification judgments are exquisitely sensitive to those correlations. These
results provide strong support for the generative model's claim that good category members are those that
manifest the expected pattern of interfeature correlations and poor category members are those that
violate that pattern.
As mentioned, the presence of coherence effects supports the generative model over the
dependency model because only the former model predicts feature interactions. However, another model that
predicts feature interactions is Rehder and Murphy’s (2003) KRES recurrent connectionist model that
represents relations between category features as excitatory and inhibitory links (also see Harris &
Rehder, 2006). KRES predicts interactions because features that are consistent with each other in light of
knowledge will raise each other’s activation level (due to the excitatory links between them) which in turn
will activate a category label more strongly; inconsistent features will inhibit one another (due to the
inhibitory links between them) which will result in a less active category label. But while KRES accounts
for a number of known effects of knowledge on category learning, because its excitatory and inhibitory
links are symmetric, KRES is fundamentally unable to account for the effects reviewed above
demonstrating that subjects treat causal links as an asymmetric relation. For example, if one ignores
causal direction, X and Z in the causal chain in Figure 1 are indistinguishable (and thus there is no basis
for predicting a causal status effect) as are the common cause and common effect networks in Figure 8
(and thus there is no basis for predicting the different pattern of feature interactions for those networks).
Indeed, the asymmetry between common-cause and common-effect networks has been the focus of
considerable investigation in both the philosophical and psychological literatures (Reichenbach, 1956;
Salmon, 1984; Waldmann & Holyoak, 1992; Waldmann et al., 1995).
The importance of coherence to classification has been documented by numerous other studies.
For example, Wisniewski (1995) found that certain artifacts were better examples of the category
“captures animals” when they possessed certain combinations of features (e.g., “contains peanuts” and
“caught a squirrel”) but not others (“contains acorns” and “caught an elephant”) (also see Murphy &
Wisniewski, 1989). Similarly, Rehder and Ross (2001) showed that artifacts were considered better
examples of a category of pollution cleaning devices when their features cohered (e.g., “has a metal pole
with a sharpened end” and “works to gather discarded paper”), and worse examples when their features
were incoherent (“has a magnet” and “removes mosquitoes”). Malt and Smith (1984) found that
judgments of typicality in natural categories were sensitive to whether items obeyed or violated
theoretically-expected correlations (also see Ahn et al., 2002). Coherence also affects other types of
category-related judgments. Rehder and Hastie (2004) found that participants’ willingness to generalize a
novel property displayed by an exemplar to an entire category varied as a function of the exemplar’s
coherence. Patalano and Ross (2007) found that the generalization strength of a novel property from some
category members to another varied as a function of the category’s overall coherence (and found the
reverse pattern when the generalization was made to a non-category member).
Finally, it is illuminating to compare the relative importance of the effects of causal knowledge
on feature weights (i.e., the causal status, multiple cause, and relational centrality effects) and feature
interactions (the coherence effect) by comparing the proportion of the variance in categorization ratings
attributable to the two types of effects. In this calculation, the total variance induced by causal knowledge
was taken to be the additional variance explained by a regression model with separate predictors for each
feature and each two-way and higher-order interaction as compared to a model with only one predictor
representing the total number of characteristic features in a test item. The variance attributable to the
changes in feature weights is the additional variance explained by the separate predictors for each feature
in the full model, whereas that attributable to the coherence effect is the additional variance explained by
the interaction terms. In fact, coherence accounts for more variance in categorization judgments than
feature weights in every study in which coherence has been assessed: 60% in Rehder and Hastie (2001,
Experiment 2), 80% in Rehder (2003a), 82% in Rehder (2003b, Experiment 1), 70% in Rehder and Kim
(2006), 64% in Marsh and Ahn (2006), and over 90% in Rehder and Kim (2009b). These analyses
indicate that the most important factor that categorizers consider when using causal laws to classify is
whether an object displays a configuration of features that makes sense in light of those laws.
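A schematic version of this variance partition, with hypothetical ratings and a ±1 feature coding that are my own illustrative assumptions rather than the studies' actual data or coding, might look as follows.

```python
# Schematic of the nested-regression variance partition described above, for a
# three-feature category (8 test items). The baseline model has one predictor
# (number of characteristic features); per-feature predictors capture feature-
# weight effects; two-way and three-way interaction terms capture coherence.
# The ratings and the +/-1 feature coding are illustrative assumptions, not
# data or coding taken from the studies.
import numpy as np
from itertools import product, combinations

items = np.array(list(product([-1, 1], repeat=3)))           # 8 items, +/-1 coded
ratings = np.array([58., 36., 28., 46., 46., 40., 56., 90.]) # hypothetical ratings

def r2(X, y):
    X = np.column_stack([np.ones(len(y)), X])
    pred = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

n_char = items.sum(axis=1, keepdims=True)                    # baseline predictor
inters = np.column_stack([items[:, i] * items[:, j]
                          for i, j in combinations(range(3), 2)]
                         + [items.prod(axis=1, keepdims=True)])

base       = r2(n_char, ratings)
with_feats = r2(items, ratings)
full       = r2(np.column_stack([items, inters]), ratings)

print("added by separate feature weights:", round(with_feats - base, 3))
print("added by interaction (coherence) terms:", round(full - with_feats, 3))
print("coherence share of knowledge-related variance:",
      round((full - with_feats) / (full - base), 3))
```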
7. Classification as Explicit Causal Reasoning
The final phenomenon I discuss concerns evidence of how categorization can sometimes be an
act of causal reasoning. On this account, classifiers treat the features of an object as evidence for the
presence of unobserved features and these inferred features then contribute to a category membership
decision.
Causal reasoning such as this may have contributed to instances of the causal status effect
described in Section 4. For example, recall that Rehder and Kim (2009b, Experiment 3) found an
enhanced causal status effect when subjects were instructed on categories with an explicit “essential”
feature (Figure 4A). Although we interpreted those findings in terms of how the essential feature changed
the likelihoods of the observed features (and provided evidence for this claim, see Figure 5C), subjects
may have also reasoned backwards from the observed features to the essential one, and, of course,
features closer (in a causal sense) to the essence (e.g., X) were taken to be more diagnostic of the essence
than far ones (e.g., Z). Reasoning of this sort may occur even when participants are not explicitly
instructed on an essential feature. For example, one of the categories used in Ahn et al. (2000a) was a
disease that was described as having three symptoms X, Y, and Z. Although participants were told that
X→Y→Z, people understand that a disease (D) causes its symptoms, and so participants were likely to have assumed the more complex causal model D→X→Y→Z (and then reasoned backwards from the
symptoms to the disease). Given the prevalence of essentialist intuitions (Gelman, 2003), similar
reasoning may have occurred for the natural kinds and artifacts tested in many studies. I now review
recent studies that provide more direct evidence of causal reasoning during classification.
7.1. Classification as Diagnostic (Backward) Reasoning
Rehder and Kim (2009a, Experiment 1) investigated the causal inferences people make in the
service of classification by teaching subjects the causal structures in Figure 15A. Unlike the studies
reviewed above, subjects were taught two novel categories (e.g., myastars and terrastars). Category A
had three features, one underlying feature (UA) and two observable features (A1 and A2). The first
observable feature (A1) was described as being caused by UA but the second (A2) was not. Likewise,
category B had one underlying feature (UB) that caused the second observable feature (B2) but not the first
(B1). Like the pseudo-essential feature in Rehder and Kim (2009b, Experiment 3) in Figure 4A, UA and
UB were defining because they were described as occurring in all members of their respective category
and no nonmembers. Observable features were associated with their category by stating that they
occurred in 75% of category members.
After learning about the two categories, participants were presented with test items consisting of
two features, one from each category, and asked which category the item belonged to. For example, a test
item might have features A1 and B1, which we predicted would be classified as an A, because from A1 one
can reason to UA via the causal link that connects them, but one cannot so reason from B1 to UB. For a
similar reason, an item with features A2 and B2 should be classified a B. Consistent with this prediction,
subjects chose the category whose underlying feature was implicated by the observable ones 84% of the
time. Moreover, when subjects were presented with items in which the presence of two features was
negated (e.g., A1 and B1 both absent), they chose the category whose underlying feature could be inferred
as absent (e.g., category A) only 32% of the time. That is, subjects appeared to reason from observable
features to underlying ones and then category membership.
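The diagnostic-reasoning account of this result can be sketched with a single application of Bayes' rule. In the snippet below, the probability of an observable feature given the underlying feature, and its probability in that feature's absence, are assumed values chosen only to mirror the qualitative structure of Figure 15A; they are not the parameters participants were taught.

```python
def p_underlying_given_feature(p_feature_if_U, p_feature_if_not_U, prior_U=0.5):
    """Bayes' rule: P(underlying feature present | observable feature present)."""
    num = p_feature_if_U * prior_U
    return num / (num + p_feature_if_not_U * (1 - prior_U))

# Category A: A1 is an effect of the defining underlying feature UA, so it is much more
# likely when UA is present than when it is absent (the 0.75 and 0.20 are assumptions).
evidence_A = p_underlying_given_feature(p_feature_if_U=0.75, p_feature_if_not_U=0.20)

# Category B: B1 is NOT caused by UB, so its probability is roughly the same whether or
# not UB is present; it carries little diagnostic information about UB.
evidence_B = p_underlying_given_feature(p_feature_if_U=0.75, p_feature_if_not_U=0.75)

print(f"P(UA | A1) = {evidence_A:.2f}  vs  P(UB | B1) = {evidence_B:.2f}")
# A test item with A1 and B1 is classified as an A because A1 implicates A's defining
# feature more strongly than B1 implicates B's.
```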
There are alternative interpretations of these results however. In Figure 15A, feature A1 might
have been viewed as more important because it was involved in one relation versus B1 which was
involved in zero (i.e., an isolated features effect; see Section 5.2). To address this concern, Rehder and
Kim (2009a) conducted a number of follow-up experiments. Our Experiment 3 tested the categories in
Figure 15B, in which the strengths of the causal relations between UA and A1 and between UA and A2 were described as 90% and 60%, respectively, whereas those between UB and B1 and between UB and B2 were described as 60% and 90%. We predicted that test item A1B1 would be classified as an A, because the
inference from A1 to UA is more certain than the inference from B1 to UB. Consistent with this prediction,
subjects classified test item A1B1 as an A 88% of the time. Experiment 4 tested the categories in Figure
15C. Whereas UA was described as occurring in all category A members just as in the previous
Experiments 1-3, UB was described as occurring in only 75% of category B members. We predicted that
whereas the observable features of both categories provide equal evidence for UA and UB, respectively,
those of category A should be more diagnostic because UA itself is. Consistent with this prediction, test
item A1B1 was classified as an A 68% of the time. Finally, Experiment 5 tested the category structures in
Figure 15D. Unlike the previous experiments, participants were given explicit information about the
possibility of alternative causes of the observable features; specifically, they were told that features A1
and B2 had alternative causes (that operated with probability 50%) whereas A2 and B1 had none. We
predicted that test item A1B1 should be classified as a B because B1 provides decisive evidence of UB
(because it has no other causes). As predicted, test item A1B1 was classified a B 73% of the time.
Importantly, these results obtained despite the fact that the more diagnostic feature was involved in either
the same number (Experiments 3 and 4) or fewer (Experiment 5) causal relations.
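Under one natural (assumed) reading of these designs, the same diagnostic computation reproduces the qualitative predictions of Experiments 3 and 5 if the observable feature is treated as a noisy-OR effect of the underlying feature plus any alternative causes. The parameter values below are illustrative only.

```python
def posterior_U(m, b, prior=0.5):
    """P(U present | effect present) under a noisy-OR parameterization: the effect
    occurs with probability m via the U -> effect link and with probability b via
    alternative (background) causes."""
    p_eff_if_U = 1 - (1 - m) * (1 - b)
    p_eff_if_not_U = b
    num = p_eff_if_U * prior
    return num / (num + p_eff_if_not_U * (1 - prior))

# Experiment 3 (assumed reading): a 90% link licenses a stronger diagnostic inference
# than a 60% link, all else equal (a small background rate is assumed here).
print(posterior_U(m=0.9, b=0.2), ">", posterior_U(m=0.6, b=0.2))

# Experiment 5 (assumed reading): a feature with no alternative causes (b = 0) provides
# decisive evidence for U, whereas one with a 50% alternative cause does not.
print(posterior_U(m=0.6, b=0.0), ">", posterior_U(m=0.6, b=0.5))
```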
Recent evidence suggests that people also reason diagnostically to underlying properties for
natural categories. In a replication of Rips’s (1989) well known transformation experiments, Hampton et
al. (2007) found that whether a transformed animal (e.g., a bird that comes to look like an insect due to
exposure to hazardous chemicals) was judged to have changed category membership often depended on
what participants inferred about underlying causal processes and structures. As in Rips’s study, a (small)
majority of subjects in Hampton et al. judged the transformed animal to still be a bird whereas a (large)
minority judged that it was now an insect. But although the judgments of the latter group (dubbed the
phenomenalists by Hampton et al.) would seem to be based on the animals’ appearance, the justifications
they provided for their choices indicated instead that many used the animals’ new properties to infer
deeper changes. For example, subjects assumed that a giraffe that lost its long neck also exhibited new
behaviors that were driven by internal changes (e.g., to its nervous system) which in turn signaled a
change in category membership (to a camel). Conversely, those subjects who judged that the transformed
animal’s category was unchanged (the essentialists) often appealed to the fact that it produced offspring
from its original category, from which they inferred the absence of important internal changes (e.g., to the
animal’s DNA). In other words, rather than the (so-called) phenomenalists using only observable features,
and rather than the essentialists just relying on the presence of previously-inferred underlying properties,
both groups used observable features to infer the state of internal causal structures and processes, and
decided category membership on that basis.
Finally, also recall Murphy and Medin’s (1985) well-known example of classifying a party-goer
who jumps into a pool as drunk—one reasons from aberrant behavior to its underlying cause even if one
has never before observed a swimming drunk.
7.2. Classification as Prospective (Forward) Reasoning
The notion of explicit causal reasoning in the service of classification allows for not only
backwards, or diagnostic, reasoning to underlying features but also forwards, or prospective, reasoning.
For example, a physician may suspect the presence of HIV given the presence of the forms of sarcoma,
lymphoma, and pneumonia that HIV is known to produce (diagnostic reasoning). But the case for HIV is
made stronger still by the presence of one or more of its known causes, such as blood transfusions,
sharing of intravenous needles, or unsafe sex (prospective reasoning). I now review evidence of
prospective reasoning in classification.
7.2.1. Rehder (2007). Subjects were taught the common cause and common effect networks in
Figure 8, but now the common cause and common effect were pseudo-essential underlying features, that
is, they occurred in all category members and no nonmembers. The classification test only included items
with three observable features (the effect features in the common cause network or the cause features in
the common effect network). Rehder (2007) showed that an object's degree of category membership
increased nonlinearly with its number of observable features when those features were effects as
compared to the linear increase that obtained when those features were causes, results consistent with a
normative account of causal reasoning (also see Oppenheimer & Tenenbaum, 2009).
7.2.2. Follow-up to Rehder & Kim (2009a). In a follow-up experiment to Rehder and Kim
(2009a), our lab taught subjects the two category structures in Figure 16A. Participants were again
presented with test items consisting of one feature from each category (e.g., A1 B1). Item A1B1 was
classified as an A, suggesting that the evidence that A1 provided for category A via forward causal
reasoning was stronger than the evidence that B1 provided for category B. This result is consistent with a
well-known property of causal reasoning, namely, the fact that people reason more confidently from
causes to effects than vice versa (Tversky & Kahneman, 1980).
7.2.3. Chaigneau, Barsalou, & Sloman (2004). Chaigneau et al. provided particularly compelling
evidence of the presence of prospective causal reasoning in categorization. Figure 16B presents the causal
structures they hypothesized constitute the mental representation of artifact categories. The function
historically intended by the artifact’s designer results in its physical structure. In addition, the goal of a
particular agent leads to the agent acting toward the artifact in a way to achieve those goals. Together, the
artifact’s physical structure and the agent’s action yield a particular outcome. The authors presented
subjects with a number of vignettes that each specified the state of the four causes in which three of the
causes were present and the fourth was absent. For example, a vignette might include (a) an object that
was created with the intention of being a mop but (b) was made out of plastic bags, and (c) an agent that
wanted to clean up a spill and that (d) used the object in the appropriate way for mopping (i.e., all causes
normal for a mop were present except physical structure). Subjects were asked how appropriate it was to
call the object a mop. Classification ratings were much lower for vignettes in which the appropriate physical structure was missing than for those in which the appropriate historical intention was missing. This result is
consistent with subjects reasoning from an object’s physical structure to its potential function and then to
category membership—so long as the structure of the artifact is appropriate, the intention of its designer
becomes irrelevant.17 However, when physical structure is unspecified, then one can use the designer’s
intentions to infer physical structure (and from structure infer potential function). This is what Chaigneau
et al. found: The effect of a missing intention on classification was much larger when the physical
structure was unspecified.
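The screening-off logic behind this result can be sketched as follows; the probabilities below are assumptions, and the function is only a stand-in for the prospective inference from intention to structure to outcome described by Chaigneau et al.

```python
def p_mop_outcome(structure_ok, intention_ok, p_struct_given_intent=0.9, p_struct_otherwise=0.2):
    """Prospective reasoning over the artifact model of Figure 16B (assumed reading):
    intended function -> physical structure -> outcome (given an appropriate action).
    An observed structure screens off the designer's intention; if structure is
    unspecified, the intention is used to infer the likely structure."""
    if structure_ok is not None:
        p_struct = 1.0 if structure_ok else 0.0     # observed structure settles the matter
    else:
        p_struct = p_struct_given_intent if intention_ok else p_struct_otherwise
    return p_struct                                  # chance the object can produce the mopping outcome

print(p_mop_outcome(structure_ok=False, intention_ok=True))   # bad structure: intention cannot help
print(p_mop_outcome(structure_ok=None, intention_ok=False))   # structure unspecified: intention matters
print(p_mop_outcome(structure_ok=None, intention_ok=True))
```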
7.3. Theoretical Implications: Discussion
What implications do these findings have for the generative and dependency models? First, there
are two ways that the generative model can account for the observed results. The first approach
corresponds to the explicit causal reasoning account we have just described. As a type of causal graphical model, a category's network of interfeature causal links supports the elementary causal
inferences required to account for the results in Experiments 1–5. Indeed, Rehder and Burnett (2005)
confirmed that people are more likely to infer the presence of a cause feature when its effect was present
(and vice versa). Although Rehder and Burnett also observed some discrepancies from normative
reasoning, current evidence indicates that people can readily engage in the causal reasoning from
observed to unobserved features suggested by these experiments (also see Ahn et al., 2000a, Experiment
5; Sloman & Lagnado, 2005; Waldmann & Hagmayer, 2005).
In addition, recall that the generative model also predicts that observed features caused by
underlying properties are likely to be perceived as more prevalent among category members. Not only
does this multiple cause effect help explain the enhanced causal status effect found with Rehder and
Kim’s (2009b, Experiment 3) essentialized categories, Rehder and Kim (2009a) have shown how it
explains the results from all five of their experiments described above. However, it does not explain the
cases of prospective causal reasoning. For example, in Figure 16A feature B1 should have greater
category validity in category B than A1 has in category A but A1 was the more diagnostic feature. Thus,
demonstrations of prospective reasoning are important insofar as they establish the presence of
classification effects not mediated by changes in feature likelihoods brought about by the multiple cause
effect, implying a more explicit form of causal reasoning.
Second, the dependency model in turn can explain some aspects of the prospective reasoning
results, as features should be weighed more heavily when they have an extra effect. Thus, it explains why
feature A1 is more diagnostic than B1 in Figure 16A. However, the dependency model fails to explain the
results from Chaigneau et al. (2004) in which the importance of a distal cause (the intentions of an
artifact’s designer) itself interacts with whether information about the artifact’s physical structure is
available. Of course, the dependency model is also unable to account for the cases of diagnostic reasoning
in which features become more important to the extent they have additional causes rather than effects.
8. Developmental Studies
Given the robust causal-based classification effects in adults just reviewed, it is unsurprising that
researchers have asked when these effects develop in children. I review evidence of how causal
knowledge changes the importance of features and feature interactions and evidence of explicit causal
reasoning in children.
8.1. Feature Weights and Interactions
Initial studies of how causal knowledge affects children’s classification were designed to test
whether children exhibit a causal status effect. For example, Ahn, Gelman, Amsterlaw, Hohenstein and
Kalish (2000b) taught 7- to 9-year-olds a novel category with three features in which one feature was the
cause of the other two. They told children, for example, that fictitious animals called taliboos had
promicin in their nerves, thick bones, and large eyes, and that the thick bones and large eyes were caused
by the promicin. Ahn et al. found that an animal missing only the cause feature (promicin) was chosen to
be a more likely category member than one missing only one of the effect features (thick bones or large
eyes). Using a related design, Meunier and Cordier (2009) found a similar effect with 5-year-olds (albeit
only when the cause was an internal rather than a surface feature).
However, although the authors of these studies interpreted their findings as indicating a causal
status effect, I have shown in Section 2.2 how these results can be interpreted as reflecting a coherence
effect instead (see Example 1 in Table 1). An item missing only the cause feature (promicin) may have
been rated a poor category member because it violated two expected correlations (one with thick bones,
the other with large eyes) whereas an item missing only one effect feature violated only one expected
correlation (the one with promicin). Thus, the Ahn et al. and Meunier and Cordier results are ambiguous
regarding whether children exhibit a causal status effect or a coherence effect (or both).
To test for the independent presence of these effects, Hayes and Rehder (2009) taught 5- to 6-year-olds a novel category with four features, two of which were causally related. For example, children were told about novel animals called rogos that have big lungs, can stay underwater for a long time, have long
ears, and sleep during the day. They were also told that having big lungs was the cause of staying
underwater for a long time (the other two features were isolated, i.e., were involved in no causal links).
After category learning, subjects were presented with a series of trials presenting two animals and asked
which was more likely to be a rogo. The seven test pairs are presented in Table 6. For each alternative,
dimension 1 is the cause, dimension 2 is the effect, and dimensions 3 and 4 are the neutral features; ‘1’
means a feature is present, ‘0’ means it is absent, ‘x’ means there was no information about the feature
(e.g., item 10xx is an animal with big lungs that can’t stay underwater very long, with no information
about the two neutral features). A group of adults was also tested on this task.
Table 6 presents the proportion of times alternative X was chosen in each test pair for the two
groups of subjects. To analyze these data, we performed logistic regression according to the equation

choice_k(X, Y) = 1 / (1 + exp(–diff_k(X, Y)))                                                  (8)

where diff_k is defined as the difference in the evidence that alternatives X and Y provide for category k,

diff_k(X, Y) = evidence_k(X) – evidence_k(Y)
             = (w_c f_X,1 + w_e f_X,2 + w_n f_X,3 + w_n f_X,4 + w_h h_X) – (w_c f_Y,1 + w_e f_Y,2 + w_n f_Y,3 + w_n f_Y,4 + w_h h_Y)
             = w_c (f_X,1 – f_Y,1) + w_e (f_X,2 – f_Y,2) + w_n (f_X,3 – f_Y,3) + w_n (f_X,4 – f_Y,4) + w_h (h_X – h_Y)
             = w_c m_XY,1 + w_e m_XY,2 + w_n m_XY,3 + w_n m_XY,4 + w_h m_XY,h                  (9)

In Equation 9, f_i,j is an indicator variable reflecting whether the feature on dimension j in alternative i is present (+1), absent (–1), or unknown (0), and h_i indicates whether alternative i is coherent (+1), incoherent (–1), or neither (0). Thus, the m_XY,j (= f_X,j – f_Y,j) are match variables indicating whether alternatives X and Y match on dimension j. In addition, w_c, w_e, and w_n are the evidentiary weights provided by the cause feature, the effect feature, and the neutral features, respectively. That is, a single item's degree of category membership is increased by w_j if the feature on dimension j is present and decreased by w_j if it is absent. Finally, w_h is the weight associated with whether the object exhibits coherence: an object's degree of category membership is increased by w_h if the cause and effect features are both present or both absent and decreased by w_h if one is present and the other absent. Note that Equation 8 predicts a choice probability in favor of X close to 1 when diff_k(X, Y) >> 0, close to 0 when diff_k(X, Y) << 0, and close to .5 when diff_k(X, Y) ≈ 0.
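Equations 8 and 9 translate directly into code. The sketch below computes the choice probability for a test pair; the weights are illustrative and are not the fitted values reported in Table 7.

```python
from math import exp

def evidence(item, w_c, w_e, w_n, w_h):
    """Evidence an item provides for the category (the components of Equation 9).
    `item` codes the (cause, effect, neutral, neutral) dimensions as
    +1 = feature present, -1 = absent, 0 = unknown."""
    f1, f2, f3, f4 = item
    h = f1 * f2           # +1 coherent, -1 incoherent, 0 if the cause or effect is unknown
    return w_c * f1 + w_e * f2 + w_n * f3 + w_n * f4 + w_h * h

def p_choose_x(x, y, **w):
    """Equation 8: probability of choosing alternative X over alternative Y."""
    diff = evidence(x, **w) - evidence(y, **w)
    return 1 / (1 + exp(-diff))

# Illustrative weights only (not the fitted values reported in Table 7).
w = dict(w_c=0.5, w_e=0.5, w_n=0.3, w_h=0.6)

# Test pair C: 10xx (cause present, effect absent) vs. 01xx (effect present, cause absent).
print(p_choose_x((+1, -1, 0, 0), (-1, +1, 0, 0), **w))   # = .5 whenever w_c = w_e
```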
The values of w_c, w_e, w_n, and w_h yielded by the logistic regression analysis, averaged over subjects, are presented in Table 7. First, note that a causal status effect (the difference between the importance of the cause and effect features) is reflected in the difference between parameters w_c and w_e. In fact, this difference was not significantly different from zero for either children or adults (Table 7), indicating that neither group exhibited a causal status effect. The absence of a causal status effect is reflected in test pair C, in which alternative 10xx (which has the cause but not the effect) was not considered a more likely category member than alternative 01xx (which has the effect but not the cause). Second, an isolated features effect, measured by the difference between the average of the cause and effect features (w_c and w_e) and the neutral features (w_n), was significantly greater than zero for both groups. Finally, both groups exhibited a coherence effect, as indicated by values of w_h that were significantly greater than zero. In addition, the effect of coherence was larger in adults than in children (w_h = .65 vs. .22).
These results have several implications. First, besides replicating the presence of a coherence
effect in adults (Section 6), this study is the first to document a coherence effect in 5–6 year old children
(albeit one smaller in magnitude than in adults). Second, the isolated features effect similarly replicates
adult findings reviewed earlier (Section 5) and extends those findings to children. Third, this study
replicates the numerous adult studies reported in Section 4 in which a causal status effect failed to obtain
and shows that this effect is also not inevitable in children. Of course, the finding of a coherence effect
but not a causal status effect in children supports the possibility that previous reports of a causal status
effect in children (e.g., Ahn et al. 2000b; Meunier & Cordier, 2009) reflected an effect of coherence
instead.18
This preliminary study leaves several questions unanswered. One concerns whether a causal
status effect fails to obtain in children for the same reasons it does in adults. For example, adult studies described above showed no causal status effect with deterministic links. Thus, because Hayes and Rehder did not specify the strength of the causal relation between the cause and effect feature, the absence of a causal status effect may have been due to adults and children interpreting that link as deterministic (e.g., big lungs always allow rogos to stay underwater a long time). Accordingly, new studies are being
planned that use the same materials and procedure but test probabilistic causal links.
This study also makes an important methodological point, namely, how the separate effects of
causal status, isolated features, and coherence can be evaluated in a forced-choice paradigm using logistic
regression. Just as linear regression does for rating scale data, logistic regression provides the means to
separate the multiple effects of causal knowledge on categorization, including the importance of both
features and feature combinations.
8.2. Explicit Causal Reasoning in Children’s Categorization
There is considerable evidence that the explicit causal reasoning observed in adult categorization
is also exhibited by children. For example, in a series of studies Gopnik, Sobel, and colleagues have
shown that children can reason causally from observed evidence to an unobserved feature that determines
category membership (Gopnik & Sobel, 2000; Gopnik, Glymour, Sobel, Schulz, & Kushnir, 2004; Sobel,
Tenenbaum, & Gopnik, 2004; Sobel & Kirkham, 2006; Sobel, Yoachim, Gopnik, Meltzoff, & Blumenthal,
2007). In these studies, children are shown a device called a blicket detector and told that it activates (i.e.,
plays music) whenever blickets are placed on it. They then observe a series of trials in which (usually
two) objects are placed on the blicket detector either individually or together after which the machine
either does or does not activate. For example, in a backward blocking paradigm tested in Sobel et al.
(2004, Experiment 1), two blocks (A and B) were twice placed on the machine, causing it to activate, followed by a third trial in which A alone caused activation. On a subsequent classification test, 3- and 4-year-olds affirmed that B was a blicket with probability .50 and .13, respectively, despite the fact that the machine activated on every trial in which B was present; these probabilities were 1.0 in an indirect screening-off
condition that was identical except that the machine didn’t activate on the final A trial. Apparently,
children were able to integrate information from the three trials to infer whether B had the defining
property of blickets (the propensity to activate the blicket detector). In particular, in the backward
blocking condition they engaged in a form of discounting in which the trial in which A alone activated the
machine was sufficient to discount evidence that B was a blicket. Sobel and Kirkham (2006) reached
similar conclusions for 24-month-olds (and 8-month-olds, using anticipatory eye movements as a
dependent measure). The full pattern of results from these studies has no interpretation under alternative
associative learning theories. Moreover, Kushnir and Gopnik (2005) have shown that children can infer
category membership on the basis of another type of causal reasoning, namely, on the basis of
interventions (in which the subject rather than the experimenter places blocks on the machine). Notably,
Booth (2007) has shown how these sorts of causal inferences in the service of classification result in
children learning more about a category’s noncausal features.
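The discounting pattern in the backward blocking condition follows from straightforward Bayesian updating over hypotheses about which blocks are blickets. The sketch below assumes a uniform prior and a deterministic detector (both assumptions); under those assumptions the backward blocking posterior is .50, as with the 3-year-olds (a lower prior probability of being a blicket would push it lower, as with the 4-year-olds), and the indirect screening-off posterior is 1.0.

```python
from itertools import product

# Hypotheses about which of blocks A and B are blickets; a uniform prior is assumed.
prior = {h: 0.25 for h in product([False, True], repeat=2)}   # h = (A_is_blicket, B_is_blicket)

def activates(h, blocks):
    """Assume a deterministic detector: it plays music iff a blicket is placed on it."""
    return any(h[i] for i in blocks)

def update(belief, blocks, did_activate):
    """Keep only hypotheses consistent with the trial's outcome and renormalize."""
    post = {h: p for h, p in belief.items() if activates(h, blocks) == did_activate}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

def p_b_is_blicket(trials):
    belief = dict(prior)
    for blocks, outcome in trials:
        belief = update(belief, blocks, outcome)
    return sum(p for (a, b), p in belief.items() if b)

# Backward blocking: A+B activate twice, then A alone activates -> evidence for B is discounted.
print(p_b_is_blicket([((0, 1), True), ((0, 1), True), ((0,), True)]))    # 0.5

# Indirect screening-off: identical, except A alone fails to activate -> B must be a blicket.
print(p_b_is_blicket([((0, 1), True), ((0, 1), True), ((0,), False)]))   # 1.0
```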
9. Summary and Future Directions
This chapter has demonstrated three basic types of effect of causal knowledge on classification.
First, causal knowledge changes the importance of individual features. A feature's importance to category
membership can increase to the extent that it is “more causal” (a causal status effect), it has many causes
(a multiple cause effect), and is involved in at least one causal relation (an isolated feature effect).
Second, causal knowledge affects which combinations of features make for good category members,
namely, those that manifest the interfeature correlations expected to be generated by causal laws (a
coherence effect). These expected combinations include both pairwise feature correlations and higher-order interactions among features. Finally, causal knowledge supports inferences from the features of an object one can observe to those one cannot, which in turn influence a category membership decision.
Evidence was reviewed indicating how these inferences can occur in both the backward (diagnostic)
direction as well as the forward (prospective) direction. Several of these effects have been demonstrated
in young children.
This chapter has also discussed the implications these results have for current models of causal-based classification. Briefly put, the generative model accounts for vastly more of the results obtained
testing experimental categories than the alternative dependency model. As shown, the generative model
correctly predicts how the magnitude of the causal status effect varies with (a) causal strength, (b) the
strengths of background causes, and (c) the presence of unobserved “essential” features. It also accounts
for the multiple cause effect. It accounts for the coherence effect, including the observed higher-order
interactions, and how the magnitudes of the two-way interactions vary with experimental conditions (e.g.,
causal strength) and type (directly related feature pairs versus indirectly related ones). Finally, by
assuming that causal category knowledge is represented as a graphical model, it supports the diagnostic
and prospective causal reasoning from observed features to unobserved ones. The dependency model, in
contrast, is unable to account for any of these phenomena. Nevertheless, note that there was one failed
prediction of the generative model, namely, the presence of the isolated feature effect. Of course, the
dependency model is also unable to account for this effect.
In the final subsections below, I briefly present other issues and directions for future research.
9.1. Alternative Causal Structures and Uncertain Causal Models
The experimental studies reviewed here have involved only one particular sort of causal link,
namely, a generative cause between two binary features. However, one’s database of causal knowledge
includes many other sorts of relations. Some causal relations are inhibitory in that the presence of one
variable decreases the probability of another. Conjunctive causes obtain when multiple variables are each
required to produce an effect (as fuel, oxygen, and spark are all needed for fire). Binary features involved
in causal relations can be additive (present or absent) or substitutive (e.g., male vs. female) (Tversky &
Gati, 1982); in addition, variables can be ordinal or continuous (Waldmann et al., 1995). There are
straightforward extensions to the generative model to address these possibilities, but as yet few empirical
studies have assessed their effect on classification.
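For concreteness, the standard parameterizations of these alternatives can be written down in a few lines. The functions below are generic textbook forms (noisy-OR for generative causes, noisy-AND-NOT for inhibitors, a conjunctive gate for jointly necessary causes), not parameterizations taken from any particular study, and the numerical values are illustrative.

```python
def p_effect_noisy_or(causes_present, strengths, background=0.1):
    """Generative causes combine as a noisy-OR: each present cause independently
    produces the effect with its own strength; `background` covers other causes."""
    p_fail = 1 - background
    for present, m in zip(causes_present, strengths):
        if present:
            p_fail *= 1 - m
    return 1 - p_fail

def p_effect_inhibited(p_generated, inhibitor_present, inhibitory_strength):
    """An inhibitory cause independently prevents the effect (noisy-AND-NOT)."""
    return p_generated * (1 - inhibitory_strength) if inhibitor_present else p_generated

def p_effect_conjunctive(causes_present, strength):
    """Conjunctive causes: the effect can be produced only when all causes are present."""
    return strength if all(causes_present) else 0.0

print(p_effect_noisy_or([True, True], [0.8, 0.6]))                  # either cause suffices
print(p_effect_conjunctive([True, True, False], strength=0.95))     # fuel + oxygen but no spark
print(p_effect_inhibited(0.9, inhibitor_present=True, inhibitory_strength=0.7))
```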
Another important possibility is cases in which variables are related in causal cycles; indeed,
many cycles were revealed by the theory-drawing tasks used in Sloman et al.’s (1998) and Kim and Ahn’s
(2002a; b) studies of natural categories. Kim et al. (2009) have tested the effect of causal cycles in novel categories and proposed a variant of the dependency model that accounts for their results by assuming that causal cycles are unraveled over time (e.g., the causal model {X↔Y} is replaced by {X_t→Y_t+1, Y_t→X_t+1}, where t represents time). They also note how a similar technique can allow the generative
model to be applied to causal cycles. Still, Kim et al.’s use of the missing feature method prohibited an
assessment of coherence effects or the weight of individual features in a manner that is independent of
feature interactions. Thus, there is still much to learn about how causal cycles affect classification.
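The unraveling operation itself is mechanical, as the following sketch of a two-time-slice expansion shows (the representation of edges as tuples is an arbitrary choice).

```python
def unroll(edges):
    """Replace each edge X -> Y of a (possibly cyclic) graph with an edge from X at
    time t to Y at time t+1, yielding an acyclic two-time-slice structure."""
    return [((x, "t"), (y, "t+1")) for x, y in edges]

# The cycle X <-> Y becomes {X_t -> Y_t+1, Y_t -> X_t+1}.
print(unroll([("X", "Y"), ("Y", "X")]))
```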
Finally, an important property of causal models is the representation of uncertainty. You may
believe that 100% of roobans have sticky feet and that 75% of myastars have high density, but your
confidence in these beliefs may be either low or high depending on their source (e.g., they may be based
on a small and biased sample or a large number of observations; they may come from a reliable or
unreliable individual, etc.). Your confidence in interfeature causal relations (e.g., the belief that sticky feet
are caused by roobans eating fruit) can similarly vary on a continuum. There are known ways to represent
causal model parameters as subjective probability density functions and learn and reason with those
models (Griffiths & Tenenbaum, 2005; Lu et al. 2008) that have obvious extensions to classification, but
again there have been few empirical studies that have examined these issues. One exception is Ahn et al.
(2000) who taught subjects causal relations that were implausible (because they contrasted with prior
knowledge) and found, not surprisingly, no causal status effect.
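One simple way to represent this kind of parameter uncertainty, in the spirit of the Bayesian treatments just cited, is to replace a point estimate with a Beta distribution whose spread reflects the amount of evidence behind the belief. The counts below are hypothetical, and the uniform Beta(1, 1) prior is an assumption.

```python
from scipy import stats

def belief(successes, failures, prior_a=1, prior_b=1):
    """Represent a belief about a feature's base rate (e.g., the proportion of
    roobans with sticky feet) as a Beta distribution updated from observations."""
    return stats.beta(prior_a + successes, prior_b + failures)

small_sample = belief(successes=3, failures=1)      # 3 of 4 observed exemplars
large_sample = belief(successes=75, failures=25)    # 75 of 100 observed exemplars

# Similar central tendency, but the larger sample yields a much narrower (more confident) belief.
print(small_sample.mean(), small_sample.std())
print(large_sample.mean(), large_sample.std())
```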
9.2. Categories’ Hidden Causal Structure
As mentioned, the purpose of testing novel rather than real-world categories is that it affords
greater experimental control over the theoretical/causal knowledge that people bring to the classification
task. Nevertheless, even for a novel category people may assume additional causal structure on the basis
of its ontological kind (e.g., whether it is an artifact or biological kind). Research reviewed above has
shown how underlying “essential” features can affect classification even when they’re not observed in test
items (Rehder & Kim, 2009a; Hampton et al., 2007). And the results of Chaigneau et al. (2004) suggest
that classifiers can engage in prospective causal reasoning to infer an unobserved feature (an artifact’s
potential function) to decide category membership. Ahn and Kim (2001) have referred to systematic
differences between domains as “content effects,” and I would expand this notion to include the sort of
default causal models that people assume in each domain. Thus, continuing to elucidate the sorts of
default hidden causal structures that are associated with the various ontological kinds (and which of those
causal structures are treated as decisive for category membership) and investigating how those
structures influence real-world categorization remains a key aim of future research.
9.3. Causal Reasoning and the Boundaries of Causal Models
That classifiers can infer hidden causal structure raises questions about the sorts of variables
that can contribute to those inferences. I have reviewed studies showing that people can causally infer
unobserved features from observed ones, but nothing prevents inferences involving variables not
normally considered "features" of the category (Oppenheimer et al., 2009). For example, you may be
unable to identify the insects in your basement, until you see your house’s damaged wooden beams and
realize you have termites. But is the chewed wood a “feature” of termites? Or, to take a more fanciful
example, we all know that bird wings cause flying which causes birds to build nests in trees, but perhaps
the nests also cause breaking tree branches, which cause broken windshields, which cause higher car
insurance rates, and so on. But although a neighborhood’s high insurance rates might imply a large
population of large birds, they are not a feature of birds. Causal relations involving bird features also go
backwards indefinitely (birds have wings because they have bird DNA, bird DNA produces wings
because of a complicated evolutionary past, etc.).
These examples raise two questions. The first concerns the boundaries of categories’ causal
models. If a causal model includes variables directly or indirectly related to a category’s features then
(because everything is indirectly connected to everything else) all variables are included, which means
that all categories have the same causal model (the model of everything). Clearly, the causal model
approach needs to specify the principles that determine which variables are part of a category’s causal
model. The second question concerns how the evidence that a variable provides for category membership
differs depending on whether it is part of the causal model or not. My own view is that a full model of
causal-based classification will involve two steps. In the first step classifiers use all relevant variables
(those both inside and outside of the causal model) to infer the presence of unobserved features via the
causal relations that link them (Rehder, 2007; Rehder & Kim, 2009a). In the second step, the classifier
evaluates the likelihood that the (observed and inferred) features that are part of the category’s causal
model were generated by that model.
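The following sketch illustrates the proposed two-step account with a deliberately toy "termite" model; the feature names, diagnostic link, and probabilities are hypothetical, and the independent-feature likelihood in step 2 is only a stand-in for a full generative model.

```python
def infer_unobserved(observed, model):
    """Step 1: use all relevant variables, including ones outside the category's
    causal model (e.g., damaged beams), to infer the model's unobserved features."""
    inferred = {}
    for outside_var, (model_feature, p_if_seen) in model["diagnostic_links"].items():
        if observed.get(outside_var):
            inferred[model_feature] = p_if_seen
    return inferred

def membership_score(observed, inferred, model):
    """Step 2: evaluate only the features belonging to the category's causal model,
    here with a simple independent-feature likelihood as a stand-in."""
    beliefs = {f: float(v) for f, v in observed.items() if f in model["feature_probs"]}
    beliefs.update(inferred)
    score = 1.0
    for feature, p_in_category in model["feature_probs"].items():
        b = beliefs.get(feature, 0.5)                # agnostic if nothing is known
        score *= b * p_in_category + (1 - b) * (1 - p_in_category)
    return score

# Hypothetical "termite" model: chewed wood lies outside the model but is diagnostic of a feature in it.
termite = {"feature_probs": {"eats_wood": 0.95, "small_insect": 0.90},
           "diagnostic_links": {"damaged_beams": ("eats_wood", 0.9)}}
observed = {"damaged_beams": True, "small_insect": True}
print(membership_score(observed, infer_unobserved(observed, termite), termite))
```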
9.4. Additional Tests with Natural Categories
As mentioned, although a large number of studies have tested novel categories, others have
assessed causal-based effects in real-world categories. Using the missing feature method, these studies
have generally reported a causal status effect (Ahn, 1998; Kim & Ahn, 2002a; b; Sloman et al. 1998).
Moreover, in contrast to the studies reviewed above testing novel categories favoring the generative
model, research with natural categories has provided evidence supporting the dependency model, as
classification judgments were shown to exhibit substantial correlations with the dependency model’s
predictions derived from subjects’ category theories (measured, e.g., via the theory drawing task).
There are a number of uncertainties regarding the interpretation of these studies, some of which I
have already mentioned. The numerous confounds associated with natural categories mean that the
apparently greater importance of more causal features may be due to other factors (e.g., they may also
have higher category validity, i.e., been observed more often in category members). In addition, because
these studies used the missing feature method to assess feature weights, the causal status effect reported
by these studies could have reflected coherence effects instead if the more causal features were also those
involved in a greater number of causal relations. Finally, only the dependency model was fit to the data,
and so naturally it is unknown whether the generative model wouldn’t have achieved a better fit. One
notable property of the causal relations measured in these studies is that they correspond to those for
which the generative model also predicts a strong causal status effect. For example, Kim and Ahn (2002a,
Experiment 2) had subjects rate the strength of the causal links on a three-point scale (1–weak, 2–
moderate, 3–strong) and found that the vast majority of links had an average strength of between 1 and
1.5. As I have shown, the generative model predicts a strong causal status effect when weak causal links
produce each feature in a causal chain with decreasing probability.
Sorely needed therefore are new studies of real-world categories that take into account what has
been learned from studies testing novel materials. First, subjects must be presented with additional test
items, namely, those that are missing more than just one feature. This technique will allow coherence
effects to be assessed and will provide a more accurate measure of feature weights (see Section 2.2).19
Second, both the dependency and generative models must be fit to the resulting data. Results showing that
the causal status effect increases with causal strength will favor the dependency model, whereas results showing that it decreases with causal strength, along with the presence of multiple cause and coherence effects, will favor the generative model. Finally, the confound between a category's causal and empirical/statistical information
can be addressed by seeking objective measures of the latter, perhaps by using corpus-based techniques
(e.g., Landauer & Dumais, 1997). Statistical methods like multiple regression can then be used to
determine whether causal theories provide any additional predictive power above and beyond features’
objective category validity.
9.5. Processing Issues
The studies reviewed above all used unspeeded judgments in which subjects were given as long
as they wanted to make a category membership decision. It is reasonable to ask whether these judgments
would be sensitive to causal knowledge if they were made the same way that categorization decisions are
made hundreds of times each day, namely, in a few seconds or less. One might speculate that use of
causal knowledge is an example of slower, deliberate, “analytical” reasoning that is unlikely to appear
under speeded conditions (Sloman, 1996; Smith & Sloman, 1994). But studies have generally found
instead that effects of theoretical knowledge on classification obtain even under speeded conditions (Lin
& Murphy, 1997; Palmeri & Blalock, 2000). For example, Luhmann et al. (2006) found that classification
judgments exhibited a causal status effect even when they were made in 500 ms.
Besides the causal status effect, no studies have examined how the other sorts of causal-based
effects respond to manipulations of response deadline. Luhmann et al. proposed that in their study
subjects “prestored” feature weights during the initial learning of the category’s features and causal
relations and then were able to quickly access those weights during the classification test. If this is
correct, then one might expect to also see a multiple-cause effect and isolated feature effect at short
deadlines. In contrast, because the coherence effect and explicit causal reasoning involve processing the
values on multiple stimulus dimensions, these effects may only emerge at longer response deadlines.
9.6. Integrating Causal and Empirical/Statistical Information
Another outstanding question concerns how people’s beliefs about a category’s causal structure is
integrated with the information gathered through first hand observation of category members. On one
hand, numerous studies have demonstrated how general semantic knowledge that relates category features
alters how categories are learned (e.g., Murphy & Allopenna, 1994; Rehder & Ross, 2001; Wattenmaker
et al., 1986). However, there are relatively few studies examining how a category’s empirical/statistical
information is integrated with specifically causal knowledge. One exception is Rehder and Hastie (2001)
who presented subjects with both causal laws and examples of category members and by so doing
orthogonally manipulated categories’ causal and empirical structure (providing either no data or data with
or without correlations that were consistent with the causal links). Although subjects’ subsequent
classification judgments reflected the features’ objective category validity, they were generally insensitive
to the correlations that inhered in the observed data. On the other hand, Waldmann et al. (1995) found that
the presence of interfeature correlations affected subjects’ interpretation of the causal links they were
taught. More research is needed to determine how and to what extent the correlational structure of
observed data is integrated into a category’s causal model. Note that because the representation of causal
relations assumed by the generative model emerged out of the causal learning literature (Cheng, 1997), it
generates clear hypotheses regarding how the strengths of causal links (the m and b parameters) should be
updated in light of observed data.
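As a point of reference, the snippet below computes point estimates of a link's m and b parameters from a contingency table using Cheng's (1997) causal power formula; the counts are hypothetical, and the Bayesian treatments cited above would instead place full distributions over these parameters.

```python
def causal_strength_estimates(n_effect_with_cause, n_with_cause,
                              n_effect_without_cause, n_without_cause):
    """Point estimates of a causal link's parameters from observed exemplars:
    b is the background rate P(e | not-c), and m is the causal power of c
    (Cheng, 1997): m = [P(e | c) - P(e | not-c)] / [1 - P(e | not-c)]."""
    p_e_given_c = n_effect_with_cause / n_with_cause
    b = n_effect_without_cause / n_without_cause
    m = (p_e_given_c - b) / (1 - b)
    return m, b

# E.g., among observed exemplars: effect present in 45 of 50 with the cause,
# and in 10 of 50 without it (hypothetical counts).
m, b = causal_strength_estimates(45, 50, 10, 50)
print(f"m = {m:.2f}, b = {b:.2f}")
```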
9.7. Developmental Questions
Given the small number of relevant studies (Ahn et al. 2000b; Meunier & Cordier, 2009; Hayes &
Rehder, 2009), it is not surprising that there are many outstanding questions regarding the effect of causal
knowledge on classification feature weights and interactions. Because I attributed the absence of a causal
status effect in Hayes and Rehder (2009) to 5-year-olds interpreting the causal link as deterministic, an
obvious possibility to test is whether this effect would be observed in children when links are
probabilistic instead. Another question is whether the size of the coherence effect in children varies with
causal link strength in the way predicted by the generative model. Finally, it is currently unknown
whether children exhibit a multiple cause effect.
9.8. Additional Dependent Variables
Finally, whereas all studies reviewed here asked for some sort of category membership judgment,
it is important to expand the types of dependent variables that are used to assess causal-based effects. For
example, several studies have examined the effect of theoretical knowledge on category construction (the
way in which people spontaneously sort items together; Ahn & Medin, 1992; Kaplan & Murphy, 1999;
Medin, Wattenmaker, & Hampson, 1987) but only a few have examined the effect of causal knowledge.
One exception is Ahn and Kim (2000, Experiment 4) who presented subjects with match-to-sample trials
consisting of a target with one feature that caused another (X→Y) and two cases that shared with the target either the cause (X→Z) or the effect (W→Y). Subjects spontaneously sorted together items on the
basis of shared causes rather than shared effects, that is, they exhibited a causal status effect. On the other
hand, Ahn (1999) found that sorting failed to be dominated by any feature (including the cause) for items
with four features arranged in a causal chain. But subjects did sort on the basis of the cause for common
cause structures and on the basis of the effect for common effect structures (mirroring the results with
explicit classification judgments found by Rehder & Hastie, 2001, Figure 9). Additional research is
needed to determine whether the other effects documented here (e.g., sensitivity of the causal status effect
to causal strength, coherence effects, etc.) also obtain in category construction tasks.
9.9. Closing Words
Twenty-five years have passed since Murphy and Medin (1985) observed how concepts of
categories are embedded in the rich knowledge structures that make up our conceptual systems. What has
changed in the last 10-15 years is that insights regarding how such knowledge affects learning, induction,
and classification have now been cashed out as explicit computational models. This chapter has presented
models of how interfeature causal relations affect classification and reviewed the key empirical
phenomena for and against those models. That so many important outstanding questions remain means
that this field can be expected to progress as rapidly in the next decade as it has in the past one.
References
Ahn, W. (1998). Why are different features central for natural kinds and artifacts? The role of
causal status in determining feature centrality. Cognition, 69, 135-178.
Ahn, W. (1999). Effect of causal structure on category construction. Memory & Cognition, 27,
1008-1023.
Ahn, W., & Medin, D. L. (1992). A two-stage model of category construction. Cognitive Science,
16, 81-121.
Ahn, W., Kim, N. S., Lassaline, M. E., & Dennis, M. J. (2000a). Causal status as a determinant of
feature centrality. Cognitive Psychology, 41, 361-416.
Ahn, W., Gelman, S. A., Amsterlaw, J., Hohenstein, J., & Kalish, C. W. (2000b). Causal status
effect in children's categorization. Cognition, 76, B35-B43.
Ahn, W., & Kim, N. S. (2001). The causal status effect in categorization: An overview. In D. L.
Medin (Ed.), The psychology of learning and motivation (Vol. 40, pp. 23-65). San Diego, CA: Academic
Press.
Ahn, W., Marsh, J. K., Luhmann, C. C., & Lee, K. (2002). Effect of theory based correlations on
typicality judgments. Memory & Cognition, 30, 107-118.
Ahn, W., Levin, S., & Marsh, J. K. (2005). Determinants of feature centrality in clinicians'
concepts of mental disorders, Proceedings of the 25th Annual Conference of the Cognitive Science
Society. Mahwah, New Jersey: Lawrence Erlbaum Associates.
Ahn, W., Flanagan, E., Marsh, J. K., & Sanislow, C. (2006). Beliefs about essences and the
reality of mental disorders. Psychological Science, 17, 759-766.
Ashby, F. G., & Maddox, W. T. (2005). Human category learning. Annual Review of Psychology,
56, 149-178.
Bloom, P. (1998). Theories of artifact categorization. Cognition, 66, 87-93.
Bonacich, P., & Lloyd, P. (2001). Eigenvector-like measures of centrality for asymmetric
relations. Social Networks, 23, 191-201.
Booth, A. (2007). The cause of infant categorization. Cognition, 106, 984-993.
Braisby, N., Franks, B., & Hampton, J. (1996). Essentialism, word use, and concepts. Cognition,
59, 247-274.
Buehner, M. J., Cheng, P. W., & Clifford, D. (2003). From covariation to causation: A test of the
assumption of causal power. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29,
1119-1140.
Chaigneau, S. E., Barsalou, L. W., & Sloman, S. A. (2004). Assessing the causal structure of
function. Journal of Experimental Psychology. General, 133, 601-625.
Cheng, P. (1997). From covariation to causation: A causal power theory. Psychological Review,
104, 367-405.
Cheng, P. W., & Novick, L. R. (2005). Constraints and nonconstraints in causal learning: Reply
to White (2005) and to Luhmann and Ahn (2005). Psychological Review, 112, 694-707.
Fischoff, B., Slovic, P., & Lichtenstein, S. (1978). Fault trees: Sensitivity of estimated failure
probabilities to problem representation. Journal of Experimental Psychology: Human Perception and
Performance, 4, 330-344.
Gelman, S. A. (2003). The essential child: The origins of essentialism in everyday thought. New
York: Oxford University Press.
Gelman, S. A., & Wellman, H. M. (1991). Insides and essences: Early understandings of the
nonobvious. Cognition, 38, 213-244.
Gentner, D. (1989). The mechanisms of analogical learning. In S. Vosniadou & A. Ortony (Eds.),
Similarity and analogical reasoning (pp. 199-241). New York: Cambridge University Press.
Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., & Kushnir, T. (2004). A theory of causal
learning in children: Causal maps and Bayes nets. Psychological Review, 111, 3-23.
Gopnik, A., & Sobel, D. M. (2000). Detecting blickets: How young children use information
about novel causal powers in categorization and induction. Child Development, 71, 1205-1222.
Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive
Psychology, 51, 334-384.
Hampton, J. A. (1995). Testing the prototype theory of concepts. Journal of Memory and
Language, 34, 686-708.
Hampton, J. A., Estes, Z., & Simmons, S. (2007). Metamorphosis: Essence, appearance, and
behavior in the categorization of natural kinds. Memory & Cognition, 35, 1785-1800.
Harris, H.D., & Rehder, B. (2006). Modeling category learning with exemplars and prior
knowledge. In R. Sun & N. Miyake (Eds.), Proceedings of the 28th Annual Conference of the Cognitive
Science Society (pp. 1440-1445). Mahwah, NJ: Erlbaum.
Hayes, B.K., & Rehder, B. (2009). Children’s causal categorization. In preparation.
Johnson, S. C., & Solomon, G. E. A. (1997). Why dogs have puppies and cats have kittens: The
role of birth in young children's understanding of biological origins. Child Development, 68, 404-419.
Judd, C. M., McClelland, G. H., & Culhane, S. E. (1995). Data analysis: Continuing issues in the
everyday analysis of psychological data. Annual Review of Psychology, 46, 433-465.
Kalish, C. W. (1995). Essentialism and graded category membership in animal and artifact
categories. Memory & Cognition, 23, 335-349.
Kaplan, A. S., & Murphy, G. L. (1999). The acquisition of category structure in unsupervised
learning. Memory & Cognition, 27, 699-712.
Keil, F. C. (1989). Concepts, kinds, and cognitive development. Cambridge, MA: MIT Press.
Keil, F. C. (1995). The growth of causal understandings of natural kinds. In D. Sperber, D.
Premack & A. J. Premack (Eds.), Causal cognition: A multidisciplinary approach (pp. 234-262). Oxford:
Clarendon Press.
Kim, N. S., & Ahn, W. (2002a). Clinical psychologists' theory-based representation of mental
disorders affect their diagnostic reasoning and memory. Journal of Experimental Psychology: General,
131, 451-476.
Kim, N. S., & Ahn, W. (2002b). The influence of naive causal theories on lay concepts of mental
illness. American Journal of Psychology, 115, 33-65.
Kim, N. S., Luhmann, C. C., Pierce, M. L., & Ryan, M. M. (2009). Causal cycles in
categorization. Memory & Cognition, 37, 744-758.
Kushnir, T., & Gopnik, A. (2005). Young children infer causal strength from probabilities and
interventions. Psychological Science, 16, 678-683.
Lamberts, K. (1995). Categorization under time pressure. Journal of Experimental Psychology:
General, 124, 161-180.
Lamberts, K. (1998). The time course of categorization. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 24, 695-711.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic
analysis theory of knowledge acquisition, induction, and representation. Psychological Review, 104, 211-240.
Lin, E. L., & Murphy, G. L. (1997). The effects of background knowledge on object
categorization and part detection. Journal of Experimental Psychology: Human Perception and
Performance, 23, 1153-1163.
Lober, K., & Shanks, D. R. (2000). Is causal induction based on causal power? Critique of Cheng
(1997). Psychological Review, 107, 195-212.
Lombrozo, T. (2007). Simplicity and probability in causal explanation. Cognition, 55, 232-257.
Lombrozo, T. (2009). Explanation and categorization: How "why?" informs "what?". Cognition,
110, 248-253.
Lu, H., Yuille, A. L., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2008). Bayesian generic
priors for causal learning. Psychological Review, 115, 955-984.
Luhmann, C. C., Ahn, W., & Palmeri, T. J. (2006). Theory-based categorization under speeded
conditions. Memory & Cognition, 34(5), 1102-1111.
Malt, B. C., & Johnson, E. C. (1992). Do artifacts have cores? Journal of Memory and Language,
31, 195-217.
Malt, B. C. (1994). Water is not H2O. Cognitive Psychology, 27, 41-70.
Marsh, J., & Ahn, W. (2006). The role of causal status versus inter-feature links in feature
weighting. In R. Sun & N. Miyake (Eds.), Proceedings of the 28th Annual Conference of the Cognitive
Science Society (pp. 561-566). Mahwah, NJ: Erlbaum.
Matan, A., & Carey, S. (2001). Developmental changes within the core of artifact concepts.
Cognition, 78, 1-26.
Medin, D. L., Wattenmaker, W. D., & Hampson, S. E. (1987). Family resemblance, conceptual
cohesiveness, and category construction. Cognitive Psychology, 19, 242-279.
Medin, D. L., & Ortony, A. (1989). Psychological essentialism. In S. Vosniadou & A. Ortony
(Eds.), Similarity and analogical reasoning (pp. 179-196). Cambridge, MA: Cambridge University Press.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological
Review, 85, 207-238.
Meunier, B., & Cordier, F. (2009). The biological categorizations made by 4 and 5-year olds: The
role of feature type versus their causal status. Cognitive Development, 24, 34-48.
Minda, J. P., & Smith, J. D. (2002). Comparing prototype-based and exemplar-based accounts of
category learning and attentional allocation. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 28, 275-292.
Morris, M. W., & Larrick, R. P. (1995). When one cause casts doubt on another: A normative
analysis of discounting in causal attribution. Psychological Review, 102, 331-355.
Murphy, G. L., & Allopenna, P. D. (1994). The locus of knowledge effects in concept learning.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 904-919.
Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence.
Psychological Review, 92, 289-316.
Murphy, G. L., & Wisniewski, E. J. (1989). Feature correlations in conceptual representations. In
G. Tiberchien (Ed.), Advances in cognitive science: Vol. 2. Theory and applications (pp. 23-45).
Chichester, England: Ellis Horwood.
Murphy, G. L. (2002). The big book of concepts. Cambridge, MA: MIT Press.
Oppenheimer, D. M., Tenenbaum, J. B., & Krynski, T. (2009). Categorization as causal
explanation: Discounting and augmenting of concept-irrelevant features in categorization. Submitted for
publication.
Palmeri, T. J., & Blalock, C. (2000). The role of background knowledge in speeded perceptual
categorization. Cognition, 77, B45-B47.
Patalano, A. L., & Ross, B. H. (2007). The role of category coherence in experience-based
prediction. Psychonomic Bulletin & Review, 14, 629-634.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference.
San Mateo, CA: Morgan Kaufman.
Poulton, E. C. (1989). Bias in quantifying judgments. Hillsdale, NJ: Erlbaum.
Rehder, B. (2003a). Categorization as causal reasoning. Cognitive Science, 27, 709-748.
Rehder, B. (2003b). A causal-model theory of conceptual representation and categorization.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1141-1159.
Rehder, B. (2007). Essentialism as a generative theory of classification. In A. Gopnik, & L.
Schultz, (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 190-207). Oxford, UK:
Oxford University Press.
Rehder, B. (2009a). Causal-based property generalization. Cognitive Science, 33, 301-343.
Rehder, B. (2009b). The when and why of the causal status effect. Submitted for publication.
Rehder, B. & Hastie, R. (2001). Causal knowledge and categories: The effects of causal beliefs
on categorization, induction, and similarity. Journal of Experimental Psychology: General, 130, 323-360.
Rehder, B., & Hastie, R. (2004). Category coherence and category-based property induction.
Cognition, 91, 113-153.
Rehder, B., & Murphy, G. L. (2003). A Knowledge-Resonance (KRES) model of category
learning. Psychonomic Bulletin & Review, 10, 759-784.
Rehder, B., & Burnett, R. C. (2005). Feature inference and the causal structure of object
categories. Cognitive Psychology, 50, 264-314.
Rehder, B. & Kim, S. (2006). How causal knowledge affects classification: A generative theory
of categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 659-683.
Rehder, B. & Kim, S. (2009a). Classification as diagnostic reasoning. Memory & Cognition, 37,
715-729.
Rehder, B. & Kim, S. (2009b). Causal status and coherence in causal-based categorization.
Journal of Experimental Psychology: Learning, Memory, and Cognition. Accepted pending minor
revisions.
Rehder, B. & Ross, B.H. (2001). Abstract coherent concepts. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 27, 1261-1275.
Rehder, B., & Milovanovic, G. (2007). Bias toward sufficiency and completeness in causal
explanations. In D. McNamara & G. Trafton (Eds.), Proceedings of the 29th Annual Conference of the
Cognitive Science Society (p. 1843).
Reichenbach, H. (1956). The direction of time. Berkeley: University of California Press.
Rips, L. J. (1989). Similarity, typicality, and categorization. In S. Vosniadou & A. Ortony (Eds.),
Similarity and analogical reasoning (pp. 21-59). New York: Cambridge University Press.
Rips, L. J. (2001). Necessity and natural categories. Psychological Bulletin, 127, 827-852.
Rosch, E. H., & Mervis, C. B. (1975). Family resemblance: Studies in the internal structure of
categories. Cognitive Psychology, 7, 573-605.
Salmon, W. C. (1984). Scientific explanation and the causal structure of the world. Princeton,
NJ: Princeton University Press.
Sloman, S. A. (2005). Causal models: How people think about the world and its alternatives.
Oxford, UK: Oxford University Press.
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin,
119, 3-23.
Sloman, S. A., & Lagnado, D. A. (2005). Do we "do"? Cognitive Science, 29, 5-39.
Sloman, S. A., Love, B. C., & Ahn, W. (1998). Feature centrality and conceptual coherence.
Cognitive Science, 22, 189-228.
Smith, E. E., & Sloman, S. A. (1994). Similarity- versus rule-based categorization. Memory &
Cognition, 22, 377-386.
Sobel, D. M., & Kirkham, N. Z. (2006). Blickets and babies: The development of causal
reasoning in toddlers and infants. Developmental Psychology, 42, 1103-1115.
Sobel, D. M., Tenenbaum, J. B., & Gopnik, A. (2004). Children's causal inferences from indirect
evidence: Backwards blocking and Bayesian reasoning in preschoolers. Cognitive Science, 28, 303-333.
Sobel, D. M., Yoachim, C. M., Gopnik, A., Meltzoff, A. N., & Blumenthal, E. J. (2007). The
blicket within: Preschoolers' inferences about insides and causes. Journal of Cognition and Development,
8, 159-182.
Tversky, A., & Gati, I. (1978). Studies in similarity. In E. Rosch & B. B. Lloyd (Eds.), Cognition
and categorization (pp. 79-98). Hillsdale, NJ: Erlbaum.
Tversky, A., & Kahneman, D. (1980). Causal schemas in judgments under uncertainty. In M.
Fishbein (Ed.), Progress in social psychology. Hillsdale, NJ: Erlbaum.
Tversky, A., & Gati, I. (1982). Similarity, separability, and the triangular inequality.
Psychological Review, 89, 123-154.
Tversky, A., & Koehler, D. J. (1994). Support theory: A nonextensional representation of
subjective probability. Psychological Review, 101, 547-567.
Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal
models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222-236.
Waldmann, M. R., Holyoak, K. J., & Fratianne, A. (1995). Causal models and the acquisition of
category structure. Journal of Experimental Psychology: General, 124, 181-206.
Waldmann, M. R., & Hagmayer, Y. (2005). Seeing versus doing: Two modes of accessing causal
knowledge. Journal of Experimental Psychology: Learning, Memory, & Cognition, 31, 216-227.
Wattenmaker, W. D., Dewey, G. I., Murphy, T. D., & Medin, D. L. (1986). Linear separability
and concept learning: Context, relational properties, and concept naturalness. Cognitive Psychology, 18,
158-194.
Wisniewski, E. J. (1995). Prior knowledge and functionally relevant features in concept learning.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 449-468.
Author Note
Bob Rehder, Department of Psychology, New York University.
Correspondence concerning this chapter should be addressed to Bob Rehder, Department of
Psychology, 6 Washington Place, New York, NY 10003 (email: bob.rehder@nyu.edu).
Footnotes
1
Moreover, there is evidence suggesting that causal knowledge and within-category empirical-
statistical information are conflated in people’s mental representation of natural categories. For example,
Sloman et al. (1998) conducted a factor analysis showing that category features vary along three
dimensions. The first two were identified as perceptual salience (assessed with questions like "How
prominent in your conception of apples is that it grows on trees?") and diagnosticity (or cue validity,
assessed with questions like "Of all things that grow on trees, what percentage are apples?"). Measures
loading on a third factor included both category validity (i.e., “What percentage of apples grow on
trees?”) and those related to a construct they labeled conceptual centrality or mutability (assessed with
questions like “How good an example of an apple would you consider an apple that does not ever grow
on trees?”). Category validity and centrality were also highly correlated in a study testing a novel
category that was designed to dissociate the two measures (Sloman et al, Study 5). Conceptual centrality
corresponds to one of the questions addressed in this chapter, namely, the evidence that an individual feature
provides for a particular category. Thus, it may be difficult to separate the effects of causal knowledge
and observed category members on classification into natural categories.
2
Indeed, when the regression predictor for a feature is coded as +1 when the feature is present
and –1 when it is absent and all other predictors are orthogonal (which occurs when all possible test items
are presented), the resulting regression weight is exactly half this difference. A concrete example of a
regression equation follows.
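As a hypothetical illustration (not a fit to any of the data reviewed here): suppose ratings of items on three binary dimensions are described by rating = 6.0 + 2.0X + 1.0Y + 1.0Z, with each predictor coded +1 when its feature is present and –1 when it is absent. Items containing feature X then receive an average rating of 8.0 and items lacking it an average of 4.0, so the regression weight of 2.0 on X is exactly half that difference of 4.0.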
3
For example, it is reasonable to ask whether the assumption of linearity that is part of linear
regression is appropriate for a classification rating task given research suggesting that the evidence that
features provide for category membership combines multiplicatively rather than additively (Minda &
Smith, 2002). Thus, either a logarithmic transformation of the classification ratings or an alternative
method of analysis that assumes a multiplicative rule might be appropriate. For example, Rehder (2003b)
analyzed classification rating data (of all possible test items that could be formed on four binary
dimensions) by normalizing the ratings so they summed to 1 and then treating the results as representing a
Causal-Based Classification: A Review
73
probability distribution. From this distribution, I derived the “probability” of each feature and the
“probabilistic contrast” for each pair of features, measures that are analogous to the feature weights and
two-way interactions derived from linear regression. Because it is based on probabilities, this method
implicitly incorporates the assumption that evidence combines multiplicatively and thus may yield a
more accurate measure of “feature weights.” On the other hand, it requires the strong assumption that
ratings map one-to-one onto probability estimates, an assumption that may have its own problems. In
practice, this probabilistic method of analysis and linear regression have always yielded the same
qualitative conclusions.
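A minimal sketch of this probabilistic analysis, in Python, is given below. This is not the original analysis code; the item enumeration, the placeholder ratings, and the ΔP-style definition of the probabilistic contrast are illustrative assumptions.

from itertools import product

# All 16 test items that can be formed on 4 binary dimensions (1 = present, 0 = absent).
items = list(product([0, 1], repeat=4))

# Placeholder classification ratings, one per item (purely hypothetical values).
ratings = [1.0 + sum(item) for item in items]

# Normalize the ratings so they sum to 1 and treat the result as a probability distribution.
total = sum(ratings)
p = [r / total for r in ratings]

def feature_prob(dim):
    # The "probability" that the feature on dimension dim is present.
    return sum(p[k] for k, item in enumerate(items) if item[dim] == 1)

def contrast(i, j):
    # A Delta-P style "probabilistic contrast" for the feature pair (i, j):
    # P(i present | j present) - P(i present | j absent).
    pj = feature_prob(j)
    p_i_given_j = sum(p[k] for k, it in enumerate(items) if it[i] == 1 and it[j] == 1) / pj
    p_i_given_not_j = sum(p[k] for k, it in enumerate(items) if it[i] == 1 and it[j] == 0) / (1 - pj)
    return p_i_given_j - p_i_given_not_j

print([round(feature_prob(d), 3) for d in range(4)])   # analogues of feature weights
print(round(contrast(0, 1), 3))                        # analogue of a two-way interaction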
4
For example, one of the variants computes what is known as alpha centralities (Bonacich &
Lloyd, 2001). When the ds = 2, alpha centralities for the chain network are 3, 2, and 1 for features
X, Y, and Z, respectively, whereas they are 4.75, 2.50, and 1 when the ds = 3.
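These values can be reproduced by computing, up the chain, ci = 1 + α Σj dij cj (summing over i's dependents j) with α = .5; this is a reconstruction in which the value of α is inferred from the reported centralities rather than taken from the source. With ds = 3, for example, cZ = 1, cY = 1 + .5(3)(1) = 2.5, and cX = 1 + .5(3)(2.5) = 4.75.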
5
Because this study used the missing feature method, there is uncertainty regarding whether
feature Y was more important than Z, because the lower rating of the missing-Y item could reflect an effect
of coherence instead (i.e., it violates two causal relations whereas the missing-Z item violates one; see
Table 1’s Example 3, discussed in Section 2.2). Nevertheless, concluding that feature X is more important
than Z on the basis of the difference in ratings between the missing-X and missing-Z items is sound,
because those items are equated on the number of violated correlations (one each).
6
Indeed, as part of another experiment testing Rehder and Kim's materials we asked participants
to judge how often a cause produced its effect. The average response was 91% (the modal response was
100%), supporting the conjecture that many subjects interpreted the causal links as nearly deterministic.
7
There may be some uncertainty regarding the dependency model’s predictions for this
experiment that stems from ambiguities regarding how its construct of “causal strength” should be
interpreted. We interpret it to be a measure of the propensity of the cause to produce the effect, that is, as
a causal power (corresponding to the generative model’s m parameter). An alternative interpretation is
that it corresponds to a measure of covariation between the cause and effect (e.g., the familiar ΔP rule of
causal induction). Under this alternative interpretation, the dependency model would also predict a
weaker causal status effect in the Background-50 condition (because the causal links themselves are
weaker). Although Sloman et al. did not specify which interpretation was intended, we take the work of
Cheng and colleagues as showing that when you ask people to judge “causal strength” they generally
respond with an estimate of causal power rather than ΔP (Cheng, 1997; Buehner, Cheng, & Clifford,
2003) and so that is the assumption we make here. Of course, exactly what measure people induce in
causal learning tasks is itself controversial (e.g., Lober & Shanks, 2000) and even Buehner et al. found
that a substantial minority of subjects responded with causal strength estimates that mirrored ΔP. But even
if one grants this alternative interpretation, it means that the dependency model predicts a weaker causal
status effect in the Background-50 condition whereas the generative model predicts it should be absent
entirely. In addition of course, the generative model but not the dependency model predicts effects of this
experiment’s manipulation on feature frequency ratings and coherence effects (as described below).
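For concreteness: on Cheng's (1997) account, the power of a generative cause c to produce effect e is ΔP / [1 − P(e | ¬c)], where ΔP = P(e | c) − P(e | ¬c), so the two measures coincide only when the effect never occurs in the absence of the cause.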
8
Although explicitly defining essential features in this manner controls the knowledge brought to
bear during classification, note that these experimentally-defined “essences” may differ in various ways
from (people's beliefs about) some real category essences. Although adults' beliefs about essences are
sometimes concrete (e.g., DNA in the case of biological kinds for adults), preschool children's knowledge
about animals' essential properties is less specific, involving only a commitment to biological
mechanisms that operate on their "insides" (Gelman & Wellman, 1991; Gelman, 2003; Johnson &
Solomon, 1997). And an essential property is not just one that happens to be present in all category
members (and absent in all nonmembers); it is one that is present in all category members that could exist.
But while the concreteness and noncontingency of people's essentialist beliefs are undoubtedly important
under some circumstances, we suggest that a feature that is present in all category members is sufficient
to induce a causal status effect under the conditions tested in this experiment.
9
The absence of a significant causal status effect in the Unconnected-Chain-80 condition was
somewhat of a surprise given the results from Rehder and Kim’s (2009b) Experiment 1 reviewed in
Section 4.1. The Unconnected-Chain-80 condition was identical to that experiment’s Chain-75 condition
except for (a) causal strengths were 80% instead of 75% and (b) the presence of an explicit essence, albeit
one that is not causally related to the other features. It is conceivable that the 5% increase in causal
strengths may be responsible for reducing the causal status effect; indeed the generative model predicts a
slightly smaller causal status effect for m = .80 versus .75. In addition, the presence of an explicit
essential feature to which the causal chain was not connected may have led participants to assume that the
chain was unlikely to be related to any other essential property of the category (and of course the
generative model claims that an essential property to which the causal chain is causally connected promotes
a causal status effect).
10
Some studies have claimed to show just such a dissociation between feature importance and
category validity, however. For example, in Ahn et al. (2000, Experiment 2) participants first observed
exemplars with three features that appeared with equal frequency and then rated the frequency of each
feature. They then learned causal relations forming a causal chain and rated the goodness of missing-X,
missing-Y, and missing-Z test items. Whereas features' likelihood ratings did not differ, the missing-X
item was rated lower than the missing-Y item which was lower than the missing-Z item, a result the
authors interpreted as demonstrating a dissociation between category validity and categorization
importance. This conclusion is unwarranted, however, because the frequency ratings were gathered before
the presentation of the causal relations. Clearly, one can only assess whether perceived category validity
mediates the relationship between causal knowledge and features' categorization importance by assessing
category validity after the causal knowledge has been taught. Sloman et al. (1998, Study 5) and Rehder
and Kim (2009b) gathered likelihood ratings after the causal relationships were learned and found no
dissociation with feature weights.
11
For the following 15 studies, “[a/b/c]” represents the number of conditions in which a full (a),
partial (b), or zero or negative (c) causal status effect obtained: Sloman et al. (1998) [2/0/0], Ahn (1998)
[2/0/0]; Ahn et al (2000) [2/0/0], Rehder and Hastie (2001) [3/0/3], Kim and Ahn (2002b) [1/0/0], Rehder
(2003a) [1/0/1], Rehder (2003b) [0/2/1], Rehder and Kim (2006) [0/2/4], Luhmann et al. (2006) [7/2/0],
Marsh and Ahn (2006) [2/0/4], Rehder and Kim (2008) [0/1/1], Rehder and Kim (2009b) [5/0/3],
Lombrozo (2009) [1/0/1], Rehder (2009) [1/0/3], and Hayes and Rehder (2009) [0/0/2]. Note that because
Sloman et al. and Luhmann et al. either did not gather or report ratings for missing-Y test items in some
conditions, the causal status effect is counted as “full” in those cases. Also note that the results from
Luhmann et al.'s 300ms deadline condition of their Experiment 2B are excluded.
12
Although Ahn and Kim (2001) did not themselves report the results of regression analyses, a
regression analysis of the average classification results in their Table I yields weights of 0.47 on the
causes and 1.12 on the common effect, confirming the greater importance of the common effect in their
study.
13
Note that there have been some studies that have failed to find a multiple cause effect with a
common effect network. For example, using virtually the same materials as Rehder and Hastie (2001) and
Rehder and Kim (2006) except for the use of the “normal” wording for atypical feature values (see
Section 6.4), Marsh and Ahn (2006) failed to find an elevated weight on the common effect. Additional
research will be required to determine whether this is a robust finding or one that depends on
idiosyncratic details of Marsh and Ahn’s procedure.
14
It is important to note that these predictions depend on the particular model parameters chosen.
First, just as for a chain network, the generative model only predicts a causal status effect for a common
cause network when causal links are probabilistic. Second, regarding the common effect network, the
claim is that the probability of an effect will increase with its number of causes. Whether it will be more
probable than the causes themselves (as it is in the example in Table 5) also depends on the strength of the
causal links. Third, whether the probability of a common effect in fact increases will interact with the
subject’s existing statistical beliefs about the category. For example, if one is quite certain about the
category validity of the effect (e.g., because it is based on many observations), then the introduction of
additional causes might be accommodated by lowering one’s estimates of the strengths of the causal links
(the m and b parameters) instead. See Rehder and Milovanovic (2007) for evidence of this kind of mutual
influence between data and theory. Studies that systematically manipulated the strength of causal links in
common cause and common effect networks (like Rehder & Kim, 2009b, did with a chain network) have
yet to be conducted.
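To make the second point concrete (a sketch assuming a noisy-OR integration of independent generative causes, in the spirit of the generative model's m and b parameters): with n causes present, each producing the effect with strength m, and background causes of strength b, the probability of the effect is 1 − (1 − b)(1 − m)^n. This quantity increases with n, but whether it exceeds the probability of the causes themselves depends on the values of m and b.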
15
They also provided indirect evidence for this claim. They presented subjects with items in
which the presence of the common effect feature was unknown and asked them to rate the likelihood that
it was present. They found that inference ratings increased as a function of the number of causes present
in the item. This result is consistent with the view that people can use causal knowledge to infer category
features (Rehder & Burnett, 2005). It also implies that a feature will have greater category validity when it
has multiple causes. Note that this result is analogous to the findings in Section III in which a change in
feature weights (a causal status effect) was always accompanied by a change in features' category
validity (likelihood of appearing in category members).
16
Although the interaction between condition and interaction term did not reach significance (p >
.15), separate analyses revealed a significant difference between direct and indirect interaction terms in
the Chain-75 condition (p < .01) but not in the Chain-100 condition (p = .17).
17
Nevertheless, Chaigneau et al. found that classification ratings for vignettes with inappropriate
historical intentions were lower relative to a baseline condition in which all four causes were present. The
authors argue that this is a case of causal updating in which information about intentions influenced how
subjects represented the artifact’s physical structure (even when information about physical structure was
provided as part of the vignette). For example, if the designer intended to create a mop, the subject might
be more sure that the object had a physical structure appropriate to mopping.
18
It should be acknowledged that whereas Hayes and Rehder taught their subjects a single link
between two features, Ahn et al. and Meunier and Cordier taught theirs a common cause structure in
which one feature caused two others, and perhaps a cause with two effects is sufficient to induce a causal
status effect in children. Of course, arguing against this possibility are findings above indicating that, at
least for adults, a feature’s importance does not generally increase with its number of dependents (see
Section 5.1).
19
One challenge to applying the regression method to natural categories concerns the number of
features involved. Subjects usually know dozens of features of natural categories, implying the need for
2^n test items (assuming n binary features) to run a complete regression that assesses main effects (i.e.,
feature weights), two-way interactions, and all higher-order interactions. One compromise is to present
only those test items missing either one or two features, allowing an assessment of feature weights and
the two-way interactions. Furthermore, the missing-two-feature items could be restricted to those missing
two features that are identified (e.g., on a theory drawing task) to be causally related.
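A minimal sketch of how such a reduced test set might be constructed, in Python, follows. It is illustrative only; the number of features and the set of causally related pairs are hypothetical placeholders.

from itertools import combinations

n = 12                                # number of known features (hypothetical)
causally_related = {(0, 1), (1, 2)}   # pairs judged causally related, e.g., on a theory drawing task (hypothetical)

all_present = [1] * n
test_items = [all_present]

# Items missing exactly one feature: used to estimate the n feature weights.
for i in range(n):
    item = all_present.copy()
    item[i] = 0
    test_items.append(item)

# Items missing exactly two features, restricted to causally related pairs:
# used to estimate the corresponding two-way interactions.
for i, j in combinations(range(n), 2):
    if (i, j) in causally_related or (j, i) in causally_related:
        item = all_present.copy()
        item[i] = item[j] = 0
        test_items.append(item)

print(len(test_items), "test items instead of", 2 ** n)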
Table 1
Hypothetical classification ratings for four example categories with features X, Y, and Z. Examples 1 and
2 assume features are related in a common cause structure; Examples 3 and 4 assume they form a causal
chain. Examples 1 and 3 assume all features have equal classification weights; Examples 2 and 4
assume X > Y = Z. 1 = feature present; 0 = feature absent; x = feature state unknown.
                             Hypothetical Classification Ratings
                             Common Cause (Y ← X → Z)        Chain (X → Y → Z)
                             Example 1    Example 2          Example 3    Example 4
Parameters
  Weight (X)                 1            1                  1            1
  Weight (Y)                 1            .5                 1            .5
  Weight (Z)                 1            .5                 1            .5
  Weight on interactions     1            1                  1            1
Test items
  111                        10           9                  10           9
  011 (missing only X)       4            3                  6            5
  101 (missing only Y)       6            6                  4            4
  110 (missing only Z)       6            6                  6            6
  100 (missing all but X)    2            3                  4            5
  010 (missing all but Y)    4            4                  2            2
  001 (missing all but Z)    4            4                  4            4
  000                        4            5                  4            5
  1xx                        6            6                  6            6
  x1x                        6            5.5                6            5.5
  xx1                        6            5.5                6            5.5
Table 2
Feature “centralities” predicted by the dependency model after two iterations for features X, Y, and Z in
the chain network of Figure 1 for different values of the causal strength parameters dXY and dYZ.
Table 3
Predictions for the generative model for the chain network in Figure 1 for different parameter values. cX = the probability that feature X appears in
category members; mij = strength of the causal relation between i and j; bj = strength of j's background causes. Direct = contrasts between
features that are directly causally related (X and Y, and Y and Z); indirect = contrasts between features that are indirectly related (X and Z).
Table 4
Predictions for the generative model for the causal networks in Figure 4 for different parameter values. The likelihood equations for the
essentialized model (Figure 4A) assume cE = 1; equations for the unconnected model (Figure 4B) were presented earlier in Table 3. ci = the
probability that feature i appears in category members; mij = strength of the causal relation between i and j; bj = strength of j's background
causes.
Table 5
Predictions for the generative model for three-feature common cause and common effect networks. ci = the probability that root feature i appears
in category members; mij = strength of the causal relation between i and j; bj = strength of j's background causes. Direct = contrasts between
features that are directly causally related; indirect = contrasts between features that are indirectly related.
Table 6
Test pairs presented by Hayes and Rehder (2009). For each test pair Ti subjects choose whether item X or
Y is a better category member. Choice probabilities presented in the final two columns are tested against
.50 († p < .10. * p < .05. ** p < .01.). 1 = feature present; 0 = feature absent; x = feature state unknown.
Dimension 1 = cause feature; dimension 2 = effect feature; dimensions 3 and 4 = neutral features.
                                           Empirical Responses (Preference for X)
Test Pair    Choice X    Choice Y          Adults        5-6 Year Olds
TA           11xx        00xx              .99**         .79**
TB           xx11        xx00              1.0**         .73**
TC           10xx        01xx              .52           .51
TD           10xx        xx10              .30**         .48
TE           01xx        xx01              .32**         .43
TF           11xx        xx11              .70**         .67**
TG           00xx        xx00              .62*          .55
Table 7
Average parameter estimates for adults and children in Hayes and Rehder (2009). wc = weight given to
cause feature; we = weight given to effect feature; wn = weight given to neutral features; wh = weight
given to agreement between cause and effect feature (coherence). Standard errors are presented in
parentheses. Causal status, isolated feature, and coherence effects are tested against 0. († p < .10. * p <
.05. ** p < .01.)
                                        Adults          5-6 Year Olds
Parameters
  wc                                    .54 (.07)       .32 (.05)
  we                                    .58 (.06)       .28 (.06)
  wn                                    .47 (.04)       .19 (.04)
  wh                                    .65 (.09)       .22 (.08)
Effect
  Causal status [wc – we]               –.03 (.08)      .04 (.07)
  Isolated [average(wc, we) – wn]       .09** (.07)     .11* (.05)
  Coherence [wh]                        .65** (.09)     .22** (.08)
Figure Captions
Figure 1. A three-element causal chain.
Figure 2. Classification test results of three experiments from Rehder and Kim (2009b). (A) Experiment
1. (B) Experiment 2. (C) Experiment 3. plin is the significance of the linear trend in each condition.
Figure 3. Subjects’ feature likelihood estimates (i.e., out of 100, how many category members have that
feature) from Rehder and Kim (2009b). (A) Experiment 1. (B) Experiment 2. (C) Experiment 3. plin is the
significance of the linear trend in each condition.
Figure 4. Causal structures tested in Rehder and Kim (2009b), Experiment 3. (A) Essentialized-Chain-80
condition. (B) Unconnected-Chain-80 condition.
Figure 5. Causal networks tested in Rehder and Kim (2006), Experiment 3. (A) 1-1-3 condition. (B) 1-1-1
condition.
Figure 6. Classification results from Rehder and Kim (2006), Experiment 3. Weight on the "effect" Zs is
averaged over all three Z features in the 1-1-3 condition and is the weight on the single Z feature that
plays the role of an effect in the 1-1-1 condition. Weight on the "isolated" Zs is the average of the two
causally unrelated Z features in the 1-1-1 condition (see Figure 5).
Figure 7. The size of the causal status effect (measured as the ratings difference between the missing-Z
and missing-X test items) in Rehder (2009b) as a function of experimental condition.
Figure 8. Causal networks tested in Rehder and Hastie (2001). (A) Common cause network. (B) Common
effect network.
Figure 9. Classification results from Rehder and Hastie (2001). (A) Common cause condition. (B)
Common effect condition.
Figure 10. Causal networks tested in Rehder and Kim (2006), Experiment 2. (A) 3-1-1 condition. (B) 1-1-1 condition.
Figure 11. Classification results from Rehder and Kim (2006), Experiment 2. Weight on the "cause" Xs is
averaged over all three X features in the 3-1-1 condition and is the weight on the single X feature that
plays the role of a cause in the 1-1-1 condition. Weight on the "isolated" Xs is the average of the two
causally unrelated X features in the 1-1-1 condition (see Figure 10).
Figure 12. Classification ratings from Rehder and Kim (2009b) for test items collapsed according to their
number of typical features. (A) Experiment 1. (B) Experiment 2.
Figure 13. Classification ratings from Rehder (2003a). (A) Feature weights. (B) Interaction weights. (C)
Log classification ratings. Unlike previous figures depicting interaction weights, panel B presents the
average regression weights on the three-way interactions involving (a) X and two of its effects in the
common cause condition and (b) Y and two of its causes in the common effect condition.
Figure 14. Classification ratings from Rehder and Kim (2008).
Figure 15. Causal category structures from Rehder and Kim (2009a). (A) Experiment 1. (B) Experiment
3. (C) Experiment 4. (D) Experiment 5.
Figure 16. Causal category structures. (A) Follow-up to Rehder & Kim (2009a). (B) From Chaigneau,
Barsalou, and Sloman (2004).