Running head: Causal-Based Categorization: A Review

Causal-Based Categorization: A Review

Bob Rehder
Department of Psychology
New York University

Send all correspondence to:
Bob Rehder
Department of Psychology
6 Washington Place
New York, NY 10003
Phone: (212) 992-9586
Email: bob.rehder@nyu.edu

Abstract

This chapter reviews the last decade's work on causal-based classification, the effect of interfeature causal relations on how objects are categorized. Evidence for and against the numerous effects discussed in the literature is evaluated: the causal status effect, the relational centrality effect, the multiple cause effect, and the coherence effect. Evidence for explicit causal reasoning in classification and the work conducted on children's causal-based classification is also presented. The chapter evaluates the implications these findings have for two models of causal-based classification—the dependency model (Sloman, Love, & Ahn, 1998) and the generative model (Rehder, 2003b; Rehder & Kim, 2006)—and discusses methodological issues such as the testing of natural versus novel (artificial) categories and the interpretation of classification tests. Directions for future research are identified.

1. Introduction

Since the beginning of investigations into the mind/brain, philosophers and psychologists have asked how people learn categories and classify objects. Interest in this question should be unsurprising given that categories are a central means by which old experiences guide our responses to new ones. Regardless of whether it is a new event or a temporally extended object, a political development or a new social group, a new biological species or a type of widget on a computer screen, every stimulus we experience is novel in at least some regards, and so the types into which stimuli are grouped become the repositories of new knowledge. Thus I learn that credit default swaps are risky, that elections have consequences, and that the funny symbol on my cell phone means I have voice mail.

That the stimuli we classify span such a wide range means that the act of categorization is surprisingly complex and varied. Some categories seem to have a (relatively) simple structure. As a child I learn to identify some parts of my toys as wheels and some letters in words as "t." Accordingly, much of the field has devoted itself to the categorization of stimuli with a small number of perceptual dimensions, testing in the lab subjects' ability to learn to classify stimuli such as Gabor patches or rectangles that vary in height and width (see Ashby & Maddox, 2005, for a review). This research has shown that the learning of even these supposedly simple categories can be quite involved, as subtle differences in learning procedures and materials result in large differences in what sorts of representations are formed, how attention is allocated to different stimulus dimensions, how feedback is processed, and which brain regions underlie learning.

Other categories, in contrast, have an internal structure that is much more integrated with other sorts of knowledge. For example, as compared to wheels or "t"s, a notion such as elections is related to many other concepts: that people organize themselves into large groups known as countries, that countries are led by governments, that in democratic countries governments are chosen by people voting, and so on.
Ever since this point was made in Murphy and Medin's (1985) seminal article, a substantial literature has emerged documenting how the knowledge structures in which categories are embedded have a large effect on how categories are learned, how objects are classified, how new properties are generalized to a category, and how missing features in an object are predicted on the basis of its category membership (see Murphy, 2002, for a review). Because of its ubiquity in our conceptual structures (Ahn, Marsh, Luhmann, & Lee, 2002), one particular type of knowledge—the causal relations that obtain between features of categories—has received special attention. For example, even if you know little about cars, you probably have at least some vague notion that cars not only have gasoline, spark plugs, radiators, and fans, and emit carbon monoxide, but also that these features causally interact—that the spark plugs are somehow involved in the burning of the gasoline, that that burning produces carbon monoxide and heat, and that the radiator and fan somehow work to dissipate the latter. Here I focus on the rich database of empirical results showing how this sort of knowledge affects the key category-based judgment, namely, classification itself.

This chapter has the following structure. Section 2 addresses important methodological issues regarding the measurement of various effects of causal knowledge on categorization. Section 3 presents models that have been proposed to account for the key empirical phenomena, and those phenomena are then described in Sections 4, 5, 6, and 7. I close with a discussion of developmental issues (Section 8) and directions for future research (Section 9).

2. Assessing Causal-Based Classification Effects

The central question addressed by this literature is: What evidence does an object's features provide for membership in a category as a function of the category's network of interfeature causal relations? This section discusses two issues regarding the measurement of such effects, namely, the testing of natural versus novel categories and the type (and interpretation) of the classification tests administered.

2.1. Assessing Causal-Based Effects in Natural versus Novel Categories

Studies have assessed causal-based effects on classification for both natural (real-world) categories and novel ones (made-up categories that are taught to subjects as part of the experimental session). When natural categories are tested, researchers first assess the theories that subjects hold for these categories and then test how those theories affect classification. For example, one common method is the theory drawing task in which subjects are presented with a category's features and asked to draw the causal relations between those features and to estimate the strengths of those relations. Using this method, Sloman et al. (1998) measured theories for everyday objects (e.g., apples and guitars); Kim and Ahn (2002a; b) and Ahn, Levin, and Marsh (2005) did so for psychiatric disorders such as depression and schizophrenia.

In contrast, for novel categories subjects are explicitly instructed on interfeature causal links. For example, in a seminal study by Ahn, Kim, Lassaline, and Dennis (2000a), participants were instructed on a novel type of bird named roobans with three features: eats fruit (X), has sticky feet (Y), and builds nests on trees (Z).
In addition, participants were told that the features were related in a causal chain (Figure 1) in which X causes Y ("Eating fruit tends to cause roobans to have sticky feet because sugar in fruits is secreted through pores under their feet.") and Y causes Z ("Sticky feet tends to allow roobans to build nests on trees because they can climb up the trees easily with sticky feet."). Similarly, Rehder and Hastie (2001) instructed subjects on a novel type of star named myastars and how some features of myastars (e.g., high density) caused others (e.g., a large number of planets). Usually studies also provide some detail regarding the causal mechanism by which one feature produces another.

There are a number of advantages to testing novel rather than natural categories. One is that novel categories provide greater control over the causal relations that are used. For example, when classifying into natural categories, it is possible that limited cognitive resources (e.g., working memory) prevent subjects from using the dozens of causal relations they usually identify in a theory-drawing task. In contrast, experiments using novel categories usually teach subjects 2–4 causal links, and the experimental context itself makes it clear that those causal links are the relevant ones (especially so when the causal links are presented on the computer screen as part of the classification test). Of course, this does not rule out the use of additional causal links that subjects might assume are associated with particular ontological kinds by default (see Section 4.3 for one possibility in this regard).

Another advantage of novel categories is that they can control for the numerous other factors besides causal knowledge that are known to influence category membership. For example, features that are more salient will have greater influence than less salient ones (e.g., Lamberts, 1995; 1998). And, feature importance is influenced by what I call empirical-statistical information, that is, how often features or exemplars are observed to occur in category members and nonmembers (Rosch & Mervis, 1975). Patterns of features that are observed to occur within category members (e.g., a feature's category validity, the probability that it occurs given the category) may be especially problematic, because this information is likely to covary with features' causal roles. For example, a feature with many causes is likely to appear in more category members than one with few causes; two features that are causally related are also likely to be correlated in observed category members. Thus, any purported effect of causal knowledge on classification in natural categories might be due to the statistical patterns of features that causal links generate (and classifiers then observe) rather than the links themselves.1

In contrast, when novel categories are used, counterbalancing the assignment of features to causal roles or the use of multiple novel categories averages over effects of feature salience and contrast categories. And, that subjects have not seen examples of these made-up categories eliminates effects of empirical-statistical information. For these reasons, this chapter focuses on studies testing novel experimental categories.
Of course, this is not to say that studies testing natural categories have not furthered our understanding of causal-based classification in critical ways, as such studies have reported numerous interesting and important findings (e.g., Ahn, 1998; Ahn, Flanagan, Marsh, & Sanislow, 2006; Kim & Ahn, 2002a; b; Sloman et al., 1998). As always, research is advanced most rapidly by an interplay between studies testing natural materials (that afford ecological validity) and novel ones (that afford experimental control). However, when rigorous testing of computational models is the goal (as it is here), the more tightly controlled studies are to be emphasized.

2.2. Interpreting Classification Tests

After subjects learn a novel category, they are presented with a series of objects and asked to render a category membership judgment. The question of how causal knowledge affects classification can be divided into two subquestions. The first is how causal knowledge affects the influence of individual features on classification. The second concerns how certain combinations of features make for better category members. In categorization research there is precedent for considering these two different types of effects. For example, Medin and Schaffer (1978) distinguished independent cue models, in which each feature provides an independent source of evidence for category membership (prototype models are an example of independent cue models), from interactive cue models, in which a feature's influence depends on what other features are present (exemplar models are an example of interactive cue models). Of course, whereas most existing categorization models are concerned with how features directly observed in category members influence (independently or interactively) subsequent classification decisions, the current chapter is concerned with how classification is affected by interfeature causal relations.

I will refer to one method for assessing the importance of individual features as the missing feature method. As mentioned, in the study by Ahn et al. (2000a), participants were instructed on novel categories (e.g., roobans) with features related in a causal chain (X→Y→Z). Participants were then presented with three items missing exactly one feature (one missing only X, one missing only Y, one missing only Z) and asked to rate how likely that item was a category member. Differences among the ratings of these items were interpreted as indicating how the relative importance of features varies as a function of their causal role. For example, that the missing-X item was rated a worse category member than the missing-Y item was taken to mean that X was more important than Y for establishing category membership. (The result that features are more important than those they cause is referred to as the causal status effect and will be discussed in detail in Section 4.) The missing feature method has been used in numerous other studies (Ahn, 1998; Kim & Ahn, 2002a; b; Kim, Luhmann, Pierce, & Ryan, 2009; Luhmann, Ahn, & Palmeri, 2006; Sloman, Love, & Ahn, 1998).

A different method for assessing feature weights was used by Rehder and Hastie (2001). They also instructed subjects on novel categories (e.g., myastars) with four features that were causally related in various topologies. However, rather than just presenting test items missing one feature, Rehder and Hastie presented all 16 items that can be formed on four binary dimensions.
Linear regression analyses were then performed on those ratings in which there was one predictor for each feature coding whether that feature was present or absent in a test item. The regression weight on each predictor was interpreted as the importance of that feature. Importantly, the regression equation also included two-way and higher-order interaction terms to allow an assessment of how important certain combinations of features are for forming good category members. For example, a predictor representing the two-way interaction between features X and Y encodes whether features X and Y are both present or both absent versus one present and the other absent, and the resulting regression weight on that predictor represents the importance, for participants' categorization ratings, of dimensions X and Y having the same value (present or absent) in a test item. In fact, Rehder and Hastie found that subjects exhibited sensitivity to two-way and higher-order feature interactions, producing, for example, higher ratings when a cause and effect feature were both present or both absent and lower ratings when one was present and the other absent. (This phenomenon, known as the coherence effect, will be discussed in detail in Section 6.) The regression method has been used in numerous studies (Rehder, 2003a; 2003b; 2007; Rehder & Kim, 2006; 2009b).

Which of these methods for assessing causal-based classification effects should be preferred? One obvious advantage of the regression method is that it, unlike the missing feature method, provides a measure of feature interactions, an important consideration given the presence of the large coherence effects described later. In addition, however, there are also several reasons to prefer the regression method for assessing feature weights, which I now present in ascending order of importance.

The first advantage is that regression is a generalization of a statistical analysis method that is already very familiar to empirical researchers, namely, analysis of variance (Judd & McClelland, 1989). For example, imagine an experiment in which subjects are taught a category with three features X, Y, and Z and then rate the category membership of the eight distinct test items that can be formed on three binary dimensions. This experiment can be construed as a 2 x 2 x 2 within-subjects design in which the three factors are whether the feature is present or absent on dimension X, on dimension Y, and on dimension Z. The question of whether the regression weight on, say, feature X is significantly different from zero is identical to asking whether there is a "main effect" of dimension X. The question of whether the two-way interaction weight between X and Y is different from zero is identical to asking whether there is an interaction between dimensions X and Y. That is, one can ask whether the "main effect" of feature X is "moderated" by the presence or absence of feature Y (as one might expect if X and Y are causally related).

The second reason to prefer regression is that it provides a more powerful method of statistical analysis. The regression weight on, say, dimension X amounts to a difference score between the ratings of the test items that have feature X and those that don't.2
Supposing again that the category has three features, the weight on X is the difference between the ratings on test items 111, 110, 101, and 100 versus 011, 010, 001, and 000 (where "1" means that a feature is present and "0" that it is absent, e.g., 101 means that X and Z are present and Y absent). As a consequence, use of regression means that an assessment of X's weight involves all eight test items, whereas the missing feature method bases it solely on the rating of one item (namely, 011). By averaging over the noise associated with multiple measures, the regression method produces a more statistically powerful assessment of the importance of features.

Third, and most importantly, the missing feature method produces, as compared to regression, a potentially different and (as I shall show) incorrect assessment of a feature's importance. This is the case because any single test item manifests both the independent and interactive effects of its features; in particular, the rating of a test item missing a single feature is not a pure measure of that feature's weight (i.e., it does not correspond to the "main effect" associated with the feature). For example, suppose that a category has three features in which X causes both Y and Z (that is, X, Y, and Z form a common cause network) and that subjects are asked to rate test items on a 1–10 scale. Assume that subjects produce a baseline classification rating of 5, that ratings are 1 point higher for each feature present in a test item and 1 point lower for each feature that is absent (and unchanged if the presence or absence of the feature is unknown). That is, features X, Y, and Z are all weighed equally. In addition, assume there are interactive effects such that the rating goes 1 point higher whenever a cause and effect feature are both present or both absent and 1 point lower whenever one of those is present and the other absent. The classification ratings for this hypothetical experiment are presented in Example 1 in Table 1 for the eight test items that can be formed on the three dimensions and the three test items that have one feature present and two unknown (again, "1" = feature present and "0" = absent; "x" means the state of the feature is unknown). For instance, test item 110 (X and Y present, Z absent) has a rating of 6 because, as compared to an average rating of 5, it gains two points due to the presence of X and Y, loses one because of the absence of Z, gains one because the causally related X and Y are both present, and loses one because X is present but its effect Z is absent.

It is informative to compare the different conclusions reached by the missing feature method and linear regression regarding feature weights in this example. Importantly, the rating of 4 received by the item missing only feature X (011) is lower than the rating of 6 given to the items missing only Y (101) or Z (110). This result seems to imply (according to the missing feature method) that X is more important than Y and Z. However, this conclusion is at odds with the conditions that were stipulated in the example, namely, that all three features were weighed equally. In fact, item 011 is rated lower not because feature X is more important, but rather because it includes two violations of causal relations (X is absent even though both of its effects are present) whereas 101 and 110 have only one (in each, the cause X is present and one effect is absent).
This example demonstrates how the missing feature method can mischaracterize an interactive effect of features as a main effect of a single feature. In contrast, a regression analysis applied to the eight test items in Example 1 correctly recovers the fact that all features are weighed equally and, moreover, the interactive effects between the causally related features X and Y, and X and Z. Specifically, for Example 1 the regression equation would be,

ratingi = β0 + βX fX + βY fY + βZ fZ + βXY fXY + βXZ fXZ + βYZ fYZ + βXYZ fXYZ

where ratingi is the rating for test item i, fj = +1 when feature j is present in test item i and –1 when it is absent, fjk = fj fk, and fXYZ = fX fY fZ. This regression analysis yields β0 = 5, βX = βY = βZ = 1, βXY = βXZ = 1, and βYZ = βXYZ = 0. These beta weights are of course just those that were stipulated in the example.

It is important to recognize that the alternative conclusions reached by the two methods are not merely two different but equally plausible senses of what we mean by "feature weights" in classification. The critical test of whether the weight assigned to a feature is correct is whether it generalizes sensibly to other potential test items—the value of knowing the importance of individual features is that it allows one to estimate the category membership of any potential item, not just those presented on a classification test. For example, if for Example 1 you were to conclude (on the basis of the missing feature method) that X is the most important feature, you would predict that the item with only feature X (100) should be rated higher than those with only Y (010) or only Z (001). And, for items that have only one known feature, you would predict that 1xx should be rated higher than x1x or xx1. Table 1 reveals that these predictions would be incorrect, however: Item 100 is rated lower than 010 and 001 (because 100 violates two causal links whereas the others violate only one) and 1xx, x1x, and xx1 all have the same rating (6). This example illustrates how the missing feature method can yield a measure of feature importance that fails to generalize to other items.

These conclusions do not depend on all features being equally weighed as they are in Example 1. Example 2 in Table 1 differs from Example 1 in that the weights on features Y and Z have been reduced to .5 (so that X now is the most heavily weighed feature). The missing feature method again assigns (correctly, in this case) a greater weight to feature X (because 011's rating of 3 is lower than the 6 given to 101 or 110). But whereas it then predicts that item 100 should be rated higher than 010 or 001, in fact that item is rated lower (3 vs. 4) because the two violations of causal relations in item 100 outweigh the presence of the more heavily weighed X. Nor are these sorts of effects limited to a common cause network. In Examples 3 and 4, X, Y, and Z are arranged in a causal chain. Even though features are weighed either equally (Example 3) or X > Y = Z (Example 4), in both examples the item missing feature Y (101) is rated lower than 011 or 110, and thus the missing feature method would incorrectly identify Y as the most important feature. Together, Examples 1–4 demonstrate how the missing feature method can systematically mischaracterize the true effect of individual features on classification. In contrast, a regression analysis recovers the correct feature weights and the interactive effects in all four examples.
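To make the contrast between the two methods concrete, here is a minimal Python sketch (my illustration, not taken from any of the studies reviewed) that builds the Example 1 ratings from the rule stipulated above and then applies both the missing feature method and the regression method; the function names and the use of numpy are assumptions of the sketch.

    from itertools import product
    import numpy as np

    def rating(x, y, z):
        # The stipulated rating rule for Example 1 (common cause network: X causes Y and Z)
        r = 5                                   # baseline rating
        for f in (x, y, z):
            r += 1 if f else -1                 # +1 for each present feature, -1 for each absent one
        for cause, effect in ((x, y), (x, z)):  # the two causal links
            r += 1 if cause == effect else -1   # +1 when a link is maintained, -1 when it is violated
        return r

    items = list(product([1, 0], repeat=3))     # the eight test items on dimensions X, Y, Z
    ratings = np.array([rating(*it) for it in items], dtype=float)

    # Missing feature method: compare only the items missing exactly one feature.
    for it in ((0, 1, 1), (1, 0, 1), (1, 1, 0)):
        print("missing", "XYZ"[it.index(0)], "->", rating(*it))    # 4, 6, 6: X appears "more important"

    # Regression method: +1/-1 predictors for each feature plus all interaction terms.
    def predictors(it):
        fx, fy, fz = (1 if f else -1 for f in it)
        return [1, fx, fy, fz, fx*fy, fx*fz, fy*fz, fx*fy*fz]

    design = np.array([predictors(it) for it in items], dtype=float)
    betas, *_ = np.linalg.lstsq(design, ratings, rcond=None)
    print(np.round(betas, 2))   # approx. [5, 1, 1, 1, 1, 1, 0, 0]: equal feature weights, plus
                                # interaction weights only on the causally related pairs X-Y and X-Z

The regression thus recovers the stipulated equal weights and the two interaction terms, whereas the missing feature comparison attributes the interactive effect to feature X.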
The three advantages associated with the regression method mean that it is a superior method for assessing the effect of causal knowledge on classification. Are there any potential drawbacks to use of this method? Three issues are worth mentioning.

First, to allow an assessment of both feature weights and interactions, the regression method requires that subjects rate a larger number of test items, and thus it is important to consider what negative impact this might have. A longer test increases the probability that fatigue might set in and, in designs in which subjects must remember the causal links on which they are instructed, that subjects might start to forget those links. To test whether this in fact occurs, I reanalyzed data from a study reported by Rehder and Kim (2006) in which subjects were taught categories with five features and up to four causal links and then asked to rate a large number of test items, namely, 32. Even though subjects had to remember the causal links (because they were not displayed during the classification test), they exhibited significant effects of causal knowledge on both feature weights and feature interactions, indicating that they made use of that knowledge. Moreover, the reanalysis revealed that the magnitude of these effects for the last 16 test items was the same as for the first 16. Thus, although fatigue and memory loss associated with a larger number of test items are valid concerns, at present there is no empirical evidence that they occur in the sort of studies reviewed here. Of course, fatigue and memory loss may become problems if an even larger number of causal links and test items than in Rehder and Kim (2006) are used.

Another potential consequence of the number of items presented on a classification test is that, because it is well known that an item's rating will change depending on the presence of other items (Poulton, 1989), the missing feature and regression methods may yield different ratings for exactly the same item. For instance, in Example 1 the ratings difference between the missing-X (011) and the missing-Y and missing-Z items (101 and 110) is likely to be larger when those are the only items being rated (the missing feature method) as compared to when both very likely (111) and unlikely (100) category members are included (the regression method). This is so because the absence of these latter items will result in the response scale expanding as subjects attempt to make full use of the scale; their presence will result in the scale contracting because 111 and 100 "anchor" the ends of the scale. (An example of this sort of scale contraction and expansion is presented in Section 4.5.1.) For this reason, it is ill advised to compare test item ratings across conditions and studies that differ in the number of different types of test items presented.

A third potential issue is whether the classification rating scale used in these experiments can be interpreted as an interval scale. As mentioned, the weights produced by the regression method are a series of difference scores. Because in general those differences involve different parts of the response scale, comparing different weights requires the assumption that the scale is being used uniformly. But of course issues such as the contraction that occurs at the ends of scales are well known.3
Transformations (e.g., arc-sine) are available of course; in addition, more recent studies in my lab have begun to use forced-choice judgments (and logistic regression) to avoid these issues (see Sections 7 and 8 for examples).

In summary, regression is superior to the missing feature method because it (a) assesses feature interactions, (b) is closely related to ANOVA, (c) yields greater statistical power, and (d) yields feature weights that generalize correctly to other items. At the same time, care concerning the number of test items and issues of scale usage must be exercised; other sorts of tests (e.g., forced-choice) might be appropriate in some circumstances. Of course, although assessing "effects" properly is important, the central goal of research is not to merely catalog effects but also to propose theoretical explanations of those effects. Still, like other fields, this one rests on empirical studies that describe how experimental manipulations influence the presence and size of various effects, and false claims and controversies can arise in the absence of a sound method for assessing those effects. These concerns are not merely hypothetical. Section 8 describes experimental results demonstrating that previous conclusions reached on the basis of a study using the missing feature method were likely due to an interactive effect of features rather than a main effect of feature weights.

2.3. Terminology

A final note on terminology is in order. Sloman et al. (1998) have used the terms mutability and conceptual centrality, but these refer to a property of individual features. However, I have noted how causal knowledge may also affect how combinations of features influence classification decisions. Accordingly, I will simply use the term classification weight to refer to the weight that features and certain combinations of features have for membership in a particular category.

3. Computational Models

I now present two computational models that have been offered as accounts of the effects of causal knowledge on categorization. Both models specify a rule that assigns to an object a measure of its membership in a category on the basis of that category's network of interfeature causal relations. Nothing is assumed about the nature of those causal links other than their strength. Neither model denies the existence of other effects on classification, such as the presence of contrast categories, the salience of particular features, or the empirical-statistical information that people observe firsthand. Rather, the claim is that causal relations will have the predicted effects when these factors are controlled.

3.1. The Dependency Model

One model is Sloman et al.'s (1998) dependency model. The dependency model is based on the intuition that features are more important to category membership (i.e., are more conceptually central) to the extent they have more dependents, that is, features that depend on them (directly or indirectly). A causal relation is an example of a dependency relation in which the effect depends on its cause. For example, DNA is more important than the color of an animal's fur because so much depends on DNA; hormones are more important than the size of its eyes for the same reason.
According to the dependency model, feature i's weight or centrality, ci, can be computed from the iterative equation,

ci,t+1 = Σj dij cj,t    (1)

where ci,t is i's weight at iteration t and dij is the strength of the causal link between i and its dependent j. For example, if a category has three features X, Y, and Z, and X causes Y which causes Z (as in Figure 1), then when cZ,1 is initialized to 1 and each causal link has a strength of 2, after two iterations the centralities for X, Y, and Z are 4, 2, and 1. That is, feature X is more important to category membership than Y, which in turn is more important than Z. Stated qualitatively, the dependency model predicts a difference in feature weights because X, Y, and Z vary in the number of dependents they have: X has two (Y and Z), Y has one (Z), and Z has none. Table 2 also presents how the feature weights predicted by the dependency model vary as a function of the causal strength parameters (the ds). These predictions have been tested in experiments described in Section 4.

Although the dependency model was successfully applied to natural categories in Sloman et al., its original formulation makes it technically inapplicable to many causal networks of theoretical interest. However, Kim et al. (2009) have recently proposed new variants of the dependency model that address these issues, allowing it to be applied to any network topology. Still, these variants inherit the same qualitative properties as their predecessor, namely, features grow in importance as a function of their number of dependents and the strengths of the causal links with those dependents.4 Note that while the dependency model and its variants specify how feature weights vary as a function of the causal network, they make no predictions regarding how feature combinations make for better or worse category members (i.e., they predict the absence of interactive effects). This is one important property distinguishing the dependency model from the next model.

3.2. The Generative Model

The second model is the generative model (Rehder, 2003a; b; Rehder & Kim, 2006). Building on causal-model theory (Waldmann & Holyoak, 1992; Sloman, 2005), the generative model assumes that interfeature causal relations are represented as probabilistic causal mechanisms and that classifiers consider whether an object is likely to have been produced or generated by those mechanisms. Objects that are likely to have been generated by a category's causal model are considered to be good category members and those unlikely to be generated are poor category members.

Quantitative predictions for the generative model can be generated assuming a particular representation of causal relations first introduced by Cheng (1997) and later applied to a variety of category-based tasks (Rehder & Hastie, 2001; Rehder, 2003a; b; Rehder, 2009a; Rehder & Kim, 2006; 2009a; 2009b; Rehder & Burnett, 2005). Assume that category k's causal mechanism relating feature j and its parent i operates (i.e., produces j) with probability mij when i is present and that any other potential background causes of j collectively operate with probability bj.
Given other reasonable assumptions (e.g., the independence of causal mechanisms, see Cheng & Novick, 2005), j's parents and the background causes form a "fuzzy-or" network that together produce j in members of category k, conditional on the state of j's parents, with probability

pk(j | parents(j)) = 1 – (1 – bj) ∏i∈parents(j) (1 – mij)^ind(i)    (2)

where ind(i) is an indicator variable that evaluates to 1 when i is present and 0 otherwise. The probability of a root cause r is a free parameter cr. For example, for the simple chain network in Figure 1 in which nodes have at most one parent, the probability of j when its parent i is present is

pk(j | i) = 1 – (1 – bj)(1 – mij) = mij + bj – mij bj    (3)

That is, the probability of j is the probability that it is brought about by its parent or by its background causes. When i is absent (denoted ¬i), the causal mechanism mij has no effect on j and thus the probability of j is simply

pk(j | ¬i) = bj    (4)

By applying Equation 2 iteratively, one can derive the equations representing the likelihood of any possible combination of the presence or absence of features in any causal network. For example, Table 3 presents the likelihood equations for the chain network in Figure 1 for any combination of features X, Y, and Z. Table 3 also presents the probability of each item for a number of different parameter values. The strengths of the causal links between X and Y (mXY) and between Y and Z (mYZ) are varied over the values 0, .33, .75, .90, and 1.0 while bY and bZ are held fixed at .10. In addition, bY and bZ are varied over the values 0, .25, and .50 while mXY and mYZ are held fixed at .75. Parameter cX (the probability that the root cause feature X appears in members of category k) is fixed at .75, consistent with the assumption that X is a typical feature of category k.

Table 3 indicates how causal relations determine a category's statistical distribution of features among category members. The assumption of course is that these item probabilities are related to category membership: Items with a high probability of being generated are likely category members and those with a low probability of generation are unlikely ones. That is, according to the generative model, the effects of causal relations on classification are mediated by the statistical distribution of features that those relations are expected to produce.

Although the generative model's main predictions concern whole items, from these probability distributions one can derive statistics corresponding to the two sorts of empirical effects I have described, namely, feature weights and feature interactions. For example, from Equations 3 and 4 the probability of j appearing in a member of category k is

pk(j) = pk(j | i) pk(i) + pk(j | ¬i) pk(¬i)
     = (mij + bj – mij bj) pk(i) + bj pk(¬i)
     = mij pk(i) + bj – mij bj pk(i)    (5)

where i is the parent of j. Table 3 presents the probability of each feature for each set of parameter values. For example, when cX = .75, mXY = mYZ = .90, and bY = bZ = .10, then pk(X) = .750, pk(Y) = .708, and pk(Z) = .673. That is, feature X has a larger "weight" than Y, which in turn is larger than Z's. Table 3 presents how the feature weights predicted by the generative model vary as a function of parameters.
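As an illustration of Equations 2 through 5, the following minimal Python sketch (mine, not part of the published work) computes the item likelihoods and feature probabilities for the chain network of Figure 1, using the parameter values quoted above (cX = .75, mXY = mYZ = .90, bY = bZ = .10); the function names are assumptions of the sketch.

    from itertools import product

    c_x, m_xy, m_yz, b_y, b_z = 0.75, 0.90, 0.90, 0.10, 0.10

    def p_child(parent_present, m, b):
        # Equations 3 and 4: fuzzy-or of the causal mechanism and the background causes
        return 1 - (1 - b) * (1 - m) if parent_present else b

    def p_item(x, y, z):
        # Equation 2 applied along the chain X -> Y -> Z
        px = c_x if x else 1 - c_x
        py = p_child(x, m_xy, b_y)
        pz = p_child(y, m_yz, b_z)
        return px * (py if y else 1 - py) * (pz if z else 1 - pz)

    likelihoods = {it: p_item(*it) for it in product([1, 0], repeat=3)}   # cf. Table 3

    # Feature probabilities (Equation 5), i.e., the predicted feature "weights"
    p_x = sum(v for (x, y, z), v in likelihoods.items() if x)
    p_y = sum(v for (x, y, z), v in likelihoods.items() if y)
    p_z = sum(v for (x, y, z), v in likelihoods.items() if z)
    print(round(p_x, 3), round(p_y, 3), round(p_z, 3))    # approx. .750, .708, .673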
In Section 4, I will show how the generative and dependency models make qualitatively different predictions regarding how feature weights vary across parameter settings and present results of experiments testing those predictions.

Importantly, the generative model also predicts that causally related features should be correlated within category members. A quantity that reflects a dependency—and hence a correlation—between two variables is the probabilistic contrast. The probabilistic contrast between a cause i and an effect j is defined as the difference between the probability of j given the presence or absence of i:

Δpk(i, j) = pk(j | i) – pk(j | ¬i)    (6)

For the causal network in Figure 1, Table 3 shows how the contrasts between the directly causally related features, Δpk(X, Y) and Δpk(Y, Z), are greater than 0, indicating how those pairs of features should be correlated (a relation that holds so long as the ms > 0 and the bs < 1). Moreover, the contrast between the two indirectly related features, Δpk(X, Z), is greater than zero but less than the direct contrasts, indicating that X and Z should also be correlated, albeit more weakly. In other words, the generative model predicts interactive effects: Objects will be considered good category members to the extent they maintain expected correlations and worse ones to the extent they break those correlations. That the generative model predicts interactive effects between features is a key property distinguishing it from the dependency model. Table 3 presents how the pairwise feature contrasts predicted by the generative model vary as a function of parameters, predictions that have been tested in experiments described in Section 6.

The generative model also makes predictions regarding the patterns of higher-order interactions one expects for a causal network. For example, a higher-order contrast that defines how the contrast between i and j is itself moderated by h is given by Equation 7.

Δpk(h, i, j) = [pk(j | i, h) – pk(j | ¬i, h)] – [pk(j | i, ¬h) – pk(j | ¬i, ¬h)]    (7)

Table 3 indicates that for a chain network Δpk(X, Y, Z) = 0, indicating that the contrast between Y and Z is itself unaffected by the state of X. This corresponds to the well-known causal Markov condition in which Y "screens off" Z from X; likewise, Δpk(Z, X, Y) = 0 means that Y screens off X from Z (Pearl, 1988). In Section 6 I will demonstrate how these sorts of higher-order contrasts manifest themselves in classification judgments.

4. The Causal Status Effect

This section begins the review of the major phenomena that have been discovered in the causal-based categorization literature. In each of the following four sections I define the phenomenon, discuss the major theoretical variables that have been shown to influence that phenomenon, and consider the implications these results have for the two computational models just described. I also briefly discuss other variables (e.g., experimental details of secondary theoretical importance) that have been shown to have an influence.

As mentioned, this review focuses on studies testing novel categories, that is, ones with which subjects have no prior experience because they are learned as part of the experiment. There are two reasons for this. The first is that there are already good reviews of work testing the effects of causal knowledge on real-world categories (e.g., Ahn & Kim, 2001).
The second is the presence of confounds associated with natural materials (e.g., the presence of contrast categories, the different salience of features, the effects of empirical-statistical information, etc.) already noted in Section 2.

The first empirical phenomenon I discuss is the causal status effect, an effect on feature weights in which features that appear earlier in a category's causal network (and thus are "more causal") carry greater weight in categorization decisions. For example, in Figure 1, X is the most causal feature, Z is the least causal, and Y is intermediate. As a consequence, all else being equal, X should be weighed more heavily than Y, which should be weighed more heavily than Z. In fact, numerous studies have demonstrated situations in which features are more important than those they cause (Ahn, 1998; Ahn et al., 2000a; Kim et al., 2009; Luhmann et al., 2006; Rehder, 2003b; Rehder & Kim, 2006; Sloman et al., 1998). Nevertheless, that the size of the causal status effect can vary dramatically across studies—in many it is absent entirely—raises questions about the conditions under which it appears.

For example, in the Ahn et al. (2000a) study participants learned novel categories with three features X→Y→Z and then rated the category membership of items missing exactly one feature on a 0 to 100 scale. The item missing X was rated lower (27) than the one missing Y (40), which in turn was lower than the one missing Z (62), suggesting that X is more important than Y which is more important than Z. A large causal status effect was also found in Ahn (1998, Experiments 3 and 4) and Sloman et al. (1998, Study 3).5 In contrast, in Rehder and Kim (2006) participants learned categories in which 3 out of 5 features were connected in a causal chain and then rated the category membership of a number of test items. To assess the importance of features, regression analyses were performed on those ratings. Unlike Ahn et al., Rehder and Kim found only a modest (albeit significant) difference in the regression weights of X and Y (7.6 and 6.4), a difference that reflected the nearly equal ratings on the missing-X and missing-Y items (43 and 47, respectively). In contrast, the regression weight on Z (6.2) and the rating of the missing-Z item (48) indicated no difference in importance between features Y and Z. Similarly, testing categories with four features, Rehder (2003b) found a partial causal status effect (a larger weight on the chain's first feature and smaller but equal weights on the remaining ones) in one experiment and no causal status effect at all in another.

What factors are responsible for these disparate results? Based on the contrasting predictions of the dependency and generative models presented earlier, I now review recent experiments testing a number of variables potentially responsible for the causal status effect.

4.1. Causal Link Strength

One factor that may influence the causal status effect is the strength of the causal links. For example, whereas Ahn (1998) and Ahn et al. (2000a) described the causal relationships in probabilistic terms by use of the phrase "tends to" (e.g., "Sticky feet tends to allow roobans to build nests on trees."), Rehder and Kim (2006) omitted any information about the strength of the causal links.
This omission may have invited participants to interpret the causal links as deterministic (i.e., the cause always produces the effect), and this difference in the perceived strength of the causal links may be responsible for the different results.6

To test this hypothesis, Rehder and Kim (2009b, Experiment 1) directly manipulated the strength of the causal links. All participants were taught three category features and two causal relationships linking X, Y, and Z into a causal chain (as in Figure 1). For example, participants who learned myastars were told that the typical features of myastars were a hot temperature, high density, and a large number of planets and that hot temperature causes high density which in turn causes a large number of planets. Each feature included information about the other value on the same stimulus dimension (e.g., "Most myastars have high density whereas some have low density."). Each causal link was accompanied with information about the causal mechanism (e.g., "High density causes the star to have a large number of planets. Helium, which cannot be compressed into a small area, is spun off the star, and serves as the raw material for many planets."). In addition, participants were given explicit information about the strengths of those causal links. For example, participants in the Chain-100 condition were told that each causal link had a strength of 100%: "Whenever a myastar has high density, it will cause that star to have a large number of planets with probability 100%." Participants in the Chain-75 condition were told that the causal links operated with probability 75% instead. Participants then rated the category membership of all eight items that could be formed on the three binary dimensions. A Control condition in which no causal links were presented was also tested.

The dependency and generative models make distinct predictions for this experiment. Table 2 shows that the dependency model predicts that the size of the causal status effect is an increasing monotonic function of causal strength. For example, after two iterations feature weights are 4, 2, and 1 when cZ,1 = 1 and dXY = dYZ = 2 (yielding a difference of 3 between the weights of X and Z) versus 9, 3, and 1 when dXY = dYZ = 3 (a difference of 8). Intuitively, it makes this prediction because stronger causal relations mean that Y is more dependent on X and Z is more dependent on Y. As a consequence, the dependency model predicts a stronger causal status effect in the Chain-100 condition versus the Chain-75 condition.

In contrast, Table 3 shows that the generative model predicts that the size of the causal status effect should decrease as the strength of the causal links increases; indeed, the causal status effect can even reverse at high levels of causal strength. For example, when bY = bZ = .10, Table 3 shows that the difference between pk(X) and pk(Z) (a measure of the causal status effect) is .553, .241, .077, and –.048 for causal strengths of .33, .75, .90, and 1.0, respectively. Intuitively, a causal status effect is more likely for probabilistic links because X generates Y, and Y generates Z, with decreasing probability. For example, if cX = .75, mXY = mYZ = .75, and there are no background causes (bs = 0), then pk(X) = .750, pk(Y) = .75² = .563, and pk(Z) = .75³ = .422. Thus, so long as the b parameters (which work to increase the probability of Y and Z) are modest, the result will be that pk(X) will be larger than pk(Z).
In contrast, a causal status effect is absent for deterministic links because X always generates Y, and Y always generates Z. For example, if cX = .75, ms = 1, and bs = 0, then pk(X) = pk(Y) = pk(Z) = .750, and the causal status effect grows increasingly negative (i.e., pk(Z) becomes greater than pk(X)) as the bs increase. Note that because one also expects features to be weighed equally in the absence of any causal links between features, the generative model predicts that the causal status effect should vary nonmonotonically with causal strength: It should be zero when mXY = mYZ = 0, large when the ms are intermediate, and zero (or even negative) when the ms = 1. Thus the generative model predicts a stronger causal status effect in the Chain-75 condition versus the Chain-100 and Control conditions.

Following Rehder and Kim (2006), regression analyses were performed on each subject's classification ratings with predictors for each feature and each two-way interaction between features. The regression weights averaged over subjects for features X, Y, and Z are presented in the left panel of Figure 2A. (The right panel, which presents the two-way interaction weights, will be discussed in Section 6.) In fact, a larger causal status effect obtained in the Chain-75 condition in which the causal links were probabilistic as compared to the Chain-100 condition in which they were deterministic; indeed, in the Chain-100 condition the causal status effect was absent entirely (the small quadratic effect of features suggested by Figure 2A was not significant). Of course, a stronger causal status effect with weaker causal links is consistent with the predictions of the generative model and inconsistent with those of the dependency model. As expected, in the Control condition (not shown in Figure 2A), all feature weights were equal.

As a further test of the generative model, after the classification test we also asked subjects to estimate how frequently each feature appeared in category members. For example, subjects who learned myastars were asked how many myastars out of 100 would have high density. Recall that the generative model predicts that causal knowledge changes people's beliefs regarding how often features appear in category members, and, if this is correct, the effects uncovered in the classification test should be reflected in the feature likelihood ratings. The results of the feature likelihood ratings, presented in Figure 3A, support this conjecture: Likelihood ratings decreased significantly from feature X to Y to Z in the Chain-75 condition whereas those ratings were flat in the Chain-100 condition, mirroring the classification results in Figure 2A. This finding supports the generative model's claim that causal relations change classifiers' subjective beliefs about the category's statistical distribution of features (also see Sloman et al., 1998). Clearly, causal link strength is one key variable influencing the causal status effect. Additional evidence for this conclusion is presented in Section 4.5.

4.2. Background Causes

Experiment 2 from Rehder and Kim (2009b) conducted another test of the generative and dependency models by manipulating the strength of alternative causes of the category features, that is, the generative model's b parameters.
All participants were instructed on categories with three features related in a causal chain in which each causal link had a strength of 75%. In the Background-0 condition, they were also told that there were no other causes of the features. For example, participants who learned about myastars learned not only that high density causes a large number of planets with probability 75%, but also that "There are no other causes of a large number of planets. Because of this, when its known cause (high density) is absent, a large number of planets occurs in 0% of all myastars." In contrast, in the Background-50 condition participants were told that "There are also one or more other features of myastars that cause a large number of planets. Because of this, even when its known cause (high density) is absent, a large number of planets occurs in 50% of all myastars."

Table 3 shows how the generative model's predictions vary with the b parameters and indicates that the causal status effect should become weaker as features' potential background causes get stronger; indeed, it should reverse as the bs reach .50 and beyond. Specifically, when the ms = .75 the difference between pk(X) and pk(Z) is .328, .122, and –.043 for values of bY and bZ of 0, .25, and .50, respectively (Table 3). Intuitively, this occurs because as bY and bZ increase they make the features Y and Z more likely; indeed, as the bs approach 1, Y and Z will be present in all category members. As a consequence, the generative model predicts a larger causal status effect in the Background-0 condition as compared to the Background-50 condition.

The dependency model, in contrast, makes a different prediction for this experiment. Because it specifies that a feature's centrality is a sole function of its dependents, supplying a feature with additional causes (in the form of background causes) should have no effect on its centrality. Thus, because centrality should be unaffected by the background cause manipulation, the dependency model predicts an identical causal status effect in the Background-0 and Background-50 conditions.7

Regression weights derived from subjects' classification ratings are shown in Figure 2B. The results were clear-cut: A larger causal status effect obtained in the Background-0 condition in which background causes were absent as compared to the Background-50 condition in which they weren't; indeed, in the Background-50 condition the causal status effect was absent entirely. Moreover, these regression weights were mirrored in subjects' explicit feature likelihood ratings (Figure 3B). These results confirm the predictions of the generative model and disconfirm those of the dependency model. The strength of background causes is a second key variable affecting the causal status effect.

4.3. Unobserved "Essential" Features

The preceding two experiments tested the effect of varying the m and b parameters on the causal status effect. However, there are reasons to expect that categorizers sometimes reason with a causal model that is more elaborate than one that includes only observable features. For example, numerous researchers have suggested that people view many kinds as being defined by underlying properties or characteristics (an essence) that are shared by all category members and by members of no other categories (Gelman, 2003; Keil, 1989; Medin & Ortony, 1989; Rehder & Kim, 2009a; Rips, 1989) and that are presumed to generate, or cause, perceptual features.
Although many artifacts do not appear to have internal causal mechanisms (e.g., pencils and wastebaskets), it has been suggested that the essential properties of artifacts may be the intentions of their designers (Bloom, 1998; Keil, 1995; Matan & Carey, 2001; Rips, 1989; cf. Malt, 1994; Malt & Johnson, 1992). Thus, the causal model that people reason with during categorization may include the underlying causes they assume produce a category's observable features.

Rehder and Kim (2009b, Experiment 3) tested the importance of the category being essentialized by comparing the causal structures shown in Figure 4. As in the two preceding experiments, each category consisted of three observable features related in a causal chain. However, the categories were now "essentialized" by endowing them with an additional feature that exhibits an important characteristic of an essence, namely, it appears in all members of the category and in members of no other category. For example, for myastars the essential property was "ionized helium," and participants were told that all myastars possess ionized helium and that no other kind of star does.8 In addition, in the Essentialized-Chain-80 condition (Figure 4A) but not the Unconnected-Chain-80 condition (Figure 4B), participants were also instructed on a third causal relationship linking feature X to the essential feature (e.g., in myastars, that ionized helium causes high temperature, where high temperature played the role of X). All causal links were presented as probabilistic by describing them as possessing a strength of 80%. A Control condition in which no causal links were provided was also tested. After learning the categories, participants performed a classification test that was identical to those in the previous two experiments. (In particular, the state of the essential property was not displayed in any test item.)

Linking X, Y, and Z to an essential feature should have two effects on classification ratings. First, because the link between E and X has a strength of mEX = .80, the probability of feature X within category members should be at least .80. This is greater than the value expected in the Unconnected-Chain-80 condition on the basis of the first two experiments (in which subjects estimated pk(X) to be a little over .75; see Figures 3A and 3B). Second, the larger value of pk(X) should produce an enhanced causal status effect, because the larger value of pk(X) results in a greater drop between it and pk(Y) (and the larger value of pk(Y) results in a greater drop between it and pk(Z)). These effects are apparent in Table 4, which presents the generative model's quantitative predictions for the case when the b parameters equal .10. (The table also includes predictions for a case, discussed below, where the m parameters are 1.0 instead of .80.) Table 4 confirms that the size of the causal status effect should be larger in the Essentialized-Chain-80 condition (a difference between pk(X) and pk(Z) of .223) than in the Unconnected-Chain-80 condition (.189) when the bs = .10; this prediction holds for any value of the bs < 1.

In contrast, the dependency model predicts no difference between the two conditions. Because that model claims that a feature's centrality is determined by its dependents rather than its causes, providing feature X with an additional cause in the Essentialized-Chain-80 condition should have no influence on its centrality.
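For concreteness, here is a minimal sketch (mine) of the generative model's predictions for this experiment, applying Equation 5 iteratively along each chain with m = .80 and b = .10. Treating the essence E as present in every category member, and setting the root cause probability cX = .75 in the Unconnected-Chain-80 condition, are assumptions of the sketch, chosen to be consistent with the text and with Table 4.

    def marginal(p_parent, m=0.80, b=0.10):
        # Equation 5: a feature's probability given its parent's marginal probability
        return m * p_parent + b - m * b * p_parent

    # Essentialized-Chain-80: E (present with probability 1) -> X -> Y -> Z
    p_x = marginal(1.0)            # approx. .820
    p_y = marginal(p_x)            # approx. .690
    p_z = marginal(p_y)            # approx. .597
    print(round(p_x - p_z, 3))     # approx. .223, the predicted causal status effect

    # Unconnected-Chain-80: X (cX assumed to be .75) -> Y -> Z, with the essence causally unconnected
    q_x = 0.75
    q_y = marginal(q_x)            # approx. .640
    q_z = marginal(q_y)            # approx. .561
    print(round(q_x - q_z, 3))     # approx. .189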
The feature weights derived from subjects’ classification ratings are presented in Figure 2C. Consistent with the predictions of the generative model, a larger causal status effect obtained in the Essentialized-Chain-80 condition as compared to the Unconnected-Chain-80 condition; indeed, although feature weights decreased in the Unconnected-Chain-80 condition, this decrease did not reach significance. This same pattern was also reflected in feature likelihood ratings (Figure 3C): decreasing feature likelihoods in the Essentialized-Chain-80 but not the Unconnected-Chain-80 condition.9 As expected, all feature weights and likelihood ratings were equal in the Control condition.

Other studies have found that essentialized categories lead to an enhanced causal status effect. Using the same materials, Rehder (2003b, Experiment 3) found a larger causal status effect with essentialized categories even when the strength of the causal link was unspecified. And, Ahn and colleagues have found that expert clinicians both view mental disorders as less essentialized than laypersons (Ahn et al., 2006) and exhibit only a weak causal status effect (Ahn et al., 2005). These results show that an essentialized category is a third key variable determining the size of the causal status effect. However, note that the generative model’s predictions regarding essentialized categories themselves interact with causal link strength: When links are deterministic, essentializing a category should yield feature weights that are larger but equal to one another (because each feature in the chain is produced with the same probability as its parent, namely, 1.0)—that is, no causal status effect should obtain (Table 4). These predictions were confirmed by Rehder and Kim’s (2009b) Experiment 4, which was identical to Experiment 3 except that the strengths of the causal links were 100% rather than 80%.

4.4. Number of Dependents

Yet another potential influence on the causal status effect is a feature’s number of dependents. Rehder and Kim (2006, Experiment 3) assessed this variable by testing the two network topologies in Figure 5. Participants in both conditions were instructed on categories with five features, but whereas feature Y had three dependents in the 1-1-3 condition (1 root cause, 1 intermediate cause, 3 effects), it had only one in the 1-1-1 condition. In this experiment, no information about the causal strengths or background causes was provided. In the 1-1-1 condition, which feature played the role of Y‘s effect was counterbalanced across Z1, Z2, and Z3. After learning these causal category structures, subjects were asked to rate the category membership of all 32 items that could be formed on the five binary dimensions.

The dependency and generative models again make distinct predictions for this experiment. According to the dependency model, its greater number of dependents in the 1-1-3 condition means that Y is more central relative to the 1-1-1 condition. Likewise, its greater number of indirect dependents in the 1-1-3 condition means that X is relatively more central as well. As a result, the dependency model predicts a larger causal status effect in the 1-1-3 condition than in the 1-1-1 condition. For example, according to Equation 1 (after two iterations, with cZ,1 = 1 and the ds = 2), feature centralities are 12, 6, and 1 for X, Y, and the Zs, respectively, in the 1-1-3 condition, but 4, 2, and 1 in the 1-1-1 condition.
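These centrality values can be reproduced with a small sketch of the dependency model's iteration (Equation 1 is paraphrased here from the description above; in particular, holding features with no dependents at their initial centrality of 1 is an assumption made so that the reported numbers are recovered).

    # Sketch of the dependency model's iterative centrality computation for the
    # networks in Figure 5. A feature's centrality at iteration t+1 is the sum,
    # over its dependents, of d times the dependent's current centrality;
    # features with no dependents keep their initial centrality of 1 (assumed).

    def centralities(dependents, d=2, iterations=2):
        c = {f: 1 for f in dependents}
        for _ in range(iterations):
            c = {f: sum(d * c[dep] for dep in deps) if deps else c[f]
                 for f, deps in dependents.items()}
        return c

    net_113 = {'X': ['Y'], 'Y': ['Z1', 'Z2', 'Z3'], 'Z1': [], 'Z2': [], 'Z3': []}
    net_111 = {'X': ['Y'], 'Y': ['Z1'], 'Z1': [], 'Z2': [], 'Z3': []}

    print(centralities(net_113))   # X: 12, Y: 6, Zs: 1
    print(centralities(net_111))   # X: 4,  Y: 2, Zs: 1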
The generative model, in contrast, predicts no difference between conditions, because Y having more effects doesn’t change the chance that it will be generated by its category. The results of this experiment are presented in Figure 6. (In the figure, the weight for “Z (effect)” is averaged over Z1, Z2, and Z3 in the 1-1-3 condition and is for the single causally related Z in the 1-1-1 condition. The weights for the “Z (isolated)” features will be discussed later in Section 5.1.) The figure confirms the predictions of the generative model and disconfirms those of the dependency model. First, as described earlier, tests of a three element causal chain in this study (the 1-1-1 condition) produced a relatively small and partial causal status effect (X was weighed more than Y which was weighed the same as Z). But more importantly, the size of the causal status effect was not larger in the 1-1-3 condition than the 1-1-1 condition. (Although weights were larger overall in the 1-1-1 condition, this difference was only marginally significant.) These results show that features’ number of dependents does not increase the size of the causal status effect. 4.5. Other Factors In this section I present other factors that have been shown to influence the size of the causal status effect, factors not directly relevant to the predictions of either the dependency or generative models. 4.5.1. Number of test items: Rehder (2009b). Recall that whereas Ahn et al. (2000a) observed a difference of 35 points in the rating of the item missing only X versus the one missing only Z, that difference was an order of magnitude smaller in Rehder and Kim (2006). Besides the difference in the implied strength of the causal links, these studies also differed in the number of classification test items presented (3 vs. 32, respectively). As argued in Section 2.2, it is likely that the presence of very likely and very unlikely category members that anchor the high and low ends of the response scale will decrease the differences between intermediate items such as those missing one feature (scale contraction) whereas the absence of extreme items will increase that difference (scale expansion) (Poulton, 1989). In addition, rating items that differed only on which feature was missing may have triggered a comparison of the relative importance of those features that wouldn't have occurred otherwise. In other words, a large causal status effect may have arisen partly because of task demands. To test these conjectures, Rehder (2009b) replicated the original Ahn et al. (2000a) study but Causal-Based Classification: A Review 27 manipulated the total number of test items between 3 (those missing just one feature) and 8 (all test items that can be formed on three binary dimensions). In addition, as an additional test of the role of causal strength, I compared a condition with the original wording implying a probabilistic relation (e.g., "Sticky feet tends to allow roobans to build nests on trees.") with one, which implied a deterministic relation (e.g., "Sticky feet always allow roobans to build nests on trees."). The procedure was identical to that in Ahn et al. except that subjects previewed all the test items before rating them. Figure 7 presents the size of the causal status effect measured by the difference between the missing-Z and missing-X test items. 
First note that the causal status effect was larger in the probabilistic versus the deterministic condition, replicating the findings described earlier in Section 4.1 in which a stronger causal status effect obtains with weaker causal links (Rehder & Kim, 2009b, Experiment 1). But the causal status effect was also influenced by the number of test items. For example, whereas the probabilistic condition replicated the large causal status effect found in Ahn et al. when subjects rated only 3 test items (a difference of 24 points between missing-Z and missing-X items), it was reduced when 8 items were rated (11 points); in the deterministic condition it was reduced from 5.5 to –3.4. Overall, the causal status effect reached significance in the probabilistic/3 test item condition only. These results confirm that scale expansion and/or task demands can magnify the causal status effect when only a small number of test items are rated.

4.5.2. “Functional” features: Lombrozo (2009).

Lombrozo (2009) tested how feature importance varies depending on whether it is “functional,” that is, whether for a biological species it is an adaptation that is useful for survival or whether for an artifact it affords some useful purpose. For example, participants were first told about a type of flower called a holing, and that holings have broom compounds in their stems (feature X) that cause them to bend over as they grow (feature Y). Moreover, they were told that the bending is useful because it allows pollen to brush onto the fur of field mice. In a Mechanistic condition, participants were then asked to explain why holings bend over, a question which invites either a mechanistic response (because broom compounds cause them to) or a functional response (because bending over is useful for spreading pollen). In contrast, in the Functional condition participants were then asked what purpose bending over served, a question that invites only a functional response. All subjects were then shown two items, one missing X and one missing Y, and asked which was more likely to be a holing. Whereas subjects chose the missing-Y item 71% of the time in the Mechanistic condition (i.e., they exhibited a causal status effect), this effect was eliminated (55%) in the Functional condition. Although the effect sizes in this study were small (reaching significance at the .05 level only by testing 192 subjects), the potential functions that features afford, a factor closely related to their place in a category’s causal network, may be an important new variable determining their importance to classification.

4.6. Theoretical Implications: Discussion

Together, the reviewed studies paint a reasonably clear picture of the conditions under which a causal status effect does and does not occur. Generally, what appears to be going on is this. When confronted with a causal network of features, classifiers will often adopt a "generative" perspective, that is, they will think about the likelihood of each successive event in the chain. This process may be equivalent to a kind of mental simulation in which they repeatedly "run" (consciously or unconsciously) a causal chain. Feature probabilities are then estimated by averaging over runs. Of course, in a run the likelihood of each subsequent feature in the chain increases as a function of the strength of the chain's causal links.
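The mental-simulation idea can be illustrated with a minimal Monte Carlo sketch (mine, not the authors'): repeatedly run the chain forward, sampling each feature from its cause and a background cause, and average over runs. The parameters below are illustrative.

    # Monte Carlo sketch of "running" a causal chain X -> Y -> Z and averaging
    # over runs. cX is the probability of the root cause, m the strength of
    # each causal link, and b the strength of background causes.

    import random

    def run_chain(cX, m, b, n_runs=100_000, seed=0):
        rng = random.Random(seed)
        totals = [0, 0, 0]
        for _ in range(n_runs):
            x = rng.random() < cX
            y = (x and rng.random() < m) or rng.random() < b
            z = (y and rng.random() < m) or rng.random() < b
            for i, present in enumerate((x, y, z)):
                totals[i] += present
        return [t / n_runs for t in totals]

    print(run_chain(cX=0.75, m=0.75, b=0.10))   # roughly [.75, .61, .51]
    print(run_chain(cX=0.75, m=1.00, b=0.00))   # roughly [.75, .75, .75]

With probabilistic links the estimated probabilities fall at each step of the chain; with deterministic links and no background causes they do not.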
When links are deterministic then all features will be present whenever the chain's root cause is; in such cases, no causal status effect appears. However, when causal links are probabilistic each feature is generated with less certainty at every step in the causal chain, and thus a causal status effect arises. But working against this effect are classifiers' beliefs about the strength of alternative "background" causes. Background causes will raise the probability of each feature in the causal chain in each simulation run, and, if sufficiently strong, will cancel out (and possibly even reverse) the causal status effect.

The dependency model, in contrast, is based on a competing intuition, that features are important in people's conceptual representations to the extent they are responsible for other features (e.g., DNA is more important than the color of an animal's fur because so much depends on DNA). But despite the plausibility of this intuition, it does not conform to subjects’ category membership judgments. Whereas the dependency model predicts that the causal status effect should be stronger with stronger causal links and more dependents, it was either weaker or unchanged. And, whereas the dependency model predicts that features’ weights should be unaffected by the introduction of additional causes, we found instead a weaker causal status effect when background causes were present.

It is interesting to consider how classifiers’ default assumptions regarding the causal strengths and background causes might influence the causal status effect. Recently, Lu, Yuille, Liljeholm, Cheng, and Holyoak (2008) have proposed a model that explains certain causal learning results by assuming that people initially assume causal relationships to be sufficient (“strong” in their terms, i.e., the cause always produces the effect) and necessary (“sparse,” i.e., there are no other causes of the effect) (also see Lombrozo, 2007; Rehder & Milovanovic, 2007). On one hand, because we have seen how features are weighed equally when causal links are deterministic, a default assumption of strong causal relationships works against the causal status effect. On the other hand, that people apparently believe in many probabilistic causal relations (e.g., smoking only sometimes causes lung cancer) means they can override this default. When they do, Lu et al.’s second assumption—the presumed absence of background causes—will work to enhance the causal status effect.

The generative perspective also explains why essentialized categories lead to an enhanced causal status effect: The presence of an essential feature means that observed features should be generated with greater probability and, so long as causal links are probabilistic, near features (X) should be generated with relatively greater certainty than far ones (Z). Although Rehder and Kim (2009b) tested the power of an essential feature to generate a larger causal status effect, note that an underlying feature would produce that effect even if it was only highly diagnostic of, but not truly essential to, category membership so long as it was sufficient to increase the probability of the observed features. This prediction is important, because the question of whether real-world categories are essentialized is a controversial one.
Although good evidence exists for the importance of underlying properties to category membership (Gelman, 2003; Keil, 1989; Rips, 1989), Hampton (1995) has demonstrated that even when biological categories’ so-called essential properties are unambiguously present (or absent), characteristic features continue to exert an influence on judgments of category membership (also see Braisby et al., 1996; Kalish, 1995; Malt, 1994; Malt & Johnson, 1992). My own suspicion is that although the unobserved properties of many categories are distinctly important to category membership, few may be truly essential (see Rehder, 2007, for discussion). But according to the generative model, all that is required is that the unobserved property increase the probability of the observed features.

The causal status effect may be related to essentialism in two other ways. First, whereas I have described the present results in terms of an essential feature increasing the probability of observed features, it may also be that subjects engaged in a more explicit form of causal inference in which they reasoned from the presence of observable features X, Y, and Z to the presence of the unobserved essential feature (and from the essential feature to category membership). I consider this possibility further in Section 7. Second, Ahn (1998) and Ahn and Kim (2001) proposed that the causal status effect is itself a sort of incomplete or weakened version of essentialism. On this account, the root cause X in the causal chain in Figure 1 becomes more important because it is viewed as essence-like (a “proto-essence” if you will), although without an essence’s more extreme properties (i.e., always unobservable, a defining feature that appears in all category members, etc.). Of course, standing as evidence against this principle are the numerous conditions reviewed above in which a causal status effect failed to obtain. Moreover, the need for the principle would seem to be obviated by the finding that the causal status effect is fully explicable in terms of the properties of the category’s causal model, including (a) the strengths of the causal links and (b) the presence of unobserved (perhaps essential) features which are causally related to the observed ones.

Another important empirical finding concerns how the changes to features' categorization importance brought about by causal knowledge are mediated by their subjective category validity (i.e., likelihood among category members). In every experimental condition in which classification ratings revealed a full causal status effect, participants also rated feature X as more likely than Y and Y as more likely than Z; whenever a causal status effect was absent, features’ likelihood ratings were not significantly different. Apparently, causal knowledge changes the perceived likelihood with which a feature is generated by a category's causal model, and any feature that occurs with greater probability among category members (i.e., has greater category validity) should provide greater evidence in favor of category membership (Rosch & Mervis, 1975). Other studies have shown that a feature’s influence on categorization judgments correlates with its subjective category validity. For example, although Study 5 of Sloman et al.
(1998) found that features’ judged mutability dissociated from their objective category validity, those judgments tracked participants’ subjective judgments of category validity.10

In summary, what should be concluded about the causal status effect? On one hand, the causal status effect does not rise to the level of an unconditional general law, that is, one that holds regardless of the causal facts involved. Even controlling for the effects of contrast categories, empirical/statistical information, and the salience of individual features, in the 56 experimental conditions in the 15 studies reviewed in this chapter testing novel categories, a full causal status effect obtained in 26 of them, a partial effect (a higher weight on the root cause only, e.g., X > Y = Z) obtained in 7, and there was no effect (or the effect was reversed) in 23.11 On the other hand, these experiments also suggest that a causal status effect occurs under conditions that are not especially exotic—specifically, it is promoted by (a) probabilistic causal links, (b) the absence of alternative causes, (c) essentialized categories, and (d) nonfunctional effect features. Because these conditions are likely to often arise in real-world categories, the causal status effect is one of the main phenomena in causal-based classification.

5. Relational Centrality and Multiple Cause Effects

The causal status effect is one important way that causal knowledge changes the importance of individual features to category membership. However, there are many documented cases in which effect features are more important than their causes rather than the other way around. For example, Rehder and Hastie (2001) instructed subjects on the two networks in Figure 8. In the common-cause network, one category feature (X) is described as causing the three other features (Y1, Y2, and Y3). In the common-effect network, one feature (Y) is described as being caused by each of the three others (X1, X2, and X3). (The causes were described as independent, noninteracting causes of Y.) No information about the strength of the causes or background causes was provided. After learning these causal structures, subjects were asked to rate the category membership of all 16 items that could be formed on the four binary dimensions.

The regression weights on features averaged over Experiments 1–3 from Rehder and Hastie (2001) are presented in Figure 9. In the common cause condition, the common cause feature X was weighed much more heavily than the effects. That is, a strong causal status effect occurred. However, in the common effect condition the effect feature Y was weighed much more heavily than any of its causes. That is, a reverse causal status effect occurred. This pattern of feature weights for common cause and common effect networks has been found in other experiments (Ahn, 1999; Rehder, 2003a; Rehder & Kim, 2006; Ahn & Kim, 2001).12

Two explanations of this effect have been offered. The first, which I refer to as the relational centrality effect, states that a feature’s importance to category membership is a function of the number of causal relations in which it is involved (Rehder & Hastie, 2001). On this account, Y is most important in a common effect network because it is involved in three causal relations whereas its causes are involved in only one.
The second explanation, the multiple cause effect, states that a feature’s importance increases as a function of its number of causes (Ahn & Kim, 2001; Rehder & Kim, 2006). On this account, Y is most important because it has three causes whereas the causes themselves have zero. Note that because neither of these accounts alone explains the causal status effect (e.g., feature X in Figure 1 has neither the greatest number of causes nor relations), the claim is that these principles operate in addition to the causal status effect rather than serving as alternative to it. I first review evidence for and against each of these effects and then discuss their implications for the dependency and generative models. 5.1. Evidence Against a Relational Centrality Effect and For a Multiple Cause Effect A study already reviewed in Section 4.4 provides evidence against the relational centrality effect. Recall that Rehder and Kim (2006, Experiment 3) tested the two causal networks shown in Figure 5. Feature Y has three effects in the 1-1-3 network but only one in the 1-1-1 network. The results showed that Y was not relatively more important in the 1-1-3 condition than in the 1-1-1 condition (Figure 6). These results were interpreted as evidence against the dependency model’s claim that features’ importance increases with their number of effects but they also speak against the claim that importance increases with their number of relations: Feature Y is involved in four causal relations in the 1-1-3 network but only two in the 1-1-1 network. Feature importance does not appear to generally increase with the number of relations. This result implies that the elevated weight on the common effect feature in Figure 9 must be due instead to it having multiple causes. Accordingly, Rehder and Kim (2006, Experiment 2) tested the multiple-cause effect by teaching subjects the two causal structures in Figure 10. Participants in both conditions were instructed on categories with five features, but whereas feature Y had three causes in the Causal-Based Classification: A Review 33 3-1-1 condition (3 root causes, 1 intermediate cause, 1 effect), it had only one in the 1-1-1 condition. In the 1-1-1 condition, which feature played the role of Y‘s cause was balanced between X1, X2, and X3 . After learning these causal category structures, subjects were asked to rate the category membership of all 32 items that could be formed on the five binary dimensions. The results of this experiment are presented in Figure 11. (In the figure, the weight for “X (cause)” is averaged over X1, X2 , and X3 in the 3-1-1 condition and is for the single causally related X in the 1-1-1 condition. The weight for “X (isolated)” is for the isolated Xs in the 1-1-1 condition.) Figure 11 confirms the presence of a multiple-cause effect: Feature Y was weighed relatively more in the 3-1-1 condition when it had three causes versus the 1-1-1 condition when it only had one. These results show that a feature’s number of causes influences the weight it has on category membership judgments 13. 5.2. Evidence For an Isolated Feature Effect Although a feature’s weight does not generally increase with its number of relations, there is substantial evidence showing that features involved in at least one causal relation are more important than those involved in zero (so-called isolated features). 
For example, in Rehder and Kim’s (2006) Experiment 2, just discussed, weights on the isolated features in the 1-1-1 condition (X1 and X3 in Figure 10) were lower than on any of the causally related features (Figure 11). Likewise, in Rehder and Kim’s Experiment 3, weights on the isolated features in the 1-1-1 condition (Z1 and Z3 in Figure 5) were lower than on any of the causally related features (Figure 6) (also see Kim & Ahn, 2002a). Of course, one might attribute this result to the fact that causally related features were mentioned more often during initial category learning and this repetition may have resulted in those features being treated as more important. However, even when Kim and Ahn (2002b) controlled for this by relating the isolated features to each other via non-causal relations, they were still less important than the causally linked features. That features involved in at least one causal relation are more important than isolated features will be referred to as the isolated feature effect.

5.3. Theoretical Implications: Discussion

What implications do these findings have for the dependency and generative models? First, the multiple-cause effect provides additional support for the generative model and against the dependency model. Because the dependency model predicts that feature importance varies with the number of dependents, it predicts no difference between the 3-1-1 and 1-1-1 conditions of Rehder and Kim (2006). In contrast, it is straightforward to show that the generative model generally predicts a multiple cause effect. Because demonstrating this quantitatively for the networks in Figures 8 and 10 is cumbersome (it requires specifying likelihood equations for 16 and 32 separate items, respectively), I do so for a simpler three-feature common effect network, one in which features X1 and X2 each independently cause Y. The likelihood equations generated for each item for this simplified network are presented in Table 5 by iteratively applying Equation 2. For comparison the table also specifies the likelihood equations for a simplified three-feature common cause structure (X causes Y1 and Y2). The table also presents the probability of each item for a sample set of parameter values, namely, cX = .75, mXY1 = mXY2 = .75, and bY1 = bY2 = .10 for the common cause network and cX1 = cX2 = .75, mX1Y = mX2Y = .75, and bY = .10 for the common effect network. From these item distributions, the probability of individual features can be computed. (The predicted feature interactions for these networks, also presented in the table, will be discussed in Section 6.)

First note that, for the common cause network, the generative model predicts a larger weight on the common cause than the effect features. Second, for the common effect network, it predicts a weight on the common effect feature, pk(Y) = .828, which is greater than its causes (pk(Xi) = .750) or the Ys in the common cause network which each have only one cause (pk(Yi) = .606). This prediction of the generative model corresponds to the simple intuition that an event will be more likely to the extent that it has many versus few causes.14 These predictions for higher weights on a common cause and a common effect reproduce the empirical results in Figure 9. Other research has also shown that intuitive judgments of an event’s probability will be greater when multiple potential causes of that event are known.
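The Table 5 feature probabilities just cited can be reproduced by enumerating the eight items of each simplified network under the stated parameters; the sketch below (not the published derivation) treats each effect as a noisy-OR of its present causes and its background cause.

    # Sketch reproducing the marginal feature probabilities implied by Table 5
    # (cX = .75, m = .75, b = .10) for the simplified three-feature networks.

    from itertools import product

    def noisy_or(n_present_causes, m, b):
        return 1 - (1 - b) * (1 - m) ** n_present_causes

    def joint_common_effect(c=0.75, m=0.75, b=0.10):
        # X1 -> Y <- X2
        return {(x1, x2, y): (c if x1 else 1 - c) * (c if x2 else 1 - c)
                * (noisy_or(x1 + x2, m, b) if y else 1 - noisy_or(x1 + x2, m, b))
                for x1, x2, y in product((0, 1), repeat=3)}

    def joint_common_cause(c=0.75, m=0.75, b=0.10):
        # Y1 <- X -> Y2
        return {(x, y1, y2): (c if x else 1 - c)
                * (noisy_or(x, m, b) if y1 else 1 - noisy_or(x, m, b))
                * (noisy_or(x, m, b) if y2 else 1 - noisy_or(x, m, b))
                for x, y1, y2 in product((0, 1), repeat=3)}

    def marginal(joint, i):
        return sum(p for item, p in joint.items() if item[i] == 1)

    ce, cc = joint_common_effect(), joint_common_cause()
    print(round(marginal(ce, 0), 3), round(marginal(ce, 2), 3))   # .75, .828
    print(round(marginal(cc, 0), 3), round(marginal(cc, 1), 3))   # .75, .606

Consistent with the intuition described above, the common effect Y, with its two causes, is more probable (.828) than either Y in the common cause network, each of which has only one cause (.606).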
For example, according to Tversky and Koehler’s (1994) support theory, the subjective probability of an event increases when supporting evidence is enumerated (death due to cancer, heart disease, or some other natural cause) rather than summarized (death due to natural causes) (also see Fischhoff, Slovic, & Lichtenstein, 1978). And, Rehder and Milovanovic (2007) found that an event was rated as more probable as its number of causes increased (from 1 to 2 to 3). Note that Ahn and Kim (2001) also proposed that the multiple cause effect obtained with common effect networks was due to the greater subjective category validity associated with common effect features.15

However, whereas the multiple cause effect provides additional support for the generative model, the isolated feature effect is problematic for both the generative and the dependency models. For example, for the 1-1-1 network in Figure 10, the generative model stipulates one free c parameter for each X and, in the absence of any other information, those cs should be equal. Thus, it predicts that X1 and X3 should have the same weight as X2. Because they have the same number of dependents (zero), the dependency model predicts that X1 and X3 should have the same weight as Z. Why should features be more important because they are involved in one causal relation? Ahn and Kim (2001) have proposed that this effect is related to Gentner’s (1989) structure mapping theory in which statements that relate two or more concepts (e.g., the earth revolves around the sun, represented as revolves-around (earth, sun)) are more important in analogical mapping than statements involving a single argument (e.g., hot(sun)). Of course, the primary result to be explained is not the importance of predicates (e.g., revolves-around and hot) but rather the importance of features (that play the role of arguments in predicates, e.g., causes(X, Y)). But whatever the reason, the isolated feature effect joins the causal status and multiple cause effects as an important way that causal knowledge influences classification judgments.

6. The Coherence Effect

The next phenomenon addressed is the coherence effect. Whereas the causal status and multiple cause effects (and the isolated feature effect) involve the weights of individual features, the coherence effect reflects an interaction between features. The claim is that better category members are those whose features appear in sensible or coherent combinations. For example, if two features are causally related, then one expects the cause and effect feature to usually be both present or both absent. In fact, large coherence effects have been found in every study in which they’ve been assessed (Rehder & Hastie, 2001; Rehder 2003a; b; Rehder & Kim, 2006; 2009b; Rehder 2007; 2009b; Marsh & Ahn, 2006).

It is important to recognize that effects of causal knowledge on feature weights and feature interactions are not mutually exclusive. As reviewed in Section 2.2, weights and interactions represent two orthogonal effects (corresponding to “main effects” and “interactions” in analysis of variance). Indeed, some of the studies reviewed below demonstrating coherence effects are the same ones reviewed in Sections 4 and 5 showing feature weight effects. In other words, causal knowledge changes the importance of both features and combinations of features to categorization.
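To make the distinction concrete, the toy example below (with made-up ratings for a pair of causally related binary features) computes a feature weight as the average rating difference between items that have and lack the feature, and a coherence (interaction) term as the advantage of the two consistent items (both present, both absent) over the two inconsistent ones. The regression analyses in the studies reviewed next estimate both kinds of terms simultaneously.

    # Toy illustration of separating feature weights from a coherence effect
    # for two causally related binary features; the ratings are hypothetical.

    ratings = {(1, 1): 80, (1, 0): 40, (0, 1): 30, (0, 0): 50}

    weight_X = (ratings[1, 1] + ratings[1, 0]) / 2 - (ratings[0, 1] + ratings[0, 0]) / 2
    weight_Y = (ratings[1, 1] + ratings[0, 1]) / 2 - (ratings[1, 0] + ratings[0, 0]) / 2
    coherence = (ratings[1, 1] + ratings[0, 0]) / 2 - (ratings[1, 0] + ratings[0, 1]) / 2

    print(weight_X, weight_Y, coherence)   # 20.0 10.0 30.0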
In the subsections that follow I review studies testing the generative model’s predictions regarding how coherence effects are influenced by changes in model parameters (e.g., the strengths of causal links) and the topology of the causal network. The first two studies demonstrate effects manifested in terms of two-way interactions between features; the third one also demonstrates an effect on higher-order interactions between features. Recall that the dependency model predicts an effect of causal knowledge on feature weights but not feature interactions, and so is unable to account for coherence effects.

6.1. Causal Link Strength

Recall that, according to the generative model, when features are causally related one expects those features to be correlated. For example, for the three-element chain network (Figure 1), one expects the two directly causally related feature pairs (X and Y, and Y and Z) to be correlated for most parameter values, and for the indirectly related pair (X and Z) to be more weakly correlated. Moreover, one expects these correlations to be influenced by the strength of the causal relations. Table 3 shows the generative model’s predictions for different causal strengths holding the b parameters fixed at .10. Note two things. First, Table 3 indicates that the magnitude of the probabilistic contrasts between features increases as mXY and mYZ increase. The contrasts between directly related features are .300, .675, .810, and .900 for causal strengths of .33, .75, .90, and 1.0 (Table 3); the contrasts between the indirectly related features are .090, .456, .656, and .810. Intuitively, it makes this prediction because features that are more strongly causally related should be more strongly correlated. Second, Table 3 indicates that the difference between the direct and indirect contrasts changes systematically with strength. It makes this prediction because, although the correlation between directly related pairs should be stronger than between the indirectly related one for many parameter values, this difference will decrease as mXY and mYZ approach 1 or 0. For example, when the ms = 1 (and there are no background causes), features X, Y, and Z are all perfectly correlated (e.g., Y is present if and only if Z is) and thus there is no difference between direct and indirect contrasts. Likewise, when the ms = 0, features X, Y, and Z should be perfectly uncorrelated and thus there is again no difference between direct and indirect contrasts. In other words, the generative model predicts that the direct/indirect difference should be a nonmonotonic function of causal strength.

These predictions were tested in Experiment 1 of Rehder and Kim (2009b) described earlier in which the strengths of the causal links were varied between either 100% or 75%. Subjects’ sensitivity to correlations between features was assessed via regression analyses that included two-way interaction terms for each pair of features. Regression weights on those interaction terms reflect the sensitivity of classification ratings to whether each pair of features is both present or both absent versus one present and the other absent. The two-way interaction terms are presented in the right panel of Figure 2A. In the figure, the weights on the two directly related pairs (X and Y, and Y and Z) have been collapsed together and are compared against the single indirectly related pair (X and Z).
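Before turning to the results, the Table 3 contrasts just described can be reproduced with a few lines (a sketch, with the "33%" links treated as 1/3): the direct contrast is p(effect | cause present) minus p(effect | cause absent), and the indirect X-Z contrast is simply the product of the two direct ones, since Z depends on X only through Y.

    # Sketch of the chain contrasts in Table 3, with background strength b = .10.

    b = 0.10
    for m in (1 / 3, 0.75, 0.90, 1.0):
        direct = (1 - (1 - b) * (1 - m)) - b        # = m * (1 - b)
        indirect = direct * direct                  # X-Z contrast, via Y
        print(f"m = {m:.2f}: direct = {direct:.3f}, indirect = {indirect:.3f}")
        # prints .300/.090, .675/.456, .810/.656, .900/.810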
The first thing to note is that in both the Chain-100 and the Chain-75 conditions both sorts of interaction terms were significantly greater than zero. This reflects the fact that subjects granted items higher category membership ratings to the extent they were coherent in light of a category’s causal laws (e.g., to the extent that cause and effect features were either both present or both absent). As expected, in the Control condition both sorts of two-way interaction terms (not shown in Figure 2A) were close to zero. Moreover, the generative model also correctly predicts how the magnitude of the interaction weights varied over condition (Chain-100 and Chain-75) and type (direct and indirect). First, it was predicted that the magnitude of the interaction terms should be greater in the Chain-100 versus the Chain-75 condition. Second, it was predicted that the difference between the direct and indirect terms should be small or absent in the Chain-100 condition and larger in the Chain-75 condition. In fact, this is exactly the pattern presented in Figure 2A.16 Causal link strength is one important factor that determines not only the size of coherence effects but also more subtle aspects of that effect (e.g., the difference between the direct and indirect terms).

The effect of coherence in this experiment can also be observed directly in the classification ratings of individual test items. Figure 12A presents the test item classification ratings as a function of their number of characteristic features. As expected, in the Control condition ratings were a simple monotonic function of the number of features. In contrast, items with 2 or 1 features were rated lower than those with 3 or 0 (i.e., items 111 and 000) in both causal conditions; moreover, this effect was more pronounced in the Chain-100 condition than the Chain-75 condition. Intuitively, the explanation for these differences is simple. When Control participants are told, for example, that "most" myastars are very hot, have high density, and have a large number of planets, they expect that most myastars will have most of those features and that the atypical values exhibited by "some" myastars (unusually cool temperature, low density, and small number of planets) will be spread randomly among category members. That is, they expect the category to exhibit a normal family resemblance structure in which features are independent (i.e., are uncorrelated within the category). But when those features are causally related, the prototype 111 and item 000 receive the highest ratings. Apparently, rather than expecting a family resemblance structure with uncorrelated features, participants expected the "most" dimension values to cluster together (111) and the "some" values to cluster together (000), because that distribution of features is most sensible in light of the causal relations that link them. As a result, the rating of test item 000 is an average of 30 points higher in the causal conditions than in the Control condition. In contrast, items that are incoherent because they have 1 or 2 characteristic features (and thus have a mixture of "most" and "some" values) are rated 29 points lower than in the Control condition.

6.2. Background Causes

Experiment 2 of Rehder and Kim (2009b) described earlier also tested how coherence effects vary with the strength of background causes. Table 3 shows the generative model’s predictions for different background strengths holding causal strengths (mXY and mYZ) fixed at .75.
Note that Table 3 indicates that the magnitude of the probabilistic contrasts between features decrease as bY and bZ increase. The contrasts between directly related features are .75, .56, and .38 for values of bY and bZ of 0, .25, and .50, respectively; the contrasts between the indirectly related features are .56, .32, and .14. Intuitively, it makes this prediction because a cause becomes less strongly correlated with its effect to the extent that Causal-Based Classification: A Review 39 the effect has alternative causes. In addition, the generative model once again predicts that the direct correlations should be stronger than the indirect one. Recall that Rehder and Kim (2009b, Experiment 2) directly manipulated background causes by describing the strength of those causes as either 0% (Background-0 condition) or 50% (Background-50 condition). Once again, the weights on two-way interaction terms derived from regression analyses performed on the classification ratings represent the magnitude of the coherence effect in this experiment. The results, presented in the right panel of Figure 2B, confirm the predictions: The interaction terms were larger in the Background-0 condition than the Background-50 condition and the direct terms were larger than the indirect terms. Once again, the strong effect of coherence is apparent in the pattern of test item ratings shown in Figure 12B. (The figure includes ratings from the Control condition from Rehder and Kim’s Experiment 1 for comparison.) Whereas in the Control condition ratings are a monotonic function of the number of characteristic features, in the causal conditions incoherent items with 2 or 1 features are rated lower relative to the Control condition and the coherent item 000 is rated higher. Apparently, participants expected category members to reflect the correlations that the causal relations generate: The causallylinked characteristic features should be more likely to appear together in one category member and atypical features should be more likely to appear together in another. Moreover, Figure 12B shows that this effect was more pronounced in the Background-0 condition than the Background-50 condition. The strength of background causes is a second important factor that determines the size of coherence effects. 6.3. Higher-Order Effects In this section I demonstrate how the generative model predicts not only two-way but also higher order interactions among features. Consider again the common cause and common effect networks in Figure 8. Of course, both networks imply that those feature pairs that are directly related by a causal relation will be correlated. In addition, as in a chain network, the common-cause network implies that the indirectly related features (the effects Y1, Y2, and Y3) should be correlated for most parameter values (i.e., so long as cX < 1, the ms > 0, and the bs < 1) albeit not as strongly as the directly related features (so long as the ms < 1). The expected correlations between the effects is easy to understand given the inferential Causal-Based Classification: A Review 40 potency that exists among these features. For example, if in some object you know only about the presence of Y1 , you can reason from Y1 to the presence of X and then from X to the presence of Y2. This pattern of correlations is exhibited in Table 5 by the simplified three-feature common cause network: Contrasts of .675 between the directly related features and .358 between the indirectly related ones. 
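These two values can be recovered from the joint distribution of the simplified common cause network with a short sketch (parameters as in Table 5: cX = .75, m = .75, b = .10); the indirect contrast is just the inference from Y1 to X and then from X to Y2 expressed numerically.

    # Sketch of the common cause contrasts just quoted (cX = .75, m = .75, b = .10):
    # direct:   p(Y1 | X present) - p(Y1 | X absent)
    # indirect: p(Y2 | Y1 present) - p(Y2 | Y1 absent)

    from itertools import product

    c, m, b = 0.75, 0.75, 0.10
    p_y = lambda x: 1 - (1 - b) * (1 - m) if x else b     # p(effect | state of X)

    joint = {(x, y1, y2): (c if x else 1 - c)
             * (p_y(x) if y1 else 1 - p_y(x))
             * (p_y(x) if y2 else 1 - p_y(x))
             for x, y1, y2 in product((0, 1), repeat=3)}

    def p_cond(target, given, value):
        num = sum(p for k, p in joint.items() if k[target] == 1 and k[given] == value)
        return num / sum(p for k, p in joint.items() if k[given] == value)

    print(round(p_y(1) - p_y(0), 3))                      # 0.675 (direct)
    print(round(p_cond(2, 1, 1) - p_cond(2, 1, 0), 3))    # 0.358 (indirect)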
In contrast, the common effect network implies a different pattern of feature interactions. In a disanalogy with a common cause network, the common effect network does not predict any correlations among the independent causes of Y because they are just that (independent). If in some object you know only about the presence of X1, you can reason from X1 to Y, but then the (inferred) presence of Y does not license an inference to X2. However, unlike the common cause network, the common effect network implies higher-order interactions among features, namely, between each pair of causes and the common effect Y. The following example provides the intuition behind these interactions. When the common effect feature Y is present in an object, that object will of course be a better category member if a cause feature (e.g., X1) is also present. However, the presence of X1 will be less important when another cause (e.g., X2) is already present to “explain” the presence of Y. In other words, a common-effect network’s higher-order interactions reflect the diminishing marginal returns associated with explaining an effect that is already explained. This pattern of correlations is exhibited in Table 5 by the simplified three-element common effect network: Contrasts of .295 between the directly related features, 0 between the indirectly related causes, and higher-order contrasts between X1, X2, and Y: Δpk(X1, Y, X2) = Δpk(X2, Y, X1) = –.174. These higher-order contrasts reflect the normative behavior of discounting for the case of multiple sufficient causation during causal attribution (Morris & Larrick, 1995). In contrast, Table 5 shows that the analogous higher-order contrasts for the common cause network, Δpk(Y1, X, Y2) and Δpk(Y2, X, Y1), are each 0, reflecting the absence of discounting for that network.

To test whether these expected higher-order interactions would have the predicted effect on classification judgments, participants in Rehder (2003a) learned categories with four features and three causal links arranged in either the common cause or common effect networks shown in Figure 8. No information about the strength of the causes or background causes was provided. To provide a more sensitive test of the effect of feature interactions, this study differed from Rehder and Hastie (2001) by omitting any information about which features were individually typical of the category. Subjects then rated the category membership of all 16 items that can be formed on 4 binary dimensions. A control condition with no causal links was also tested.

The regression weights from this experiment are presented in Figure 13. Note that the feature weights shown in Figure 13A replicate the feature weights found in Rehder and Hastie (2001): Higher weights on the common cause and the common effect (Figure 9). More importantly, the feature interactions shown in Figure 13B reflect the pattern of interfeature correlations just described. (In Figure 13B, for the common cause network the “direct” two-way interactions are between X and its effects and the indirect ones are between the effects themselves; for the common effect network, the “direct” interactions are between Y and its causes and the indirect ones are between the causes.) First, in both conditions the two-way interaction terms corresponding to directly causally related feature pairs had positive weights.
Second, in the common-cause condition the indirect two-way interactions were greater than zero and smaller than the direct interactions, consistent with expectation that the effects will be correlated with one another but not as strongly as with the cause. Third, in the common effect network the indirect terms were not significantly greater than zero, consistent with the absence of correlations between the causes. Finally, that the average of the three three-way interactions involving Y (i.e., fX1X2Y, fX1X3Y, fX2X3Y) was significantly negative in the common effect condition reflects the higher-order interactions that that structure is expected to generate (Table 5). This interaction is also depicted in the right panel of Figure 13C that presents the logarithm of categorization ratings in the common-effect condition for those exemplars in which the common effect is present as a function of the number of cause features. The figure shows the predicted nonlinear increase in category membership ratings as the number of cause features present to “explain” the common effect feature increases. In contrast, for the common cause network (Figure 13C, left panel), ratings increased linearly (in log units) with the number of additional effects present. (All two-way and higher-order interactions were close to zero in the control condition.) These results indicate that, consistent with the predictions of the generative model, subjects expect good category members to manifest the two-way and higher-order correlations that causal laws generate. Causal-Based Classification: A Review 42 6.4. Other Factors Finally, just as was the case for the causal status effect, questions have been raised about the robustness of the coherence effect. Note that in early demonstrations of this effect, one value on each feature dimension was described as characteristic or typical of the category whereas the other, atypical value was often described as "normal" (Rehder & Hastie, 2001; Rehder 2003a; b; Rehder and Kim, 2006). For example, in Rehder and Kim (2006) participants were told that "Most myastars have high temperature whereas some have a normal temperature," "Most myastars have high density whereas some have a normal density," and so on. Although the intent was to define myastars with respect to the superordinate category (all stars), Marsh and Ahn (2006) suggested that this use of "normal" might have inflated coherence effects because participants might expect all the normal dimension values to appear together and, because of the emphasis on coherence, reduced the causal status effect. To assess this hypothesis, Marsh and Ahn taught participants categories with four features connected in a number of different network topologies. For each, they compared an Unambiguous condition in which the uncharacteristic value on each binary dimension was the opposite of the characteristic value (e.g., low density vs. high density) with an Ambiguous condition (intended to be replications of conditions from Rehder 2003a and 2003b) in which uncharacteristic values were described as "normal" (e.g., normal density). They found that the Unambiguous condition yielded a larger causal status effect and a smaller coherence effect, a result they interpreted as demonstrating that the "normal" wording exaggerates coherence effects. 
However, this conclusion was unwarranted because the two conditions also differed on another dimension, namely, only the Unambiguous participants were given information about which features were typical of the category. In the absence of such information it is unsurprising that ratings in the Ambiguous condition were dominated by coherence. To determine whether use of “normal” affects classification, Rehder and Kim (2008) tested categories with four features arranged in a causal chain (W→X→Y→Z) and compared two conditions that were identical except that one used the "normal" wording and the other used bipolar dimensions (e.g., low vs. high density). The results, presented in Figure 14, show a pattern of results exactly the opposite of the Marsh and Ahn conjecture: The "normal" wording produced a smaller coherence effect and a larger causal status effect. Note that large coherence effects were also found in each of the four experiments from Rehder and Kim (2009b) reviewed above that also avoided use of the “normal” wording.

Why might bipolar dimensions lead to stronger coherence effects? One possibility is that participants might infer the existence of additional causal links. For example, if you are told that myastars have either high or low temperature and either high or low density, and that high temperature causes high density, you might take this to mean that low temperature also causes low density. These results suggest that subtle differences in the wording of a causal relation can have big effects on how those links are encoded and then used in a subsequent reasoning task. But however one interprets them, these findings indicate that coherence effects do not depend on use of the “normal” wording.

6.5. Theoretical Implications: Discussion

Causal networks imply the existence of subtle patterns of correlations between variables: Directly related variables should be correlated, indirectly related variables should be correlated under specific conditions, and certain networks imply higher-order interactions among variables. The studies just reviewed show that people’s classification judgments are exquisitely sensitive to those correlations. These results provide strong support for the generative model’s claim that good category members are those that manifest the expected pattern of interfeature correlations and poor category members are those that violate that pattern.

As mentioned, the presence of coherence effects supports the generative model over the dependency model because only the former model predicts feature interactions. However, another model that does predict feature interactions is Rehder and Murphy’s (2003) KRES recurrent connectionist model, which represents relations between category features as excitatory and inhibitory links (also see Harris & Rehder, 2006). KRES predicts interactions because features that are consistent with each other in light of knowledge will raise each other’s activation level (due to the excitatory links between them) which in turn will activate a category label more strongly; inconsistent features will inhibit one another (due to the inhibitory links between them) which will result in a less active category label.
But while KRES accounts for a number of known effects of knowledge on category learning, because its excitatory and inhibitory links are symmetric, KRES is fundamentally unable to account for the effects reviewed above demonstrating that subjects treat causal links as asymmetric relations. For example, if one ignores causal direction, X and Z in the causal chain in Figure 1 are indistinguishable (and thus there is no basis for predicting a causal status effect), as are the common cause and common effect networks in Figure 8 (and thus there is no basis for predicting the different pattern of feature interactions for those networks). Indeed, the asymmetry between common-cause and common-effect networks has been the focus of considerable investigation in both the philosophical and psychological literatures (Reichenbach, 1956; Salmon, 1984; Waldmann & Holyoak, 1992; Waldmann et al., 1995).

The importance of coherence to classification has been documented by numerous other studies. For example, Wisniewski (1995) found that certain artifacts were better examples of the category “captures animals” when they possessed certain combinations of features (e.g., “contains peanuts” and “caught a squirrel”) but not others (“contains acorns” and “caught an elephant”) (also see Murphy & Wisniewski, 1989). Similarly, Rehder and Ross (2001) showed that artifacts were considered better examples of a category of pollution cleaning devices when their features cohered (e.g., “has a metal pole with a sharpened end” and “works to gather discarded paper”), and worse examples when their features were incoherent (“has a magnet” and “removes mosquitoes”). Malt and Smith (1984) found that judgments of typicality in natural categories were sensitive to whether items obeyed or violated theoretically-expected correlations (also see Ahn et al., 2002). Coherence also affects other types of category-related judgments. Rehder and Hastie (2004) found that participants’ willingness to generalize a novel property displayed by an exemplar to an entire category varied as a function of the exemplar’s coherence. Patalano and Ross (2007) found that the generalization strength of a novel property from some category members to another varied as a function of the category’s overall coherence (and found the reverse pattern when the generalization was made to a non-category member).

Finally, it is illuminating to compare the relative importance of the effects of causal knowledge on feature weights (i.e., the causal status, multiple cause, and relational centrality effects) and feature interactions (the coherence effect) by comparing the proportion of the variance in categorization ratings attributable to the two types of effects. In this calculation, the total variance induced by causal knowledge was taken to be the additional variance explained by a regression model with separate predictors for each feature and each two-way and higher-order interaction as compared to a model with only one predictor representing the total number of characteristic features in a test item. The variance attributable to the changes in feature weights is the additional variance explained by the separate predictors for each feature in the full model, whereas that attributable to the coherence effect is the additional variance explained by the interaction terms.
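The logic of this partition can be sketched as follows; the "ratings" here are simply the item probabilities produced by a three-feature chain model (cX = .75, m = .75, b = .10) and are used purely for illustration, whereas the published figures come from fitting the same nested regressions to subjects' actual ratings.

    # Sketch of the nested-regression variance partition described above.

    import numpy as np
    from itertools import product

    def chain_prob(x, y, z, c=0.75, m=0.75, b=0.10):
        p = lambda cause: 1 - (1 - b) * (1 - m) if cause else b
        return (c if x else 1 - c) * (p(x) if y else 1 - p(x)) * (p(y) if z else 1 - p(y))

    items = np.array(list(product((0, 1), repeat=3)))
    ratings = 100 * np.array([chain_prob(*i) for i in items])
    codes = 2 * items - 1                                    # -1/+1 feature codes

    def r2(predictors):
        X = np.column_stack([np.ones(len(codes))] + predictors)
        fitted = X @ np.linalg.lstsq(X, ratings, rcond=None)[0]
        return 1 - ((ratings - fitted) ** 2).sum() / ((ratings - ratings.mean()) ** 2).sum()

    count = [codes.sum(axis=1)]                              # number of characteristic features
    feats = [codes[:, i] for i in range(3)]                  # separate feature predictors
    inters = [codes[:, 0] * codes[:, 1], codes[:, 1] * codes[:, 2],
              codes[:, 0] * codes[:, 2], codes[:, 0] * codes[:, 1] * codes[:, 2]]

    r_count, r_feat, r_full = r2(count), r2(count + feats), r2(count + feats + inters)
    print("share due to feature weights:", round((r_feat - r_count) / (r_full - r_count), 2))
    print("share due to coherence      :", round((r_full - r_feat) / (r_full - r_count), 2))

Under these illustrative parameters the interaction terms account for the larger share, in line with the comparisons reported next.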
In fact, coherence accounts for more variance in categorization judgments than feature weights in every study in which coherence has been assessed: 60% in Rehder and Hastie (2001, Experiment 2), 80% in Rehder (2003a), 82% in Rehder (2003b, Experiment 1), 70% in Rehder and Kim (2006), 64% in Marsh and Ahn (2006), and over 90% in Rehder and Kim (2009b). These analyses indicate that the most important factor that categorizers consider when using causal laws to classify is whether an object displays a configuration of features that makes sense in light of those laws.

7. Classification as Explicit Causal Reasoning

The final phenomenon I discuss concerns evidence of how categorization can sometimes be an act of causal reasoning. On this account, classifiers treat the features of an object as evidence for the presence of unobserved features and these inferred features then contribute to a category membership decision. Causal reasoning such as this may have contributed to instances of the causal status effect described in Section 4. For example, recall that Rehder and Kim (2009b, Experiment 3) found an enhanced causal status effect when subjects were instructed on categories with an explicit “essential” feature (Figure 4A). Although we interpreted those findings in terms of how the essential feature changed the likelihoods of the observed features (and provided evidence for this claim; see Figure 3C), subjects may have also reasoned backwards from the observed features to the essential one, and, of course, features closer (in a causal sense) to the essence (e.g., X) were taken to be more diagnostic of the essence than far ones (e.g., Z). Reasoning of this sort may occur even when participants are not explicitly instructed on an essential feature. For example, one of the categories used in Ahn et al. (2000a) was a disease that was described as having three symptoms X, Y, and Z. Although participants were told that X→Y→Z, people understand that a disease (D) causes its symptoms, and so participants were likely to have assumed the more complex causal model D→X→Y→Z (and then reasoned backwards from the symptoms to the disease). Given the prevalence of essentialist intuitions (Gelman, 2003), similar reasoning may have occurred for the natural kinds and artifacts tested in many studies. I now review recent studies that provide more direct evidence of causal reasoning during classification.

7.1. Classification as Diagnostic (Backward) Reasoning

Rehder and Kim (2009a, Experiment 1) investigated the causal inferences people make in the service of classification by teaching subjects the causal structures in Figure 15A. Unlike the studies reviewed above, subjects were taught two novel categories (e.g., myastars and terrastars). Category A had three features, one underlying feature (UA) and two observable features (A1 and A2). The first observable feature (A1) was described as being caused by UA but the second (A2) was not. Likewise, category B had one underlying feature (UB) that caused the second observable feature (B2) but not the first (B1). Like the pseudo-essential feature in Rehder and Kim (2009b, Experiment 3) in Figure 4A, UA and UB were defining because they were described as occurring in all members of their respective categories and in no nonmembers. Observable features were associated with their category by stating that they occurred in 75% of category members.
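The diagnostic asymmetry built into these categories can be illustrated with a minimal Bayesian sketch. The numbers other than the 75% link are assumptions introduced here for illustration (in particular, a base rate of .20 for A1 when UA is absent, which the study itself does not specify).

    # Minimal sketch of diagnostic (backward) reasoning from an observable
    # feature to an underlying one. Assumed numbers: p(A1 | UA) = .75,
    # p(A1 | not UA) = .20, and a .5 prior on UA; B1 has no link to UB, so
    # observing B1 leaves UB at its prior.

    def posterior(prior, p_obs_given_u, p_obs_given_not_u):
        num = prior * p_obs_given_u
        return num / (num + (1 - prior) * p_obs_given_not_u)

    p_UA_given_A1 = posterior(0.5, 0.75, 0.20)   # about .79
    p_UB_given_B1 = 0.5                          # unchanged by B1

    print(round(p_UA_given_A1, 2), p_UB_given_B1)

On this sketch, an item with A1 and B1 favors category A simply because A1 licenses an inference to UA whereas B1 licenses no corresponding inference to UB, which is the pattern of choices described next.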
After learning about the two categories, participants were presented with test items consisting of two features, one from each category, and asked which category the item belonged to. For example, a test item might have features A1 and B1, which we predicted would be classified as an A, because from A1 one can reason to UA via the causal link that connects them, but one cannot so reason from B1 to UB. For a similar reason, an item with features A2 and B2 should be classified a B. Consistent with this prediction, subjects chose the category whose underlying feature was implicated by the observable ones 84% of the time. Moreover, when subjects were presented with items in which the presence of two features was negated (e.g., A1 and B1 both absent), they chose the category whose underlying feature could be inferred as absent (e.g., category A) only 32% of the time. That is, subjects appeared to reason from observable features to underlying ones and then to category membership.

There are alternative interpretations of these results, however. In Figure 15A, feature A1 might have been viewed as more important because it was involved in one relation versus B1, which was involved in zero (i.e., an isolated features effect; see Section 5.2). To address this concern, Rehder and Kim (2009a) conducted a number of follow-up experiments. Our Experiment 3 tested the categories in Figure 15B in which the strengths of the causal relations between UA and A1 and between UA and A2 were described as 90% and 60%, respectively, whereas those between UB and B1 and between UB and B2 were described as 60% and 90%. We predicted that test item A1B1 would be classified as an A, because the inference from A1 to UA is more certain than the inference from B1 to UB. Consistent with this prediction, subjects classified test item A1B1 as an A 88% of the time. Experiment 4 tested the categories in Figure 15C. Whereas UA was described as occurring in all category A members just as in the previous Experiments 1-3, UB was described as occurring in only 75% of category B members. We predicted that whereas the observable features of both categories provide equal evidence for UA and UB, respectively, those of category A should be more diagnostic because UA itself is. Consistent with this prediction, test item A1B1 was classified as an A 68% of the time. Finally, Experiment 5 tested the category structures in Figure 15D. Unlike the previous experiments, participants were given explicit information about the possibility of alternative causes of the observable features; specifically, they were told that features A1 and B2 had alternative causes (that operated with probability 50%) whereas A2 and B1 had none. We predicted that test item A1B1 should be classified as a B because B1 provides decisive evidence of UB (because it has no other causes). As predicted, test item A1B1 was classified a B 73% of the time. Importantly, these results obtained despite the fact that the more diagnostic feature was involved in either the same number (Experiments 3 and 4) or fewer (Experiment 5) causal relations.

Recent evidence suggests that people also reason diagnostically to underlying properties for natural categories. In a replication of Rips's (1989) well-known transformation experiments, Hampton et al.
(2007) found that whether a transformed animal (e.g., a bird that comes to look like an insect due to exposure to hazardous chemicals) was judged to have changed category membership often depended on what participants inferred about underlying causal processes and structures. As in Rips's study, a (small) majority of subjects in Hampton et al. judged the transformed animal to still be a bird whereas a (large) minority judged that it was now an insect. But although the judgments of the latter group (dubbed the phenomenalists by Hampton et al.) would seem to be based on the animals' appearance, the justifications they provided for their choices indicated instead that many used the animals' new properties to infer deeper changes. For example, subjects assumed that a giraffe that lost its long neck also exhibited new behaviors that were driven by internal changes (e.g., to its nervous system), which in turn signaled a change in category membership (to a camel). Conversely, those subjects who judged that the transformed animal's category was unchanged (the essentialists) often appealed to the fact that it produced offspring from its original category, from which they inferred the absence of important internal changes (e.g., to the animal's DNA). In other words, rather than the (so-called) phenomenalists using only observable features, and rather than the essentialists just relying on the presence of previously-inferred underlying properties, both groups used observable features to infer the state of internal causal structures and processes, and decided category membership on that basis. Finally, also recall Murphy and Medin's (1985) well-known example of classifying a party-goer who jumps into a pool as drunk—one reasons from aberrant behavior to its underlying cause even if one has never before observed a swimming drunk.

7.2. Classification as Prospective (Forward) Reasoning

The notion of explicit causal reasoning in the service of classification allows for not only backwards, or diagnostic, reasoning to underlying features but also forwards, or prospective, reasoning. For example, a physician may suspect the presence of HIV given the presence of the forms of sarcoma, lymphoma, and pneumonia that HIV is known to produce (diagnostic reasoning). But the case for HIV is made stronger still by the presence of one or more of its known causes, such as blood transfusions, sharing of intravenous needles, or unsafe sex (prospective reasoning). I now review evidence of prospective reasoning in classification.

7.2.1. Rehder (2007). Subjects were taught the common cause and common effect networks in Figure 8, but now the common cause and common effect were pseudo-essential underlying features, that is, they occurred in all category members and no nonmembers. The classification test only included items with three observable features (the effect features in the common cause network or the cause features in the common effect network). Rehder (2007) showed that an object's degree of category membership increased nonlinearly with its number of observable features when those features were effects as compared to the linear increase that obtained when those features were causes, results consistent with a normative account of causal reasoning (also see Oppenheimer & Tenenbaum, 2009).
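The direction of this asymmetry can be illustrated with a small sketch. The parameters below are assumptions chosen for illustration only (they are not Rehder's, and the computation is not his analysis); the sketch merely shows that diagnostic inference to an unobserved common cause yields a sharply nonlinear pattern across the number of features present, whereas, for weak causes, the noisy-OR probability of an unobserved common effect rises comparatively steadily.

```python
# Sketch with assumed parameters: inferring an unobserved, category-defining
# feature U from k of 3 observable features.
prior = 0.5

def common_cause_posterior(k, m=0.75, b=0.20):
    # Features are effects of U; reason diagnostically via Bayes' rule.
    p1 = 1 - (1 - m) * (1 - b)                       # P(effect present | U present)
    p0 = b                                            # P(effect present | U absent)
    like_present = p1**k * (1 - p1)**(3 - k)
    like_absent  = p0**k * (1 - p0)**(3 - k)
    return like_present * prior / (like_present * prior + like_absent * (1 - prior))

def common_effect_probability(k, m=0.30, b=0.10):
    # Features are causes of U; reason prospectively via a noisy-OR.
    return 1 - (1 - b) * (1 - m)**k

for k in range(4):
    print(k, round(common_cause_posterior(k), 3), round(common_effect_probability(k), 3))
# Common cause:  0.015, 0.200, 0.800, 0.985 -- most change in the middle of the range.
# Common effect: 0.100, 0.370, 0.559, 0.691 -- a steadier rise per additional cause.
```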
7.2.2. Follow-up to Rehder & Kim (2009a). In a follow-up experiment to Rehder and Kim (2009a), our lab taught subjects the two category structures in Figure 16A. Participants were again presented with test items consisting of one feature from each category (e.g., A1B1). Item A1B1 was classified as an A, suggesting that the evidence that A1 provided for category A via forward causal reasoning was stronger than the evidence that B1 provided for category B. This result is consistent with a well-known property of causal reasoning, namely, the fact that people reason more confidently from causes to effects than vice versa (Tversky & Kahneman, 1980).

7.2.3. Chaigneau, Barsalou, & Sloman (2004). Chaigneau et al. provided particularly compelling evidence of the presence of prospective causal reasoning in categorization. Figure 16B presents the causal structures they hypothesized constitute the mental representation of artifact categories. The function historically intended by the artifact's designer results in its physical structure. In addition, the goal of a particular agent leads to the agent acting toward the artifact in a way to achieve those goals. Together, the artifact's physical structure and the agent's action yield a particular outcome. The authors presented subjects with a number of vignettes, each of which specified the state of the four causes; in the critical vignettes three of the causes were present and the fourth was absent. For example, a vignette might include (a) an object that was created with the intention of being a mop but (b) was made out of plastic bags, and (c) an agent that wanted to clean up a spill and that (d) used the object in the appropriate way for mopping (i.e., all causes normal for a mop were present except physical structure). Subjects were asked how appropriate it was to call the object a mop. Classification ratings were much lower for vignettes in which the appropriate physical structure was missing than when the appropriate historical intention was missing. This result is consistent with subjects reasoning from an object's physical structure to its potential function and then to category membership—so long as the structure of the artifact is appropriate, the intention of its designer becomes irrelevant.17 However, when physical structure is unspecified, then one can use the designer's intentions to infer physical structure (and from structure infer potential function). This is what Chaigneau et al. found: The effect of a missing intention on classification was much larger when the physical structure was unspecified.

7.4. Theoretical Implications: Discussion

What implications do these findings have for the generative and dependency models? First, there are two ways that the generative model can account for the observed results. The first approach corresponds to the explicit causal reasoning account we have just described. As a type of causal graphical model, a category's network of interfeature causal links supports the elementary causal inferences required to account for the results in Experiments 1–5. Indeed, Rehder and Burnett (2005) confirmed that people are more likely to infer the presence of a cause feature when its effect was present (and vice versa).
Although Rehder and Burnett also observed some discrepancies from normative reasoning, current evidence indicates that people can readily engage in the causal reasoning from observed to unobserved features suggested by these experiments (also see Ahn et al., 2000a, Experiment 5; Sloman & Lagnado, 2005; Waldmann & Hagmayer, 2005). In addition, recall that the generative model also predicts that observed features caused by underlying properties are likely to be perceived as more prevalent among category members. Not only does this multiple cause effect help explain the enhanced causal status effect found with Rehder and Kim's (2009b, Experiment 3) essentialized categories, but Rehder and Kim (2009a) have also shown how it explains the results from all five of their experiments described above. However, it does not explain the cases of prospective causal reasoning. For example, in Figure 16A feature B1 should have greater category validity in category B than A1 has in category A, but A1 was the more diagnostic feature. Thus, demonstrations of prospective reasoning are important insofar as they establish the presence of classification effects not mediated by changes in feature likelihoods brought about by the multiple cause effect, implying a more explicit form of causal reasoning.

Second, the dependency model in turn can explain some aspects of the prospective reasoning results, as features should be weighed more heavily when they have an extra effect. Thus, it explains why feature A1 is more diagnostic than B1 in Figure 16A. However, the dependency model fails to explain the results from Chaigneau et al. (2004) in which the importance of a distal cause (the intentions of an artifact's designer) itself interacts with whether information about the artifact's physical structure is available. Of course, the dependency model is also unable to account for the cases of diagnostic reasoning in which features become more important to the extent they have additional causes rather than effects.

8. Developmental Studies

Given the robust causal-based classification effects in adults just reviewed, it is unsurprising that researchers have asked when these effects develop in children. I review evidence of how causal knowledge changes the importance of features and feature interactions and evidence of explicit causal reasoning in children.

8.1. Feature Weights and Interactions

Initial studies of how causal knowledge affects children's classification were designed to test whether children exhibit a causal status effect. For example, Ahn, Gelman, Amsterlaw, Hohenstein, and Kalish (2000b) taught 7- to 9-year-olds a novel category with three features in which one feature was the cause of the other two. They told children, for example, that a fictitious animal called taliboos had promicin in their nerves, thick bones, and large eyes, and that the thick bones and large eyes were caused by the promicin. Ahn et al. found that an animal missing only the cause feature (promicin) was judged to be a less likely category member than one missing only one of the effect features (thick bones or large eyes). Using a related design, Meunier and Cordier (2009) found a similar effect with 5-year-olds (albeit only when the cause was an internal rather than a surface feature).
However, although the authors of these studies interpreted their findings as indicating a causal status effect, I have shown in Section 2.2 how these results can be interpreted as reflecting a coherence effect instead (see Example 1 in Table 1). An item missing only the cause feature (promicin) may have been rated a poor category member because it violated two expected correlations (one with thick bones, the other with large eyes) whereas an item missing only one effect feature violated only one expected correlation (the one with promicin). Thus, the Ahn et al. and Meunier and Cordier results are ambiguous regarding whether children exhibit a causal status effect or a coherence effect (or both).

To test for the independent presence of these effects, Hayes and Rehder (2009) taught 5- to 6-year-olds a novel category with four features, two of which were causally related. For example, children were told about a novel kind of animal named rogos that have big lungs, can stay underwater for a long time, have long ears, and sleep during the day. They were also told that having big lungs was the cause of staying underwater for a long time (the other two features were isolated, i.e., were involved in no causal links). After category learning, subjects were presented with a series of trials, each presenting two animals, and asked which was more likely to be a rogo. The seven test pairs are presented in Table 6. For each alternative, dimension 1 is the cause, dimension 2 is the effect, and dimensions 3 and 4 are the neutral features; '1' means a feature is present, '0' means it is absent, and 'x' means there was no information about the feature (e.g., item 10xx is an animal with big lungs that can't stay underwater very long, with no information about the two neutral features). A group of adults was also tested on this task. Table 6 presents the proportion of times alternative X was chosen in each test pair for the two groups of subjects.

To analyze these data, we performed logistic regression according to the equation

choicek(X, Y) = 1 / (1 + exp(–diffk(X, Y)))    (8)

where diffk is defined as the difference in the evidence that alternatives X and Y provide for category k,

diffk(X, Y) = evidencek(X) – evidencek(Y)
            = (wc fX,1 + we fX,2 + wn fX,3 + wn fX,4 + wh hX) – (wc fY,1 + we fY,2 + wn fY,3 + wn fY,4 + wh hY)
            = wc(fX,1 – fY,1) + we(fX,2 – fY,2) + wn(fX,3 – fY,3) + wn(fX,4 – fY,4) + wh(hX – hY)
            = wc mXY,1 + we mXY,2 + wn mXY,3 + wn mXY,4 + wh mXY,h    (9)

In Equation 9, fi,j is an indicator variable reflecting whether the feature on dimension j in alternative i is present (+1), absent (–1), or unknown (0), and hi indicates whether i is coherent (+1), incoherent (–1), or neither (0). Thus, mXY,j (= fX,j – fY,j) are match variables indicating whether alternatives X and Y match on dimension j. In addition, wc, we, and wn are the evidentiary weights provided by the cause feature, effect feature, and the neutral features, respectively. That is, a single item's degree of category membership is increased by wi if a feature on dimension i is present and decreased by wi if it is absent. Finally, wh is the weight associated with whether the object exhibits coherence: An object's degree of category membership is increased by wh if the cause and effect features are both present or both absent and decreased by wh if one is present and the other absent.
Note that Equation 8 predicts a choice probability in favor of X of close to 1 when diffk(X, Y) >> 0, close to 0 when diffk(X, Y) << 0, and close to .5 when diffk(X, Y) ≈ 0. The values of wc, we, wn, and wh yielded by the logistic regression analysis averaged over subjects are presented in Table 7. First, note that a causal status effect—the difference between the importance of the cause and effect features—is reflected in the difference between parameters wc and we. In fact, this difference was not significantly different from zero for either children or adults (Table 7), indicating that neither group exhibited a causal status effect. The absence of a causal status effect is reflected in test pair C, in which alternative 10xx (which has the cause but not the effect) was not considered a more likely category member than alternative 01xx (which has the effect but not the cause). Second, an isolated features effect—measured by the difference between the average of the cause and effect feature weights (wc and we) and the neutral feature weight (wn)—was significantly greater than zero for both groups. Finally, both groups exhibited a coherence effect, as indicated by values of wh that were significantly greater than zero. In addition, the effect of coherence was larger in adults than in children (wh = .65 vs. .22).

These results have several implications. First, besides replicating the presence of a coherence effect in adults (Section 6), this study is the first to document a coherence effect in 5- to 6-year-old children (albeit one smaller in magnitude than in adults). Second, the isolated features effect similarly replicates adult findings reviewed earlier (Section 5) and extends those findings to children. Third, this study replicates the numerous adult studies reported in Section 4 in which a causal status effect failed to obtain and shows that this effect is also not inevitable in children. Of course, the finding of a coherence effect but not a causal status effect in children supports the possibility that previous reports of a causal status effect in children (e.g., Ahn et al. 2000b; Meunier & Cordier, 2009) reflected an effect of coherence instead.18

This preliminary study leaves several questions unanswered. One concerns whether a causal status effect fails to obtain in children for the same reasons it does in adults. For example, adult studies described above showed no causal status effect with deterministic links. Thus, because Hayes and Rehder did not specify the strength of the causal relation between the cause and effect feature, the absence of a causal status effect may have been due to adults and children interpreting that link as deterministic (e.g., big lungs always allow rogos to stay underwater a long time). Accordingly, new studies are being planned that use the same materials and procedure but test probabilistic causal links.

This study also makes an important methodological point, namely, how the separate effects of causal status, isolated features, and coherence can be evaluated in a forced-choice paradigm using logistic regression. Just as linear regression does for rating scale data, logistic regression provides the means to separate the multiple effects of causal knowledge on categorization, including the importance of both features and feature combinations.
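For readers who want the choice rule in one place, the sketch below implements Equations 8 and 9. The weights are illustrative stand-ins rather than the fitted values in Table 7, and the item coding follows the text (+1 = present, -1 = absent, 0 = unknown).

```python
# Sketch of Equations 8-9 with hypothetical weights (not the Table 7 estimates).
import math

def evidence(item, w_c, w_e, w_n, w_h):
    # item = (cause, effect, neutral1, neutral2), each coded +1, -1, or 0.
    f1, f2, f3, f4 = item
    if f1 != 0 and f2 != 0:
        h = 1 if f1 == f2 else -1        # coherent when cause and effect match
    else:
        h = 0                             # "neither" when either value is unknown
    return w_c*f1 + w_e*f2 + w_n*f3 + w_n*f4 + w_h*h

def p_choose_x(x, y, **w):
    diff = evidence(x, **w) - evidence(y, **w)       # Equation 9
    return 1 / (1 + math.exp(-diff))                 # Equation 8

w = dict(w_c=0.4, w_e=0.4, w_n=0.2, w_h=0.65)        # hypothetical weights
# Test pair C (10xx vs. 01xx): equal cause/effect weights imply no causal status effect.
print(p_choose_x((1, -1, 0, 0), (-1, 1, 0, 0), **w))     # 0.5
# A coherent 11xx vs. an incoherent 10xx: the w_h term favors the coherent alternative.
print(p_choose_x((1, 1, 0, 0), (1, -1, 0, 0), **w))      # about 0.89
```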
8.2. Explicit Causal Reasoning in Children's Categorization

There is considerable evidence that the explicit causal reasoning observed in adult categorization is also exhibited by children. For example, in a series of studies Gopnik, Sobel, and colleagues have shown that children can reason causally from observed evidence to an unobserved feature that determines category membership (Gopnik & Sobel, 2000; Gopnik, Glymour, Sobel, Schulz, & Kushnir, 2004; Sobel, Tenenbaum, & Gopnik, 2004; Sobel & Kirkham, 2006; Sobel, Yoachim, Gopnik, Meltzoff, & Blumenthal, 2007). In these studies, children are shown a device called a blicket detector and told that it activates (i.e., plays music) whenever blickets are placed on it. They then observe a series of trials in which (usually two) objects are placed on the blicket detector either individually or together, after which the machine either does or does not activate. For example, in a backward blocking paradigm tested in Sobel et al. (2004, Experiment 1), two blocks (A and B) were twice placed on the machine, causing it to activate, followed by a third trial in which A alone caused activation. On a subsequent classification test, 3- and 4-year-olds affirmed that B was a blicket with probability .50 and .13, respectively, despite the fact that the machine activated on every trial in which B was present; these probabilities were 1.0 in an indirect screening-off condition that was identical except that the machine didn't activate on the final A trial. Apparently, children were able to integrate information from the three trials to infer whether B had the defining property of blickets (the propensity to activate the blicket detector). In particular, in the backward blocking condition they engaged in a form of discounting in which the trial in which A alone activated the machine was sufficient to discount evidence that B was a blicket. Sobel and Kirkham (2006) reached similar conclusions for 24-month-olds (and 8-month-olds using anticipatory eye movements as a dependent measure). The full pattern of results from these studies has no interpretation under alternative associative learning theories. Moreover, Kushnir and Gopnik (2005) have shown that children can infer category membership on the basis of another type of causal reasoning, namely, on the basis of interventions (in which the subject rather than the experimenter places blocks on the machine). Notably, Booth (2007) has shown how these sorts of causal inferences in the service of classification result in children learning more about a category's noncausal features.

9. Summary and Future Directions

This chapter has demonstrated three basic types of effects of causal knowledge on classification. First, causal knowledge changes the importance of individual features. A feature's importance to category membership can increase to the extent that it is "more causal" (a causal status effect), it has many causes (a multiple cause effect), and it is involved in at least one causal relation (an isolated feature effect). Second, causal knowledge affects which combinations of features make for good category members, namely, those that manifest the interfeature correlations expected to be generated by causal laws (a coherence effect). These expected combinations include both pairwise feature correlations and higher-order interactions among features.
Finally, causal knowledge supports inferences from the features of an object one can observe to those one cannot, and these inferred features in turn influence a category membership decision. Evidence was reviewed indicating how these inferences can occur in both the backward (diagnostic) direction as well as the forward (prospective) direction. Several of these effects have been demonstrated in young children.

This chapter has also discussed the implications these results have for current models of causal-based classification. Briefly put, the generative model accounts for vastly more of the results obtained testing experimental categories than the alternative dependency model. As shown, the generative model correctly predicts how the magnitude of the causal status effect varies with (a) causal strength, (b) the strengths of background causes, and (c) the presence of unobserved "essential" features. It also accounts for the multiple cause effect. It accounts for the coherence effect, including the observed higher-order interactions, and how the magnitudes of the two-way interactions vary with experimental conditions (e.g., causal strength) and type (directly related feature pairs versus indirectly related ones). Finally, by assuming that causal category knowledge is represented as a graphical model, it supports the diagnostic and prospective causal reasoning from observed features to unobserved ones. The dependency model, in contrast, is unable to account for any of these phenomena. Nevertheless, note that there was one failed prediction of the generative model, namely, the presence of the isolated feature effect. Of course, the dependency model is also unable to account for this effect. In the final subsections below, I briefly present other issues and directions for future research.

9.1. Alternative Causal Structures and Uncertain Causal Models

The experimental studies reviewed here have involved only one particular sort of causal link, namely, a generative cause between two binary features. However, one's database of causal knowledge includes many other sorts of relations. Some causal relations are inhibitory in that the presence of one variable decreases the probability of another. Conjunctive causes obtain when multiple variables are each required to produce an effect (as fuel, oxygen, and spark are all needed for fire). Binary features involved in causal relations can be additive (present or absent) or substitutive (e.g., male vs. female) (Tversky & Gati, 1982); in addition, variables can be ordinal or continuous (Waldmann et al., 1995). There are straightforward extensions to the generative model to address these possibilities, but as yet few empirical studies have assessed their effect on classification. Another important possibility is cases in which variables are related in causal cycles; indeed, many cycles were revealed by the theory-drawing tasks used in Sloman et al.'s (1998) and Kim and Ahn's (2002a; b) studies of natural categories. Kim et al. (2009) have tested the effect of causal cycles in novel categories and proposed a variant of the dependency model that accounts for their results by assuming that causal cycles are unraveled over time (e.g., the causal model {X⇄Y} is replaced by {Xt→Yt+1, Yt→Xt+1}, where t represents time). They also note how a similar technique can allow the generative model to be applied to causal cycles.
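A minimal sketch of this unraveling idea follows; the representation and names are my own, not Kim et al.'s implementation. Each within-cycle link is redirected from features at time slice t to features at slice t+1, yielding an acyclic graph to which either the dependency or the generative model can then be applied.

```python
# Sketch (hypothetical representation): unrolling a causal cycle X <-> Y into a
# feed-forward graph over time slices, as described above.
def unroll(cycle_edges, n_slices=2):
    # cycle_edges: directed links among features, e.g., [("X", "Y"), ("Y", "X")]
    edges = []
    for t in range(n_slices - 1):
        for cause, effect in cycle_edges:
            edges.append((f"{cause}_t{t}", f"{effect}_t{t + 1}"))
    return edges

print(unroll([("X", "Y"), ("Y", "X")]))
# [('X_t0', 'Y_t1'), ('Y_t0', 'X_t1')]
```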
Still, Kim et al.'s use of the missing feature method prohibited an assessment of coherence effects or the weight of individual features in a manner that is independent of feature interactions. Thus, there is still much to learn about how causal cycles affect classification.

Finally, an important property of causal models is the representation of uncertainty. You may believe that 100% of roobans have sticky feet and that 75% of myastars have high density, but your confidence in these beliefs may be either low or high depending on their source (e.g., they may be based on a small and biased sample or a large number of observations; they may come from a reliable or unreliable individual, etc.). Your confidence in interfeature causal relations (e.g., the belief that sticky feet are caused by roobans eating fruit) can similarly vary on a continuum. There are known ways to represent causal model parameters as subjective probability density functions and to learn and reason with those models (Griffiths & Tenenbaum, 2005; Lu et al., 2008) that have obvious extensions to classification, but again there have been few empirical studies that have examined these issues. One exception is Ahn et al. (2000) who taught subjects causal relations that were implausible (because they contrasted with prior knowledge) and found, not surprisingly, no causal status effect.

9.2. Categories' Hidden Causal Structure

As mentioned, the purpose of testing novel rather than real-world categories is that it affords greater experimental control over the theoretical/causal knowledge that people bring to the classification task. Nevertheless, even for a novel category people may assume additional causal structure on the basis of its ontological kind (e.g., whether it is an artifact or biological kind). Research reviewed above has shown how underlying "essential" features can affect classification even when they're not observed in test items (Rehder & Kim, 2009a; Hampton et al., 2007). And the results of Chaigneau et al. (2004) suggest that classifiers can engage in prospective causal reasoning to infer an unobserved feature (an artifact's potential function) to decide category membership. Ahn and Kim (2001) have referred to systematic differences between domains as "content effects," and I would expand this notion to include the sort of default causal models that people assume in each domain. Thus, continuing to elucidate the sorts of default hidden causal structures that are associated with the various ontological kinds (and which of those causal structures are treated as decisive for category membership) and investigating how those structures influence real-world categorization remains a key aim of future research.

9.3. Causal Reasoning and the Boundaries of Causal Models

That classifiers can infer hidden causal structure raises questions about the sorts of variables that can contribute to those inferences. I have reviewed studies showing that people can causally infer unobserved features from observed ones, but nothing prevents inferences involving variables that would not normally be considered "features" of the category (Oppenheimer et al., 2009). For example, you may be unable to identify the insects in your basement until you see your house's damaged wooden beams and realize you have termites. But is the chewed wood a "feature" of termites?
Or, to take a more fanciful example, we all know that bird wings cause flying, which causes birds to build nests in trees, but perhaps the nests also cause broken tree branches, which cause broken windshields, which cause higher car insurance rates, and so on. But although a neighborhood's high insurance rates might imply a large population of large birds, they are not a feature of birds. Causal relations involving bird features also go backwards indefinitely (birds have wings because they have bird DNA, bird DNA produces wings because of a complicated evolutionary past, etc.). These examples raise two questions. The first concerns the boundaries of categories' causal models. If a causal model includes variables directly or indirectly related to a category's features, then (because everything is indirectly connected to everything else) all variables are included, which means that all categories have the same causal model (the model of everything). Clearly, the causal model approach needs to specify the principles that determine which variables are part of a category's causal model. The second question concerns how the evidence that a variable provides for category membership differs depending on whether it is part of the causal model or not. My own view is that a full model of causal-based classification will involve two steps. In the first step, classifiers use all relevant variables (those both inside and outside of the causal model) to infer the presence of unobserved features via the causal relations that link them (Rehder, 2007; Rehder & Kim, 2009a). In the second step, the classifier evaluates the likelihood that the (observed and inferred) features that are part of the category's causal model were generated by that model.

9.4. Additional Tests with Natural Categories

As mentioned, although a large number of studies have tested novel categories, others have assessed causal-based effects in real-world categories. Using the missing feature method, these studies have generally reported a causal status effect (Ahn, 1998; Kim & Ahn, 2002a; b; Sloman et al., 1998). Moreover, in contrast to the studies of novel categories reviewed above, which favored the generative model, research with natural categories has provided evidence supporting the dependency model, as classification judgments were shown to exhibit substantial correlations with the dependency model's predictions derived from subjects' category theories (measured, e.g., via the theory drawing task). There are a number of uncertainties regarding the interpretation of these studies, some of which I have already mentioned. The numerous confounds associated with natural categories mean that the apparently greater importance of more causal features may be due to other factors (e.g., they may also have higher category validity, i.e., they may have been observed more often in category members). In addition, because these studies used the missing feature method to assess feature weights, the causal status effect reported by these studies could have reflected coherence effects instead if the more causal features were also those involved in a greater number of causal relations. Finally, only the dependency model was fit to the data, and so it is unknown whether the generative model would have achieved a better fit. One notable property of the causal relations measured in these studies is that they correspond to those for which the generative model also predicts a strong causal status effect.
For example, Kim and Ahn (2002a, Experiment 2) had subjects rate the strength of the causal links on a three-point scale (1 = weak, 2 = moderate, 3 = strong) and found that the vast majority of links had an average strength of between 1 and 1.5. As I have shown, the generative model predicts a strong causal status effect when weak causal links produce each feature in a causal chain with decreasing probability. Sorely needed therefore are new studies of real-world categories that take into account what has been learned from studies testing novel materials. First, subjects must be presented with additional test items, namely, those that are missing more than just one feature. This technique will allow coherence effects to be assessed and will provide a more accurate measure of feature weights (see Section 2.2).19 Second, both the dependency and generative models must be fit to the resulting data. Results showing that the causal status effect increases with causal strength will favor the dependency model; results showing that it decreases with causal strength, and the presence of multiple cause and coherence effects, will favor the generative model. Finally, the confound between a category's causal and empirical/statistical information can be addressed by seeking objective measures of the latter, perhaps by using corpus-based techniques (e.g., Landauer & Dumais, 1997). Statistical methods like multiple regression can then be used to determine whether causal theories provide any additional predictive power above and beyond features' objective category validity.

9.5. Processing Issues

The studies reviewed above all used unspeeded judgments in which subjects were given as long as they wanted to make a category membership decision. It is reasonable to ask whether these judgments would be sensitive to causal knowledge if they were made the same way that categorization decisions are made hundreds of times each day, namely, in a few seconds or less. One might speculate that use of causal knowledge is an example of slower, deliberate, "analytical" reasoning that is unlikely to appear under speeded conditions (Sloman, 1996; Smith & Sloman, 1994). But studies have generally found instead that effects of theoretical knowledge on classification obtain even under speeded conditions (Lin & Murphy, 1997; Palmeri & Blalock, 2000). For example, Luhmann et al. (2006) found that classification judgments exhibited a causal status effect even when they were made in 500 ms. Besides the causal status effect, no studies have examined how the other sorts of causal-based effects respond to manipulations of response deadline. Luhmann et al. proposed that in their study subjects "prestored" feature weights during the initial learning of the category's features and causal relations and then were able to quickly access those weights during the classification test. If this is correct, then one might expect to also see a multiple-cause effect and an isolated feature effect at short deadlines. In contrast, because the coherence effect and explicit causal reasoning involve processing the values on multiple stimulus dimensions, these effects may only emerge at longer response deadlines.

9.6. Integrating Causal and Empirical/Statistical Information

Another outstanding question concerns how people's beliefs about a category's causal structure are integrated with the information gathered through firsthand observation of category members.
On one hand, numerous studies have demonstrated how general semantic knowledge that relates category features alters how categories are learned (e.g., Murphy & Allopenna, 1994; Rehder & Ross, 2001; Wattenmaker et al., 1986). However, there are relatively few studies examining how a category’s empirical/statistical information is integrated with specifically causal knowledge. One exception is Rehder and Hastie (2001) who presented subjects with both causal laws and examples of category members and by so doing orthogonally manipulated categories’ causal and empirical structure (providing either no data or data with or without correlations that were consistent with the causal links). Although subjects’ subsequent Causal-Based Classification: A Review 61 classification judgments reflected the features’ objective category validity, they were generally insensitive to the correlations that inhered in the observed data. On the other hand, Waldmann et al. (1995) found that the presence of interfeature correlations affected subjects’ interpretation of the causal links they were taught. More research is needed to determine how and to what extent the correlational structure of observed data is integrated into a category’s causal model. Note that because the representation of causal relations assumed by the generative model emerged out of the causal learning literature (Cheng, 1997), it generates clear hypotheses regarding how the strengths of causal links (the m and b parameters) should be updated in light of observed data. 9.7. Developmental Questions Given the small number of relevant studies (Ahn et al. 2000b; Meunier & Cordier, 2009; Hayes & Rehder, 2009), it is not surprising that there are many outstanding questions regarding the effect of causal knowledge on classification feature weights and interactions. Because I attributed the absence of a causal status effect in Hayes and Rehder (2009) to 5-year olds interpreting the causal link as deterministic, an obvious possibility to test is whether this effect would be observed in children when links are probabilistic instead. Another question is whether the size of the coherence effect in children varies with causal link strength in the way predicted by the generative model. Finally, it is currently unknown whether children exhibit a multiple cause effect. 9.8. Additional Dependent Variables Finally, whereas all studies reviewed here asked for some sort of category membership judgment, it is important to expand the types of dependent variables that are used to assess causal-based effects. For example, several studies have examined the effect of theoretical knowledge on category construction (the way in which people spontaneously sort items together; Ahn & Medin, 1992; Kaplan & Murphy, 1999; Medin, Wattenmaker, & Hampson, 1987) but only a few have examined the effect of causal knowledge. One exception is Ahn and Kim (2000, Experiment 4) who presented subjects with match-to-sample trials consisting of a target with one feature that caused another (X!Y) and two cases that shared with the target either the cause (X!Z) or the effect (W!Y). Subjects spontaneously sorted together items on the basis of shared causes rather than shared effects, that is, they exhibited a causal status effect. On the other Causal-Based Classification: A Review 62 hand, Ahn (1999) found that sorting failed to be dominated by any feature (including the cause) for items with four features arranged in a causal chain. 
But subjects did sort on the basis of the cause for common cause structures and on the basis of the effect for common effect structures (mirroring the results with explicit classification judgments found by Rehder & Hastie, 2001, Figure 9). Additional research is needed to determine whether the other effects documented here (e.g., sensitivity of the causal status effect to causal strength, coherence effects, etc.) also obtain in category construction tasks. 9.9. Closing Words Twenty-five years have passed since Murphy and Medin (1985) observed how concepts of categories are embedded in the rich knowledge structures that make up our conceptual systems. What has changed in the last 10-15 years is that insights regarding how such knowledge affects learning, induction, and classification have now been cashed out as explicit computational models. This chapter has presented models of how interfeature causal relations affect classification and reviewed the key empirical phenomena for and against those models. That so many important outstanding questions remain means that this field can be expected to progress as rapidly in the next decade as it has in the past one. Causal-Based Classification: A Review 63 References Ahn, W. (1998). Why are different features central for natural kinds and artifacts? The role of causal status in determining feature centrality. Cognition, 69, 135-178. Ahn, W. (1999). Effect of causal structure on category construction. Memory & Cognition, 27, 1008-1023. Ahn, W., & Medin, D. L. (1992). A two-stage model of category construction. Cognitive Science, 16, 81-121. Ahn, W., Kim, N. S., Lassaline, M. E., & Dennis, M. J. (2000a). Causal status as a determinant of feature centrality. Cognitive Psychology, 41, 361-416. Ahn, W., Gelman, S. A., Amsterdam, A., Hohenstein, J., & Kalish, C. W. (2000b). Causal status effect in children's categorization. Cognition, 76, B35-B43. Ahn, W., & Kim, N. S. (2001). The causal status effect in categorization: An overview. In D. L. Medin (Ed.), The psychology of learning and motivation (Vol. 40, pp. 23-65). San Diego, CA: Academic Press. Ahn, W., Marsh, J. K., Luhmann, C. C., & Lee, K. (2002). Effect of theory based correlations on typicality judgments. Memory & Cognition, 30, 107-118. Ahn, W., Levin, S., & Marsh, J. K. (2005). Determinants of feature centrality in clinicians' concepts of mental disorders, Proceedings of the 25th Annual Conference of the Cognitive Science Society. Mahwah, New Jersey: Lawrence Erlbaum Associates. Ahn, W., Flanagan, E., Marsh, J. K., & Sanislow, C. (2006). Beliefs about essences and the reality of mental disorders. Psychological Science, 17, 759-766. Ashby, F. G., & Maddox, W. T. (2005). Human category learning. Annual Review of Psychology, 56, 149-178. Bloom, P. (1998). Theories of artifact categorization. Cognition, 66, 87-93. Bonacich, P., & Lloyd, P. (2001). Eigenvector-like mevasures of centrality for asymmetric relations. Social Networks, 23, 191-201. Booth, A. (2007). The cause of infant categorization. Cognition, 106, 984-993. Causal-Based Classification: A Review 64 Braisby, N., Franks, B., & Hampton, J. (1996). Essentialism, word use, and concepts. Cognition, 59, 247-274. Buehner, M. J., Cheng, P. W., & Clifford, D. (2003). From covariation to causation: A test of the assumption of causal power. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1119-1140. Chaigneau, S. E., Barsalou, L. W., & Sloman, S. A. (2004). Assessing the causal structure of function. 
Journal of Experimental Psychology. General, 133, 601-625. Cheng, P. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367-405. Cheng, P. W., & Novick, L. R. (2005). Constraints and nonconstraints in causal learning: Reply to White (2005) and to Luhmann and Ahn (2005). Psychology Review, 112, 694- 707. Fischoff, B., Slovic, P., & Lichtenstein, S. (1978). Fault trees: Sensitivity of estimated failure probabilities to problem representation. Journal of Experimental Psychology: Human Perception and Performance, 4, 330-344. Gelman, S. A. (2003). The essential child: The origins of essentialism in everyday thought. New York: Oxford University Press. Gelman, S. A., & Wellman, H. M. (1991). Insides and essences: Early understandings of the nonobvious. Cognition, 38, 213-244. Gentner, D. (1989). The mechanisms of analogical learning. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 199-241). New York: Cambridge University Press. Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., & Kushnir, T. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111, 3-23. Gopnik, A., & Sobel, D. M. (2000). Detecting blickets: How young children use information about novel causal powers in categorization and induction. Child Development, 71, 1205-1222. Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51, 334-384. Hampton, J. A. (1995). Testing the prototype theory of concepts. Journal of Memory and Causal-Based Classification: A Review 65 Language, 34, 686-708. Hampton, J. A., Estes, Z., & Simmons, S. (2007). Metamorphosis: Essence, appearance, and behavior in the categorization of natural kinds. Memory & Cognition, 35, 1785-1800. Harris, H.D., & Rehder, B. (2006). Modeling category learning with exemplars and prior knowledge. In R. Sun & N. Miyake (Eds.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 1440-1445). Mahwah, NJ: Erlbaum. Hayes, B.K., & Rehder, B. (2009). Children’s causal categorization. In preparation. Johnson, S. C., & Solomon, G. E. A. (1997). Why dogs have puppied and cates have kittens: The role of birth in young children's understanding of biological origins. Child Development, 68, 404-419. Judd, C. M., McClelland, G. H., & Culhane, S. E. (1995). Data analysis: Continuing issues in the everyday analysis of psychological data. Annual Review of Psychology, 46, 433-465. Kalish, C. W. (1995). Essentialism and graded category membership in animal and artifact categories. Memory & Cognition, 23, 335-349. Kaplan, A. S., & Murphy, G. L. (1999). The acquisition of category structure in unsupervised learning. Memory & Cognition, 27, 699-712. Keil, F. C. (1989). Concepts, kinds, and cognitive development. Cambridge, MA: MIT Press. Keil, F. C. (1995). The growth of causal understandings of natural kinds. In D. Sperber, D. Premack & A. J. Premack (Eds.), Causal cognition: A multidisciplinary approach (pp. 234-262). Oxford: Clarendon Press. Kim, N. S., & Ahn, W. (2002a). Clinical psychologists' theory-based representation of mental disorders affect their diagnostic reasoning and memory. Journal of Experimental Psychology: General, 131, 451-476. Kim, N. S., & Ahn, W. (2002b). The influence of naive causal theories on lay concepts of mental illness. American Journal of Psychology, 115, 33-65. Kim, N. S., Luhmann, C. C., Pierce, M. L., & Ryan, M. M. (2008). Causal cycles in categorization. 
Memory & Cognition, 37, 744-758. Causal-Based Classification: A Review 66 Kushnir, T., & Gopnik, A. (2005). Young children infer causal strength from probabilities and interventions. Psychological Science, 16, 678-683. Lamberts, K. (1995). Categorization under time pressure. Journal of Experimental Psychology: General, 124, 161-180. Lamberts, K. (1998). The time course of categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 695-711. Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of knowledge acquisition, induction, and representation. Psychological Review, 104, 211240. Lin, E. L., & Murphy, G. L. (1997). The effects of background knowledge on object categorization and part detection. Journal of Experimental Psychology: Human Perception and Performance, 23, 1153-1163. Lober, K., & Shanks, D. R. (2000). Is causal induction based on causal power? Critique of Cheng (1997). Psychological Review, 107, 195-212. Lombrozo, T. (2007). Simplicity and probability in causal explanation. Cognition, 55, 232-257. Lombrozo, T. (2009). Explanation and categorization: How "why?" informs "what?". Cognition, 110, 248-253. Lu, H., Yuille, A. L., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2008). Bayesian generic priors for causal learning. Psychological Review, 115, 955-984. Luhmann, C. C., Ahn, W., & Palmeri, T. J. (2006). Theory-based categorization under speeded conditions. Memory & Cognition, 34(5), 1102-1111. Malt, B. C., & Johnson, E. C. (1992). Do artifacts have cores? Journal of Memory and Language, 31, 195-217. Malt, B. C. (1994). Water is not H2 O. Cognitive Psychology, 27, 41-70. Marsh, J., & Ahn, W. (2006). The role of causal status versus inter-feature links in feature weighting. In R. Sun & N. Miyake (Eds.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 561-566). Mahwah, NJ: Erlbaum. Causal-Based Classification: A Review 67 Matan, A., & Carey, S. (2001). Developmental changes within the core of artifact concepts. Cognition, 78, 1-26. Medin, D. L., Wattenmaker, W. D., & Hampson, S. E. (1987). Family resemblance, conceptual cohesiveness, and category construction. Cognitive Psychology, 19, 242-279. Medin, D. L., & Ortony, A. (1989). Psychological essentialism. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 179-196). Cambridge, MA: Cambridge University Press. Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238. Meunier, B., & Cordier, F. (2009). The biological categorizations made by 4 and 5-year olds: The role of feature type versus their causal status. Cognitive Development, 24, 34-48. Minda, J. P., & Smith, J. D. (2002). Comparing prototype-based and exemplar-based accounts of category learning and attetional allocation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 275-292. Morris, M. W., & Larrick, R. P. (1995). When one cause casts doubt on another: A normative analysis of discounting in causal attribution. Psychological Review, 102, 331-355. Murphy, G. L., & Allopenna, P. D. (1994). The locus of knowledge effects in concept learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 904-919. Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289-316. Murphy, G. L., & Wisniewski, E. J. (1989). Feature correlations in conceptual representations. In G. 
Tiberchien (Ed.), Advances in cognitive science: Vol. 2. Theory and applications (pp. 23-45). Chichester, England: Ellis Horwood. Murphy, G. L. (2002). The big book of concepts: MIT Press. Oppenheimer, D. M., Tenenbaum, J. B., & Krynski, T. (2009). Categorization as causal explanation: Discounting and augmenting of concept-irrelevant features in categorization. Submitted for publication. Causal-Based Classification: A Review 68 Palmeri, T. J., & Blalock, C. (2000). The role of background knowledge in speeded perceptual categorization. Cognition, 77, B45-B47. Patalano, A. L., & Ross, B. H. (2007). The role of category coherence in experience-based prediction. Psychonomic Bulletin & Review, 14, 629-634. Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufman. Poulton, E. C. (1989). Bias in quantifying judgments. Hillsdale, NJ: Erlbaum. Rehder, B. (2003a). Categorization as causal reasoning. Cognitive Science, 27, 709-748. Rehder, B. (2003b). A causal-model theory of conceptual representation and categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1141-1159. Rehder, B. (2007). Essentialism as a generative theory of classification. In A. Gopnik, & L. Schultz, (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 190-207). Oxford, UK: Oxford University Press. Rehder, B. (2009a). Causal-based property generalization. Cognitive Science, 33, 301-343. Rehder, B. (2009b). The when and why of the causal status effect. Submitted for publication. Rehder, B. & Hastie, R. (2001). Causal knowledge and categories: The effects of causal beliefs on categorization, induction, and similarity. Journal of Experimental Psychology: General, 130, 323-360. Rehder, B., & Hastie, R. (2004). Category coherence and category-based property induction. Cognition, 91, 113-153. Rehder, B., & Murphy, G. L. (2003). A Knowledge-Resonance (KRES) model of category learning. Psychonomic Bulletin & Review, 10, 759-784. Rehder, B., & Burnett, R. C. (2005). Feature inference and the causal structure of object categories. Cognitive Psychology, 50, 264-314. Rehder, B. & Kim, S. (2006). How causal knowledge affects classification: A generative theory of categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 659-683. Rehder, B. & Kim, S. (2009a). Classification as diagnostic reasoning. Memory & Cognition, 37, 715-729. Causal-Based Classification: A Review 69 Rehder, B. & Kim, S. (2009b). Causal status and coherence in causal-based categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition. Accepted pending minor revisions. Rehder, B. & Ross, B.H. (2001). Abstract coherent concepts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1261-1275. Rehder, B., & Milovanovic, G. (2007). Bias toward sufficiency and completeness in causal explanations. In D. MacNamara & G. Trafton (Eds.), Proceedings of the 29th Annual Conference of the Cognitive Science Society (p. 1843). Reichenbach, H. (1956). The direction of time. Berkeley: University of California Press. Rips, L. J. (1989). Similarity, typicality, and categorization. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 21-59). New York: Cambridge University Press. Rips, L. J. (2001). Necessity and natural categories. Psychological Bulletin, 127, 827-852. Rosch, E. H., & Mervis, C. B. (1975). Family resemblance: Studies in the internal structure of categories. 
Cognitive Psychology, 7, 573-605. Salmon, W. C. (1984). Scientific explanation and the causal structure of the world. Princeton, NJ: Princeton University Press. Sloman, S. A. (2005). Causal models: How people think about the world and its alternatives. Oxford, UK: Oxford University Press. Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119, 3-23. Sloman, S. A., & Lagnado, D. A. (2005). Do we "do"? Cognitive Science, 29, 5-39. Sloman, S. A., Love, B. C., & Ahn, W. (1998). Feature centrality and conceptual coherence. Cognitive Science, 22, 189-228. Smith, E. E., & Sloman, S. A. (1994). Similarity- versus rule-based categorization. Memory & Cognition, 22, 377-386. Sobel, D. M., & Kirkham, N. Z. (2006). Blickets and babies: The development of causal reasoning in toddlers and infants. Developmental Psychology, 42, 1103-1115. Causal-Based Classification: A Review 70 Sobel, D. M., Tenenbaum, J. B., & Gopnik, A. (2004). Children's causal inferences from indirect evidence: Backwards blocking and Bayesian reasoning in preschoolers. Cognitive science, 28, 303 -333. Sobel, D. M., Yoachim, C. M., Gopnik, A., Meltzoff, A. N., & Blumenthal, E. J. (2007). The blicket within Preschoolers' inferences about insides and causes. Journal of Cognition and Development, 8, 159-182. Tversky, A., & Gati, I. (1978). Studies in similarity. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 79-98). Hillsdale, NJ: Erlbaum. Tversky, A., & Kahneman, D. (1980). Causal schemas in judgments under uncertainty. In M. Fishbein (Ed.), Progress in social psychology. Hillsdale, NJ: Erlbaum. Tversky, A., & Gati, I. (1982). Similarity, separability, and the triangular inequality. Psychological Review, 89, 123-154. Tversky, A., & Koehler, D. J. (1994). Support theory: A nonextensional representation of subjective probability. Psychological Review, 101, 547-567. Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222-236. Waldmann, M. R., Holyoak, K. J., & Fratianne, A. (1995). Causal models and the acquisition of category structure. Journal of Experimental Psychology: General, 124, 181-206. Waldmann, M. R., & Hagmayer, Y. (2005). Seeing versus doing: Two modes of accessing causal knowledge. Journal of Experimental Psychology: Learning, Memory, & Cognition, 31, 216-227. Wattenmaker, W. D., Dewey, G. I., Murphy, T. D., & Medin, D. L. (1986). Linear separability and concept learning: Context, relational properties, and concept naturalness. Cognitive Psychology, 18, 158-194. Wisniewski, E. J. (1995). Prior knowledge and functionally relevant features in concept learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 449-468. Causal-Based Classification: A Review 71 Author Note Bob Rehder, Department of Psychology, New York University. Correspondence concerning this chapter should be addressed to Bob Rehder, Department of Psychology, 6 Washington Place, New York, NY 10003 (email: bob.rehder@nyu.edu). Causal-Based Classification: A Review 72 Footnotes 1 Moreover, there is evidence suggesting that causal knowledge and within-category empirical- statistical information are conflated in people’s mental representation of natural categories. For example, Sloman et al. (1998) conducted a factor analysis showing that category features vary along three dimensions. 
1 Moreover, there is evidence suggesting that causal knowledge and within-category empirical-statistical information are conflated in people's mental representation of natural categories. For example, Sloman et al. (1998) conducted a factor analysis showing that category features vary along three dimensions. The first two were identified as perceptual salience (assessed with questions like "How prominent in your conception of apples is that it grows on trees?") and diagnosticity (or cue validity, assessed with questions like "Of all things that grow on trees, what percentage are apples?"). Measures loading on a third factor included both category validity (i.e., "What percentage of apples grow on trees?") and those related to a construct they labeled conceptual centrality or mutability (assessed with questions like "How good an example of an apple would you consider an apple that does not ever grow on trees?"). Category validity and centrality were also highly correlated in a study testing a novel category that was designed to dissociate the two measures (Sloman et al., Study 5). Conceptual centrality corresponds to one of the questions addressed in this chapter, namely, the evidence that an individual feature provides for a particular category. Thus, it may be difficult to separate the effects of causal knowledge and observed category members on classification into natural categories.

2 Indeed, when the regression predictor for a feature is coded as +1 when the feature is present and –1 when it is absent and all other predictors are orthogonal (which occurs when all possible test items are presented), the resulting regression weight is exactly half this difference. A concrete example of such a regression is sketched below.
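The following is a minimal illustrative sketch (in Python; it is not part of the original chapter, and the ratings are invented) of the coding scheme described in footnote 2. With +1/–1 predictors and all eight test items formed from three binary features, each feature's regression weight works out to exactly half the difference between the mean rating of items containing the feature and the mean rating of items lacking it.

```python
import itertools
import numpy as np

# All 8 test items on three binary features (1 = present, 0 = absent).
items = np.array(list(itertools.product([0, 1], repeat=3)))

# Hypothetical classification ratings, for illustration only.
ratings = np.array([1.0, 3.0, 3.5, 5.5, 4.0, 6.5, 7.0, 9.0])

# Code predictors as +1 (present) / -1 (absent) and add an intercept column.
X = np.column_stack([np.ones(len(items)), 2 * items - 1])
weights, *_ = np.linalg.lstsq(X, ratings, rcond=None)

for f in range(3):
    present = ratings[items[:, f] == 1].mean()
    absent = ratings[items[:, f] == 0].mean()
    # With this coding, the regression weight equals half the
    # present-minus-absent difference in mean ratings.
    print(f"feature {f}: weight = {weights[f + 1]:.3f}, "
          f"half difference = {(present - absent) / 2:.3f}")
```

Because the full factorial design makes the coded predictors orthogonal, the equality holds regardless of the particular ratings entered.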
3 For example, it is reasonable to ask whether the assumption of linearity that is part of linear regression is appropriate for a classification rating task given research suggesting that the evidence that features provide for category membership combines multiplicatively rather than additively (Minda & Smith, 2002). Thus, either a logarithmic transformation of the classification ratings or an alternative method of analysis that assumes a multiplicative rule might be appropriate. For example, Rehder (2003b) analyzed classification rating data (of all possible test items that could be formed on four binary dimensions) by normalizing the ratings so they summed to 1 and then treating the results as representing a probability distribution. From this distribution, I derived the "probability" of each feature and the "probabilistic contrast" for each pair of features, measures that are analogous to the feature weights and two-way interactions derived from linear regression. Because it is based on probabilities, this method implicitly incorporates the assumption that evidence combines multiplicatively and thus may yield a more accurate measure of "feature weights." On the other hand, it requires the strong assumption that ratings map one-to-one onto probability estimates, an assumption that may have its own problems. In practice, this probabilistic method of analysis and linear regression have always yielded the same qualitative conclusions.

4 For example, one of the variants computes what is known as alpha centralities (Bonacich & Lloyd, 2001). When the ds = 2, alpha centralities for the chain network are 3, 2, and 1 for features X, Y, and Z, respectively, whereas they are 4.75, 2.50, and 1 when the ds = 3.

5 Because this study used the missing feature method, there is uncertainty regarding whether feature Y was more important than Z, because the lower rating of the missing-Y item could reflect an effect of coherence instead (i.e., it violates two causal relations whereas the missing-Z item violates one; see Table 1's Example 3, discussed in Section 2.2). Nevertheless, concluding that feature X is more important than Z on the basis of the difference in ratings between the missing-X and missing-Z items is sound, because those items are equated on the number of violated correlations (one each).

6 Indeed, as part of another experiment testing Rehder and Kim's materials, we asked participants to judge how often a cause produced its effect. The average response was 91% (the modal response was 100%), supporting the conjecture that many subjects interpreted the causal links as nearly deterministic.

7 There may be some uncertainty regarding the dependency model's predictions for this experiment that stems from ambiguities regarding how its construct of "causal strength" should be interpreted. We interpret it to be a measure of the propensity of the cause to produce the effect, that is, as a causal power (corresponding to the generative model's m parameter). An alternative interpretation is that it corresponds to a measure of covariation between the cause and effect (e.g., the familiar ΔP rule of causal induction). Under this alternative interpretation, the dependency model would also predict a weaker causal status effect in the Background-50 condition (because the causal links themselves are weaker). Although Sloman et al. did not specify which interpretation was intended, we take the work of Cheng and colleagues as showing that when you ask people to judge "causal strength" they generally respond with an estimate of causal power rather than ΔP (Cheng, 1997; Buehner, Cheng, & Clifford, 2003), and so that is the assumption we make here. Of course, exactly what measure people induce in causal learning tasks is itself controversial (e.g., Lober & Shanks, 2000), and even Buehner et al. found that a substantial minority of subjects responded with causal strength estimates that mirrored ΔP. But even if one grants this alternative interpretation, it means that the dependency model predicts a weaker causal status effect in the Background-50 condition whereas the generative model predicts it should be absent entirely. In addition, of course, the generative model but not the dependency model predicts effects of this experiment's manipulation on feature frequency ratings and coherence effects (as described below).
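To make footnote 7's distinction between covariation and causal power concrete, the short sketch below (illustrative only; the two conditional probabilities are invented) computes both quantities from the same observations, using Cheng's (1997) power PC formula for generative causes.

```python
# Hypothetical contingency data: how often the effect occurs with and
# without the candidate cause (values chosen only for illustration).
p_e_given_c = 0.90      # P(effect | cause present)
p_e_given_not_c = 0.50  # P(effect | cause absent)

# Delta-P: the raw difference in the effect's probability.
delta_p = p_e_given_c - p_e_given_not_c

# Causal power (Cheng, 1997): Delta-P rescaled by the room left for the
# cause to act, i.e., the cases not already produced by background causes.
causal_power = delta_p / (1 - p_e_given_not_c)

print(f"delta-P = {delta_p:.2f}")            # 0.40
print(f"causal power = {causal_power:.2f}")  # 0.80
```

The two measures diverge whenever the effect also occurs in the absence of the cause, which is why the interpretation of "causal strength" matters for the dependency model's predictions.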
8 Although explicitly defining essential features in this manner controls the knowledge brought to bear during classification, note that these experimentally-defined "essences" may differ in various ways from (people's beliefs about) some real category essences. Although adults' beliefs about essences are sometimes concrete (e.g., DNA in the case of biological kinds), preschool children's knowledge about animals' essential properties is less specific, involving only a commitment to biological mechanisms that operate on their "insides" (Gelman & Wellman, 1991; Gelman, 2003; Johnson & Solomon, 1997). And an essential property is not just one that happens to be present in all category members (and absent in all nonmembers); it is one that is present in all category members that could exist. But while the concreteness and noncontingency of people's essentialist beliefs are undoubtedly important under some circumstances, we suggest that a feature that is present in all category members is sufficient to induce a causal status effect under the conditions tested in this experiment.

9 The absence of a significant causal status effect in the Unconnected-Chain-80 condition was somewhat of a surprise given the results from Rehder and Kim's (2009b) Experiment 1 reviewed in Section 4.1. The Unconnected-Chain-80 condition was identical to that experiment's Chain-75 condition except that (a) causal strengths were 80% rather than 75% and (b) an explicit essence was present, albeit one that is not causally related to the other features. It is conceivable that the 5% increase in causal strengths may be responsible for reducing the causal status effect; indeed, the generative model predicts a slightly smaller causal status effect for m = .80 versus .75. In addition, the presence of an explicit essential feature to which the causal chain was not connected may have led participants to assume that the chain was unlikely to be related to any other essential property of the category (and of course the generative model claims that essential properties to which the causal chain is causally connected promote a causal status effect).

10 Some studies have claimed to show just such a dissociation between feature importance and category validity, however. For example, in Ahn et al. (2000, Experiment 2) participants first observed exemplars with three features that appeared with equal frequency and then rated the frequency of each feature. They then learned causal relations forming a causal chain and rated the goodness of missing-X, missing-Y, and missing-Z test items. Whereas features' likelihood ratings did not differ, the missing-X item was rated lower than the missing-Y item, which was rated lower than the missing-Z item, a result the authors interpreted as demonstrating a dissociation between category validity and categorization importance. This conclusion is unwarranted, however, because the frequency ratings were gathered before the presentation of the causal relations. Clearly, one can only assess whether perceived category validity mediates the relationship between causal knowledge and features' categorization importance by assessing category validity after the causal knowledge has been taught. Sloman et al. (1998, Study 5) and Rehder and Kim (2009b) gathered likelihood ratings after the causal relationships were learned and found no dissociation with feature weights.

11 For the following 15 studies, "[a/b/c]" represents the number of conditions in which a full (a), partial (b), or zero or negative (c) causal status effect obtained: Sloman et al. (1998) [2/0/0], Ahn (1998) [2/0/0], Ahn et al. (2000) [2/0/0], Rehder and Hastie (2001) [3/0/3], Kim and Ahn (2002b) [1/0/0], Rehder (2003a) [1/0/1], Rehder (2003b) [0/2/1], Rehder and Kim (2006) [0/2/4], Luhmann et al. (2006) [7/2/0], Marsh and Ahn (2006) [2/0/4], Rehder and Kim (2008) [0/1/1], Rehder and Kim (2009b) [5/0/3], Lombrozo (2009) [1/0/1], Rehder (2009) [1/0/3], and Hayes and Rehder (2009) [0/0/2]. Note that because Sloman et al. and Luhmann et al. either did not gather or report ratings for missing-Y test items in some conditions, the causal status effect is counted as "full" in those cases.
Also note that the results from Luhmann et al.'s 300-ms deadline condition of their Experiment 2B are excluded.

12 Although Ahn and Kim (2001) did not themselves report the results of regression analyses, a regression analysis of the average classification results in their Table I yields weights of 0.47 on the causes and 1.12 on the common effect, confirming the greater importance of the common effect in their study.

13 Note that some studies have failed to find a multiple cause effect with a common effect network. For example, using virtually the same materials as Rehder and Hastie (2001) and Rehder and Kim (2006) except for the use of the "normal" wording for atypical feature values (see Section 6.4), Marsh and Ahn (2006) failed to find an elevated weight on the common effect. Additional research will be required to determine whether this is a robust finding or one that depends on idiosyncratic details of Marsh and Ahn's procedure.

14 It is important to note that these predictions depend on the particular model parameters chosen. First, just as for a chain network, the generative model only predicts a causal status effect for a common cause network when causal links are probabilistic. Second, regarding the common effect network, the claim is that the probability of an effect will increase with its number of causes. Whether it will be more probable than the causes themselves (as it is in the example in Table 5) also depends on the strength of the causal links. Third, whether the probability of a common effect in fact increases will interact with the subject's existing statistical beliefs about the category. For example, if one is quite certain about the category validity of the effect (e.g., because it is based on many observations), then the introduction of additional causes might be accommodated by lowering one's estimates of the strengths of the causal links (the m and b parameters) instead. See Rehder and Milovanovic (2007) for evidence of this kind of mutual influence between data and theory. Studies that systematically manipulate the strength of causal links in common cause and common effect networks (as Rehder & Kim, 2009b, did with a chain network) have yet to be conducted.

15 They also provided indirect evidence for this claim. They presented subjects with items in which the presence of the common effect feature was unknown and asked them to rate the likelihood that it was present. They found that inference ratings increased as a function of the number of causes present in the item. This result is consistent with the view that people can use causal knowledge to infer category features (Rehder & Burnett, 2005). It also implies that a feature will have greater category validity when it has multiple causes. Note that this result is analogous to the findings in Section III in which a change in feature weights (a causal status effect) was always accompanied by a change in features' category validity (likelihood of appearing in category members).

16 Although the interaction between condition and interaction term did not reach significance (p > .15), separate analyses revealed a significant effect of direct versus indirect interaction terms in the Chain-75 condition, p < .01, but not in the Chain-100 condition, p = .17.
17 Nevertheless, Chaigneau et al. found that classification ratings for vignettes with inappropriate historical intentions were lower relative to a baseline condition in which all four causes were present. The authors argue that this is a case of causal updating in which information about intentions influenced how subjects represented the artifact's physical structure (even when information about physical structure was provided as part of the vignette). For example, if the designer intended to create a mop, the subject might be more sure that the object had a physical structure appropriate to mopping.

18 It should be acknowledged that whereas Hayes and Rehder taught their subjects a single link between two features, Ahn et al. and Meunier and Cordier taught theirs a common cause structure in which one feature caused two others, and perhaps a cause with two effects is sufficient to induce a causal status effect in children. Of course, arguing against this possibility are findings above indicating that, at least for adults, a feature's importance does not generally increase with its number of dependents (see Section 5.1).

19 One challenge to applying the regression method to natural categories concerns the number of features involved. Subjects usually know dozens of features of natural categories, implying the need for 2^n test items (assuming n binary features) to run a complete regression that assesses main effects (i.e., feature weights), two-way interactions, and all higher-order interactions. One compromise is to present only those test items missing either one or two features, allowing an assessment of feature weights and the two-way interactions. Furthermore, the missing-two-feature items could be restricted to those missing two features that are identified (e.g., on a theory drawing task) to be causally related.

Table 1
Hypothetical classification ratings for four example categories with features X, Y, and Z. Examples 1 and 2 assume features are related in a common cause structure; Examples 3 and 4 assume they form a causal chain. Examples 1 and 3 assume all features have equal classification weights; Examples 2 and 4 assume X > Y = Z. 1 = feature present; 0 = feature absent; x = feature state unknown.

                                   Hypothetical Classification Ratings
                              Common Cause (Y←X→Z)        Chain (X→Y→Z)
                              Example 1    Example 2      Example 3    Example 4
Parameters
  Weight (X)                      1            1              1            1
  Weight (Y)                      1           .5              1           .5
  Weight (Z)                      1           .5              1           .5
  Weight on interactions          1            1              1            1
Test items
  111                            10            9             10            9
  011 (missing only X)            4            3              6            5
  101 (missing only Y)            6            6              4            4
  110 (missing only Z)            6            6              6            6
  100 (missing all but X)         2            3              4            5
  010 (missing all but Y)         4            4              2            2
  001 (missing all but Z)         4            4              4            4
  000                             4            5              4            5
  1xx                             6            6              6            6
  x1x                             6          5.5              6          5.5
  xx1                             6          5.5              6          5.5

Table 2
Feature "centralities" predicted by the dependency model after two iterations for features X, Y, and Z in the chain network of Figure 1 for different values of the causal strength parameters dXY and dYZ.

Table 3
Predictions for the generative model for the chain network in Figure 1 for different parameter values. cX = the probability that feature X appears in category members; mij = strength of the causal relation between i and j; bj = strength of j's background causes. Direct = contrasts between features that are directly causally related (X and Y, and Y and Z); indirect = contrasts between features that are indirectly related (X and Z).
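As a rough illustration of the kind of predictions summarized in Table 3, the following Python sketch (not from the chapter; parameter values are arbitrary) computes feature probabilities for the three-element chain X→Y→Z under noisy-OR style likelihood equations of the sort associated with the generative model. With probabilistic links the probabilities decrease along the chain (a causal status effect); with deterministic links and no background causes they do not.

```python
def chain_feature_probs(c_x, m, b):
    """Probability that X, Y, and Z appear in a category member for the
    chain X -> Y -> Z, assuming an effect is present if its cause operates
    (probability m when the cause is present) or its background causes
    operate (probability b)."""
    p_x = c_x
    p_y = p_x * (1 - (1 - m) * (1 - b)) + (1 - p_x) * b
    p_z = p_y * (1 - (1 - m) * (1 - b)) + (1 - p_y) * b
    return p_x, p_y, p_z

# Probabilistic links: feature probabilities decrease along the chain.
print(chain_feature_probs(c_x=0.75, m=0.75, b=0.20))  # approx. (0.75, 0.65, 0.59)

# Deterministic links, no background causes: all three features equally probable.
print(chain_feature_probs(c_x=0.75, m=1.00, b=0.00))  # (0.75, 0.75, 0.75)
```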
Table 4
Predictions for the generative model for the causal networks in Figure 4 for different parameter values. The likelihood equations for the essentialized model (Figure 4A) assume cE = 1; equations for the unconnected model (Figure 4B) were presented earlier in Table 3. ci = the probability that feature i appears in category members; mij = strength of the causal relation between i and j; bj = strength of j's background causes.

Table 5
Predictions for the generative model for three-feature common cause and common effect networks. ci = the probability that root feature i appears in category members; mij = strength of the causal relation between i and j; bj = strength of j's background causes. Direct = contrasts between features that are directly causally related; indirect = contrasts between features that are indirectly related.

Table 6
Test pairs presented by Hayes and Rehder (2009). For each test pair Ti, subjects chose whether item X or Y is the better category member. Choice probabilities (preference for X) presented in the final two columns are tested against .50 († p < .10. * p < .05. ** p < .01.). 1 = feature present; 0 = feature absent; x = feature state unknown. Dimension 1 = cause feature; dimension 2 = effect feature; dimensions 3 and 4 = neutral features.

  Test pair   Choice X   Choice Y   Adults    5-6 Year Olds
  TA          11xx       00xx        .99**     .79**
  TB          xx11       xx00       1.0**      .73**
  TC          10xx       01xx        .52       .51
  TD          10xx       xx10        .30**     .48
  TE          01xx       xx01        .32**     .43
  TF          11xx       xx11        .70**     .67**
  TG          00xx       xx00        .62*      .55

Table 7
Average parameter estimates for adults and children in Hayes and Rehder (2009). wc = weight given to cause feature; we = weight given to effect feature; wn = weight given to neutral features; wh = weight given to agreement between cause and effect feature (coherence). Standard errors are presented in parentheses. Causal status, isolated feature, and coherence effects are tested against 0. († p < .10. * p < .05. ** p < .01.)

  Parameter / Effect                      Adults        5-6 Year Olds
  wc                                      .54 (.07)     .32 (.05)
  we                                      .58 (.06)     .28 (.06)
  wn                                      .47 (.04)     .19 (.04)
  wh                                      .65 (.09)     .22 (.08)
  Causal status [wc – we]                 –.03 (.08)    .04 (.07)
  Isolated [average(wc, we) – wn]         .09** (.07)   .11* (.05)
  Coherence [wh]                          .65** (.09)   .22** (.08)

Figure Captions

Figure 1. A three-element causal chain.

Figure 2. Classification test results of three experiments from Rehder and Kim (2009b). (A) Experiment 1. (B) Experiment 2. (C) Experiment 3. p_lin is the significance of the linear trend in each condition.

Figure 3. Subjects' feature likelihood estimates (i.e., out of 100, how many category members have that feature) from Rehder and Kim (2009b). (A) Experiment 1. (B) Experiment 2. (C) Experiment 3. p_lin is the significance of the linear trend in each condition.

Figure 4. Causal structures tested in Rehder and Kim (2009b), Experiment 3. (A) Essentialized-Chain-80 condition. (B) Unconnected-Chain-80 condition.

Figure 5. Causal networks tested in Rehder and Kim (2006), Experiment 3. (A) 1-1-3 condition. (B) 1-1-1 condition.

Figure 6. Classification results from Rehder and Kim (2006), Experiment 3. Weight on the "effect" Zs is averaged over all three Z features in the 1-1-3 condition and is the weight on the single Z feature that plays the role of an effect in the 1-1-1 condition. Weight on the "isolated" Zs is the average of the two causally unrelated Z features in the 1-1-1 condition (see Figure 5).

Figure 7. The size of the causal status effect (measured as the ratings difference between the missing-Z and missing-X test items) in Rehder (2009b) as a function of experimental condition.
Figure 8. Causal networks tested in Rehder and Hastie (2001). (A) Common cause network. (B) Common effect network.

Figure 9. Classification results from Rehder and Hastie (2001). (A) Common cause condition. (B) Common effect condition.

Figure 10. Causal networks tested in Rehder and Kim (2006), Experiment 2. (A) 3-1-1 condition. (B) 1-1-1 condition.

Figure 11. Classification results from Rehder and Kim (2006), Experiment 2. Weight on the "cause" Xs is averaged over all three X features in the 3-1-1 condition and is the weight on the single X feature that plays the role of a cause in the 1-1-1 condition. Weight on the "isolated" Xs is the average of the two causally unrelated X features in the 1-1-1 condition (see Figure 10).

Figure 12. Classification ratings from Rehder and Kim (2009b) for test items collapsed according to their number of typical features. (A) Experiment 1. (B) Experiment 2.

Figure 13. Classification ratings from Rehder (2003a). (A) Feature weights. (B) Interaction weights. (C) Log classification ratings. Unlike previous figures depicting interaction weights, panel B presents the average regression weights on the three-way interactions involving (a) X and two of its effects in the common cause condition and (b) Y and two of its causes in the common effect condition.

Figure 14. Classification ratings from Rehder and Kim (2008).

Figure 15. Causal category structures from Rehder and Kim (2009a). (A) Experiment 1. (B) Experiment 3. (C) Experiment 4. (D) Experiment 5.

Figure 16. Causal category structures. (A) Follow-up to Rehder and Kim (2009a). (B) From Chaigneau, Barsalou, and Sloman (2004).