Social Cognition and Connectionism 1
Connectionist Exploration in Social Cognition
Frank Van Overwalle, Vrije Universiteit Brussel, Belgium
Christophe Labiouse, Belgian NFSR Research Fellow & University of Liège, Belgium
Robert French, Université de Liège, Belgium
This research was supported by Grant OZR423 of the Vrije Universiteit Brussel to Frank Van Overwalle and Grant HPRN-CT-2000-00065 of the European Commission to Robert French. Address for correspondence: Frank Van Overwalle, Department of Psychology, Vrije Universiteit Brussel, Pleinlaan 2, B - 1050 Brussel, Belgium; or by e-mail: Frank.VanOverwalle@vub.ac.be.
Running Head: Connectionism and Social Cognition [PUBSOCO]
5 February, 2016
Abstract
Major findings in social cognition are reviewed and modeled from a connectionist perspective. These findings are in the areas of categorization and base-rate neglect, impression formation, primacy and recency in impression formation, assimilation and contrast, increased recall for inconsistent information, discounting in causal attribution, attitude formation, central and heuristic processing, cognitive dissonance, and the use of reasoning heuristics. The majority of these phenomena are illustrated with well-known experiments and simulated with an autoassociative network architecture that uses a linear activation update and the delta learning algorithm for adjusting the connection weights. All of the phenomena considered were successfully reproduced in the simulations. Moreover, the proposed model is shown to be consistent with algebraic models of impression formation (Anderson, 1981), causal attribution (Cheng & Novick, 1992) and attitude formation (Ajzen, 1991).
The discussion centers on how the particular simulation specifications may be used to develop novel hypotheses for testing the connectionist modeling approach and, more generally, for improving and unifying theorizing in the field of social cognition.
Connectionist modeling of theoretical and empirical data in social cognition has emerged only during the last decade. This new approach arose from a certain dissatisfaction with mainstream models and a growing concern about their limitations. In particular, the field suffered from a lack of theoretical integration. Inspired by the ever-increasing success of connectionist models in cognitive psychology, a number of authors have turned to these models in an attempt to provide a unified framework for social psychology research. In 1993, Read and Marcus-Newhall wrote the first major article describing a connectionist model of causal reasoning. Later, Smith (1996) forcefully argued for the application and development of connectionist ideas in social psychology. Researchers have since made substantial progress in developing connectionist models of diverse social psychological phenomena, including person perception and group stereotyping (Kunda & Thagard, 1996; Smith & DeCoster, 1998; Labiouse & French, 2001), causal attribution (Van Overwalle, 1998; Read & Montoya, 1999), cognitive dissonance (Shultz & Lepper, 1996; Van Overwalle & Jordens, 2001), group impression formation and change (Kashima, Woolcock, & Kashima, 2000) and illusory correlation (Van Rooy & Van Overwalle, 2001c). However, large domains of social psychology remain untouched by this new approach. Further, because each of the above articles focuses largely on a single domain of social psychology, the field is still waiting for an overarching theoretical perspective.
In an attempt to provide an integrative account of the various fields within social psychology, we will examine a number of mainstream findings in social cognition and analyze them from a common connectionist perspective. In the past, many findings in social cognition have been explained by appeals to what often appear to be rather ad hoc hypotheses and theories. Moreover, various areas of the field, such as the “person perception”, the “impression formation”, and the “intergroup relations” traditions, have unfortunately developed largely independently of each other, despite the close conceptual connections between their topics. This has left the field with a fragmented theoretical basis. Our connectionist approach is an attempt to integrate some of these theoretical areas into a more comprehensive whole. While we are aware that this is a major undertaking, given the great power and flexibility of connectionist networks, as well as previous successful attempts to model social data within this framework, we believe it is possible to use these models to take a modest step towards unifying the field.
Many mainstream processes and findings in social cognition can be explained within a connectionist framework and, in many cases, better than by the statistical or associative memory models developed in the past. What are the main characteristics of connectionist models that make this possible? First, connectionist models exhibit emergent properties such as prototype extraction, pattern completion, generalization, constraint satisfaction, and graceful degradation (all of these are extensively reviewed in Smith, 1996, and Rumelhart & McClelland, 1986). These characteristics are potentially useful for any account of social cognitive phenomena.
In addition, connectionist models assume that the development of internal representations and the processing of these representations are carried out in parallel by simple and highly interconnected units, in contrast to traditional models in which processing is inherently sequential. As a result, these systems have no need for a central executive, which removes the need, assumed by earlier theories, for explicit (central) processing of relevant social information. Consequently, information can, in principle, be processed in an implicit and automatic manner without recourse to explicit conscious reasoning. This does not, of course, preclude people from being aware of the outcome of these preconscious processes. Second, neural networks are not fixed models but are able to learn over time, usually by means of a simple learning algorithm that progressively modifies the strength of the connections between the units making up the network. The fact that most traditional models in social psychology are incapable of learning is a significant restriction. Interestingly, the ability to learn incrementally puts connectionist models in broad agreement with developmental and evolutionary pressures. Third, connectionist networks have a degree of neurological plausibility that is generally absent in previous statistical approaches to information integration and storage (e.g., Anderson, 1981; Cheng & Novick, 1992; Fishbein & Ajzen, 1975). While connectionist models are highly simplified versions of real neurological circuitry and processing, it is commonly assumed that they reveal a number of emergent processing properties that real human brains also exhibit. One of these emergent properties is the integration of long-term memory (i.e., connection weights), short-term memory (i.e., internal activation) and outside information (i.e., external activation). There is no clear separation between memory and processing as there is in traditional models.
Even if biological constraints are not strictly adhered to in connectionist models of social cognition (e.g., persuasion, prejudice, …), concerns about the biological implementation of social cognitive mechanisms have indeed started to emerge (Adolphs & Damasio, 2001; Allison, Puce & McCarthy, 2000; Ito & Cacioppo, 2001; Cacioppo, Berntson, Sheridan & McClintock, 2000; Phelps, O’Connor, Cunningham, Funayama, Gatenby, Gore & Banaji, 2000), paralleling the increasing attention paid to neurophysiological determinants of social behavior. Other emergent properties of the connectionist approach will be explained in more depth in the next section.
This article is organized as follows. First, we will describe the proposed connectionist model in some detail, giving the precise architecture, the general learning algorithm and the specific details of how the model processes information. In addition, a number of other, less well-known emergent properties of this type of network will be discussed. We will then present a series of simulations that apply the same network architecture to a number of significantly different phenomena. These phenomena involve categorization (base-rate neglect), impression formation (primacy, recency and memory advantage for inconsistencies), assimilation and contrast (of traits and exemplars), causal attribution (covariation, discounting and augmentation), attitudes (formation and cognitive dissonance) and judgmental heuristics. We will also very briefly discuss related work on group judgments (illusory correlation, ingroup versus outgroup differences). Our review of empirical phenomena in the field is not meant to be exhaustive, but is rather designed to illustrate how connectionist principles can be used to shed light on the processes underlying social cognition.
While the emphasis of the present article is on the use of a particular connectionist model to explain a wide variety of phenomena in social cognition, previous applications of connectionist modeling to social psychology (Smith & DeCoster, 1998; Read & Montoya, 1999; Van Overwalle, 1998) are also mentioned. In addition, we will compare different models. Finally, we will discuss the limitations of the proposed connectionist approach and identify areas where further theoretical developments are under way or are needed. Ultimately, what we would like to accomplish in this paper is to create a greater awareness that connectionist principles could potentially underlie diverse social cognitive phenomena.
A Recurrent Model
Throughout this paper, we will use the same basic network model, namely the recurrent auto-associator developed by McClelland and Rumelhart (1985). This model has already gained some familiarity among social psychologists studying person and group impressions (Smith & DeCoster, 1998) and causal attribution (Read & Montoya, 1999). We decided to apply a single basic model to emphasize the theoretical similarities that underlie a great variety of processes in social cognition. In particular, we chose this model because it is capable of reproducing a wider range of phenomena than other connectionist models, like feedforward networks (see Read & Montoya, 1999) or constraint satisfaction models such as Thagard's ECHO (Van Overwalle, 1998). The auto-associative network can be distinguished from other connectionist models on the basis of its architecture (the elements of the model) and its learning algorithm (how information is processed in the model). We will discuss these points in turn.
Architecture
The generic architecture of an auto-associative network is illustrated in Figure 1A. Its most salient property is that every node is connected to every other node.
Thus, all nodes both send out and receive activation.
Information Processing
In a recurrent network, information processing takes place in two phases. During the first, activation phase, each node in the network receives activation from external sources. Because the nodes are interconnected, this activation is spread throughout the network in proportion to the weights of the connections to the other nodes. The activation coming from the other nodes is called the internal input (for each node, it is calculated by summing all activations arriving at that node). This activation is further updated during a number of cycles through the network. Together with the external input, this internal input determines the final pattern of activation of the nodes, which reflects the short-term memory of the network. Typically, activations and weights have lower and upper bounds of –1 and +1. In the linear version of activation spreading in the auto-associator that we use here, the final activation at each cycle is the linear sum of the external and internal input. In non-linear versions used by other social researchers (Smith & DeCoster, 1998; Read & Montoya, 1999), the final activation is determined by a non-linear combination of external and internal input. During our simulations, however, we found that the linear version with a single internal updating cycle often reproduced the observed data better. Therefore, we used the linear variant of the auto-associator for all the reported simulations. We will discuss later why the linear variant might have been more efficient.
After the first, activation phase, the recurrent model enters the second, learning phase, in which the short-term activations are consolidated in long-term weight changes of the connections. Basically, these weight changes are driven by the error between the internal input generated by the network and the external input received from outside sources.
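The activation phase can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the exact specification given in Appendix A: it assumes that external and internal input are simply added (Estr = Istr = 1), that there is a single internal cycle in which each node sums the weighted external activation of the other nodes, and that activations are clipped to the –1 to +1 range mentioned above. The function name `activate` is our own.

```python
import numpy as np

def activate(W, ext):
    """Activation phase of a linear auto-associator (sketch, one internal cycle).

    W[i, j] is the weight of the connection from node j to node i;
    ext holds the external input to each node.
    Assumptions: Estr = Istr = 1 and activations bounded to [-1, +1].
    """
    internal = W @ ext                           # internal input: weighted sum arriving at each node
    return np.clip(ext + internal, -1.0, 1.0)    # linear sum of external and internal input

# Two nodes: node 0 sends activation to node 1 through a weight of 0.5.
W = np.array([[0.0, 0.0],
              [0.5, 0.0]])
a = activate(W, np.array([1.0, 0.0]))
print(a)  # node 1 receives no external input, only 0.5 of internal input
```

Priming a single node and reading the activation of the others in this way mirrors the test trials used later to read out what the network has learned.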
The error between internal and external input is reduced in proportion to the learning rate, which determines how fast the network changes its weights (typically between .01 and .50). This error-reducing mechanism is known as the delta algorithm (McClelland & Rumelhart, 1988). Concretely, when the network overestimates the external input of a node, this node has received too much internal input from the other nodes through their connections, and the delta algorithm decreases the weights of these connections. Conversely, when the network underestimates the external input, the node has received too little internal input, and the weights are increased. These weight changes allow the network to better approximate the external input. In this way, the delta algorithm strives to match the internal predictions of the network as closely as possible to the actual state of the external environment, and stores this information in the connection weights.
Structural and Dynamic Connections
The social phenomena that we analyze can be subdivided into two main classes: structural processes that focus on stable attributes of social actors and objects (categorization, impression formation, generalization, assimilation and contrast) and dynamic processes that focus on causal consequences (attributions and attitudes). Although these two processes may differ somewhat at the surface level, we believe that they are very similar in their logical structure, in that both reflect predictions from features to categories (e.g., from behaviors to trait categories and from causes to events). For structural processes, the prediction involves a category or attribute, for instance, the trait of a person or the stereotype of a group. For dynamic processes, the prediction involves the outcome of a cause or the behavior toward an attitude-object.
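The delta rule of the learning phase can be sketched in the same style. This is again an illustrative assumption rather than the exact form in Appendix A: the internal input is computed from the external pattern in one cycle, the change of the weight from node j to node i is the learning rate times node i's error (external minus internal input) times node j's external activation, and self-connections are held at zero. The name `delta_step` is hypothetical.

```python
import numpy as np

def delta_step(W, ext, lr=0.2):
    """One learning trial of the delta algorithm for a linear auto-associator (sketch).

    Assumptions: internal input is computed from the external pattern in one
    cycle, and the sender term of the weight change is the external activation.
    """
    internal = W @ ext                    # the network's prediction for each node
    error = ext - internal                # external input minus internal input
    W = W + lr * np.outer(error, ext)     # dW[i, j] = lr * error[i] * ext[j]
    np.fill_diagonal(W, 0.0)              # no self-connections
    return W, error

# Nodes: 0 = feature, 1 = category, 2 = a node that is never active.
pattern = np.array([1.0, 1.0, 0.0])
W = np.zeros((3, 3))
for trial in range(10):
    W, error = delta_step(W, pattern)
print(W.round(3))
```

After ten presentations with a learning rate of .20, the feature-category weights stand at 1 − .8^10 ≈ .89 in both directions, and the connections of the never-active node remain at zero because they never contribute to any error.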
To illustrate structural and dynamic predictions, imagine that when meeting an unfamiliar individual, a perceiver may wonder to what type of group the individual belongs (stereotyping or structural prediction) and what he or she might do next (behavioral outcome or dynamic prediction). Typically, the prediction goes from low-level features, exemplars, causes or attitude-objects to higher-level abstractions such as categories and outcomes. These predictions are graphically illustrated in Figures 1B and 1C for structural and dynamic relations, respectively. To visualize the direction of prediction, we have drawn the (low-level) features that serve as input at the bottom layer of the architecture, and the (high-level) predicted categories that serve as output at the top layer. We will consistently use this direction of representation in the illustrations.
Basic Emergent Connectionist Principles
Before moving on to the social phenomena of interest, it is essential to briefly discuss the basic principles or mechanisms that drive many of our simulations. These principles are emergent properties of the delta learning algorithm and include acquisition, competition, and diffusion. Some of these principles have already been documented in prior social connectionist work (Van Overwalle, 1998; Van Overwalle & Van Rooy, 1998, 2001a, 2001b). However, because they are essential for understanding our examples, we will describe these principles first and discuss their application to social cognition in more detail later, in the simulations.
Acquisition and Sample Size Effect
The acquisition principle involves sample size effects that have been documented in many areas of social cognition.
For instance, when receiving more supportive information, people tend to hold more extreme impressions of other persons (Anderson, 1967, 1981), make more extreme causal judgments (Baker, Berbier & Vallée-Tourangeau, 1989; Försterling, 1992; Shanks, 1985, 1987, 1995; Shanks, Lopez, Darby & Dickinson, 1996), make more polarized group decisions (Fiedler, 1996; Ebbesen & Bowers, 1974), endorse a hypothesis more firmly (Fiedler, Walther & Nickel, 1999), make more extreme predictions (Manis, Dovalina, Avis & Cardoze, 1980) and agree more with persuasive messages (Eagly & Chaiken, 1993). One of the most striking characteristics of connectionist models using the delta algorithm is that learning is modeled as a gradual, on-line process of adjusting existing knowledge to novel information. This characteristic was already exploited in the associative learning models that preceded connectionism, such as the popular Rescorla-Wagner (1972) model of animal conditioning and human contingency judgments. This model predicts that when a cue (i.e., conditioned stimulus) is followed by an effect (i.e., unconditioned stimulus), the organism integrates this information, resulting in a stronger cue-effect association and more vigorous responding when the cue is present. In humans, this also results in stronger judgments of the causal influence of the cue (see Baker et al., 1989; Shanks, 1985, 1987, 1995; Shanks et al., 1996; Van Overwalle & Van Rooy, 2001a). Likewise, the delta algorithm predicts that the more information is received about the joint presence of a feature and a category, the stronger their connection weight will become. This results in a pattern of increasing weights as more pieces of information are processed, or a sample size effect (see illustration in Figure 2A).
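The sample size effect follows directly from the delta rule. The sketch below is a simplification: it assumes a single feature A whose category is present on every trial, a starting weight of zero, and a learning rate of .20 (the value used for Figure 2A), and it tracks only the one weight from A to the category. The helper name `acquisition_curve` is hypothetical.

```python
def acquisition_curve(n_trials, lr=0.20):
    """Weight from feature A to a category that is present on every trial (sketch)."""
    w, curve = 0.0, []
    for _ in range(n_trials):
        internal = w * 1.0           # category's internal input from feature A (activation 1)
        error = 1.0 - internal       # category present: external input 1
        w += lr * error * 1.0        # delta rule: learning rate x error x sender activation
        curve.append(w)
    return curve

print([round(w, 3) for w in acquisition_curve(5)])
```

The weights rise as .20, .36, .488, and so on, following 1 − (1 − learning rate)^n, so each additional piece of supporting information adds a smaller increment and, in this unbounded sketch, the weight approaches +1 asymptotically.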
In contrast, conventional probabilistic and statistical models of causality (e.g., Cheng & Novick, 1992; Försterling, 1989) and attitude formation (Ajzen, 1991) do not predict a gradual increase of judgments and remain silent with respect to sample size effects. How are on-line learning and the sample size effect achieved in connectionist models? Given the assumption that the connection weights are initially set to zero (or some arbitrary low value), the connection weights are relatively modest and often inaccurate in the early phases of learning, and only grow more accurate (stronger or weaker, positive or negative) as more information is received. The reason for this incremental learning is that the error in the delta algorithm is only gradually minimized, at a pace regulated by the learning rate. Even when the covariation between a feature and a category is perfect, the learning rate dictates that the weight connecting the two will increase by only a fraction of the error on each trial. Thus, it takes multiple repetitions of the same information before a strong weight emerges. Figure 2A depicts a system with a learning rate of 0.20. This means that the error of underestimating a perfect correlation is corrected gradually, by increasing the weight by 20% of the remaining error on each trial. As can be seen, because feature A is always paired with the category (i.e., perfect correlation), its connection weight increases at each trial, starting at 0.20 and eventually approaching its maximum value of +1 after a number of trials. Several researchers have noted that, given a sufficient number of trials, the delta learning algorithm converges to the same predictions as conventional probabilistic and statistical models (Chapman & Robbins, 1990; Van Overwalle, 1996; Sarle, 1994).
Competition and Discounting
Another essential property of the delta algorithm is that it gives rise to competition between connections.
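A minimal sketch of competition (blocking), assuming two causes A and B that feed a single outcome node through linear, one-directional weights updated by the delta rule with a learning rate of .20. The trial counts and the function name `train` are illustrative choices, not taken from any experiment. The weight of cause B is compared after A has been learned first and when A and B are presented together from the start.

```python
import numpy as np

def train(trials, w=None, lr=0.2):
    """Delta-rule learning of the weights from causes A and B to one outcome (sketch).

    trials: sequence of ((a, b), outcome) pairs; w: optional starting weights.
    """
    w = np.zeros(2) if w is None else w.copy()
    for causes, outcome in trials:
        x = np.asarray(causes, dtype=float)
        prediction = w @ x                    # outcome's internal input: summed input from all causes
        w += lr * (outcome - prediction) * x  # both active causes share the same error
    return w

a_alone = [((1, 0), 1.0)] * 20                # cause A alone, outcome present
a_and_b = [((1, 1), 1.0)] * 20                # causes A and B together, outcome present

blocked = train(a_and_b, train(a_alone))      # A learned first, then A and B
control = train(a_and_b)                      # A and B together from the start
print(f"weight of B after pretraining A: {blocked[1]:.3f}")
print(f"weight of B without pretraining: {control[1]:.3f}")
```

Because A already predicts the outcome after pretraining, almost no error is left for B to absorb, so B's weight stays near zero; without pretraining, A and B share the error and each ends up near .50.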
This competition principle favors features or causes that are more predictive or diagnostic at the expense of less predictive ones. The term competition stems from the associative learning literature on animal conditioning and causality judgments mentioned earlier (Rescorla & Wagner, 1972; Shanks, 1995), and should not be confused with other usages in the connectionist literature, such as competitive networks (McClelland & Rumelhart, 1988). A typical example of competition is discounting in causal attribution: when one cause acquires strong causal weight, perceivers tend to ignore alternative causes (Hansen & Hall, 1985; Kruglanski, Schwartz, Maides, & Hamel, 1978; Rosenfield & Stephan, 1977; Van Overwalle & Van Rooy, 1998; Wells & Ronis, 1982). Competition is a basic property of associative learning models like that of Rescorla and Wagner (1972), where it is known as blocking. In fact, one of the reasons for the wide popularity of the Rescorla-Wagner model is that it was among the first conditioning models able to predict this property. As noted by several researchers (Read & Montoya, 1999; Van Overwalle, 1998), the delta algorithm makes similar predictions. How does this property work in connectionist models? The basic mechanism behind competition is that the internal activation of an outcome node is determined by the sum of the activations received from all connected causal nodes (see Figure 2B). Because the connection of cause A is already relatively strong, it sends a great deal of activation to the outcome node, leaving little error to be corrected. Any additional activation from an alternative cause B would over-activate the outcome node and increase the error, and therefore the growth of the connection strength of cause B is blocked.
Diffusion and Memory for Inconsistent Information
Still another property of the delta algorithm is that it weakens connections when a single node is connected to many nodes that are only occasionally activated.
In spreading activation models of memory, a similar weakening is known as the fan effect (Anderson, 1976), although its underlying mechanism is fundamentally different from the diffusion property. In the associative learning and connectionist literature, diffusion is a novel property that, to our knowledge, has not been detected or mentioned before. The diffusion property is introduced to explain lower recall for consistent as compared to inconsistent information in impression formation (Hastie & Kumar, 1979) and illusory correlation (Hamilton, Dugan & Trolier, 1985). A typical example is a trait that implies many behaviors, of which only a few are actually present at a given time. The more behaviors that imply the same trait, the weaker each of the trait-behavior associations becomes. From a connectionist view, the reason is that while only a few behaviors are activated (and their connections strengthened), all other possible trait-implying behaviors are absent and thus remain inactive, leading to a reduction of their connection weights with the trait. How does this diffusion principle explain enhanced recall of inconsistent information? As illustrated in Figure 2C, inconsistent behaviors that imply trait T2 are, by definition, fewer in number than the many consistent behaviors that imply trait T1. Hence, an inconsistent behavior is less often absent while its trait is active, so its connection with T2 is weakened less often than the connections of consistent behaviors with T1. This unequal weakening or diffusion therefore leads to enhanced recall for inconsistent information.
Overview of the Simulations
Simulated Phenomena
We applied the three emergent connectionist processing principles to a number of classic findings in the social cognition literature. For explanatory purposes, we most often replicated a well-known experiment that illustrates a particular phenomenon, although we occasionally also simulated a theoretical prediction.
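Of the three principles, diffusion is the least familiar, so a toy version of Figure 2C may help. The sketch assumes eight consistent behaviors that imply trait T1 and two inconsistent behaviors that imply trait T2 (hypothetical numbers), one behavior plus its trait per trial, a learning rate of .10, and only the trait-to-behavior weights updated by the delta rule; recall of a behavior is read off as the weight from its trait. The name `diffusion_weights` is our own.

```python
import numpy as np

def diffusion_weights(n_consistent=8, n_inconsistent=2, epochs=50, lr=0.1):
    """Trait -> behavior weights when each trial shows one behavior and its trait (sketch)."""
    n_beh = n_consistent + n_inconsistent
    trait_of = [0] * n_consistent + [1] * n_inconsistent   # 0 = T1 (consistent), 1 = T2 (inconsistent)
    W = np.zeros((2, n_beh))                                # W[t, b]: weight from trait t to behavior b
    for _ in range(epochs):
        for b in range(n_beh):
            traits = np.zeros(2)
            traits[trait_of[b]] = 1.0                       # the trait implied by this behavior is active
            behaviors = np.zeros(n_beh)
            behaviors[b] = 1.0                              # only this behavior is present
            error = behaviors - traits @ W                  # absent behaviors get a negative error
            W += lr * np.outer(traits, error)
    return W

W = diffusion_weights(n_consistent=8, n_inconsistent=2)
print(f"mean consistent weight:   {W[0, :8].mean():.3f}")
print(f"mean inconsistent weight: {W[1, 8:].mean():.3f}")
```

The average consistent weight settles near 1/8 and the average inconsistent weight near 1/2: each trait's weight is spread over all the behaviors that imply it, so the rarer inconsistent behaviors keep stronger links.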
Table 1 lists the topics of the simulations to be reported shortly, the relevant empirical study or theory that we attempted to replicate, as well as the major underlying processing principle responsible for reproducing the data. Although not all relevant data in social cognition can be addressed in a single paper, we are confident that we have included some of the most relevant phenomena in the current literature.
General Methodology
We used basically the same methodology throughout the simulations. The particular conditions and trial orders of the target experiments were reproduced as faithfully as possible, although minor changes were sometimes introduced to simplify things (e.g., fewer trials than in the actual experiments). When a random trial order was used, we ran the auto-associative network 50 times, each time with a different random order, and averaged the results. All parameters of the auto-associative model except the learning rate were kept fixed for all simulations (Estr = Istr = Decay = 1, and internal cycles = 1; see McClelland & Rumelhart, 1988). We did not impose a common learning rate because of the different contexts, measures and procedures used in the experiments. Rather, we freely selected the learning rate value that provided the highest correlation with the observed data of each simulation, after examining all admissible parameter values (see Gluck & Bower, 1988; Nosofsky, Kruschke & McKinley, 1992). In most cases, the selected learning rate was quite robust; that is, increasing or decreasing this parameter had little effect on the simulations. Only in a few cases where the original learning rate was already high ( .28) was increasing the rate further problematic, because the weights grew out of bounds (e.g., far beyond +1). The technical details of the auto-associative model are given in Appendix A.
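The learning-rate selection described above amounts to a grid search that maximizes the correlation between simulated and observed values. The sketch below is hypothetical throughout: `simulate` stands in for a full simulation (here just the acquisition curve of a single weight), the candidate grid runs from .05 to .50, and the "observed" data are synthetic values generated with a rate of .30, not results from any experiment.

```python
import numpy as np

def simulate(lr, n_trials=8):
    """Hypothetical stand-in for a full simulation: one weight's acquisition curve."""
    n = np.arange(1, n_trials + 1)
    return 1.0 - (1.0 - lr) ** n

def select_learning_rate(observed, grid):
    """Return the rate whose simulated pattern correlates best with the data, and that r."""
    scores = [np.corrcoef(simulate(lr), observed)[0, 1] for lr in grid]
    best = int(np.argmax(scores))
    return grid[best], scores[best]

grid = np.round(np.linspace(0.05, 0.50, 10), 2)   # candidate rates .05, .10, ..., .50
observed = simulate(0.30)                         # synthetic data, not from any experiment
best_lr, r = select_learning_rate(observed, grid)
print(f"selected learning rate {best_lr:.2f} (r = {r:.4f})")
```

In an actual application, `simulate` would run the auto-associator on the experiment's trial list and return the test activations to be compared with the observed means; because a correlation ignores scale and offset, only the pattern of the simulated values matters for the choice.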
At the end of each simulated experiment or experimental condition, test trials were run in which certain nodes of interest were turned on and the resulting activation in other nodes was recorded, to evaluate our predictions or to compare them with observed experimental data. This will be explained in more detail for each simulation. Unless otherwise noted, the obtained test activations were projected onto the observed data using linear regression (with a positive slope), to visually demonstrate the fit of the simulations to the data. We used this technique because usually only the pattern of test activations is of interest, rather than their exact values.
Structural Relationships
Categorization
Perhaps one of the most basic learning processes in social cognition is categorization, the grouping of diverse information into meaningful concepts or categories that contain similar features (e.g., objects), functions (e.g., roles) or members (e.g., social groups). Categorization promotes cognitive economy and organization, which enables us to go beyond the information given and to plan our behavior and interaction with the external environment. In recent approaches to categorization, members of a (social) category are not defined by strict criteria of necessity or sufficiency, but rather by a degree of typicality or representativeness. The process by which typicality is derived is most often described in terms of either a prototype or an exemplar approach. According to the prototype approach, learners abstract a central tendency of each category and then classify instances according to their similarity to the category's central prototype (e.g., Rosch, 1978).
In contrast, the exemplar approach assumes no such average or ideal prototype: categorizing an object depends on its similarity to the memory traces of all instances in the category (Fiedler, 1996; Hintzman, 1986; Medin & Shaffer, 1978; Nosofsky, 1986; Smith & Zárate, 1992).
Simulation 1: Categorization
How does a recurrent model simulate categorization? As we have seen, during learning, the delta algorithm changes the weights between the object's features and the category so that they better predict category membership. Through this error-reducing process, the weights come to reflect a sort of average link between the features of a category; that is, all instances are effectively "superimposed" or abstracted into a prototype. Let us illustrate the connectionist properties of feature similarity and prototype abstraction with the network example shown in Figure 3. This network has four feature nodes and two category nodes. Imagine that we are visiting Brussels and want to know whether an inhabitant is Flemish or Walloon. Probably the best criterion is the language being used: either Dutch (Flemish) or French (Walloon). However, because we may not always be able to hear these people talk, there are other, less perfect features we may rely on: a Fleming is often perceived as having simpler tastes and as less sophisticated, refined or cultured than a Walloon. Table 2 shows a simulated learning experience in which we perceive each of these features a number of times and are also told the correct category (e.g., by our host). As can be seen, the perfect features are always paired with their own category, whereas the imperfect features are often absent even when the person can be categorized as Flemish or Walloon. The results of this simulation are illustrated in the top panel of Figure 4. In addition, this figure depicts predictions from probabilistic theory and empirical data from Gluck and Bower (1988, Experiment 1).
In this experiment, subjects were given a medical diagnosis task in which they had to learn to diagnose one of two diseases (i.e., categories) on the basis of four symptoms (i.e., features). Our simulation is a simplified version of the learning trials given to the subjects in Gluck and Bower's experiment. As can be seen, in Gluck and Bower's (1988) data, there was a clear preference for the perfect category ("Dutch" in our example) that went above the 50% base rate, illustrating base-rate neglect. In the simulation, to measure the typicality of features with respect to a category, each feature is activated or primed (see bottom panel of Table 2). This activation automatically spreads upward to the category nodes, and the degree of category activation reflects the typicality of the feature for that category. For instance, speaking Dutch is strongly related to the Flemish category, so priming this feature will strongly activate the Flemish category. Likewise, speaking French is strongly related to the Walloon category, and priming this feature will strongly activate the Walloon category. To measure the preference of one category over the other, we took the difference between the resulting activation of the Flemish category node and that of the Walloon category node (see Table 2). To map these simulation results onto the proportional data of Figure 4 (where 50% reflects an equal preference for both categories), the simulated data were regressed on the observed data, with the intercept held constant at .50. Hence, simulation results above .50 reflect a preference for the Flemish category, while simulation results below .50 reflect a preference for the Walloon category. Not fixing the intercept in this manner would conceal the relative preference for the Flemish or Walloon category.
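Projecting simulated preference scores onto observed proportions can be sketched as an ordinary least-squares fit. With the intercept fixed at .50, as in this simulation, only the slope is estimated, b = Σ s(o − .50) / Σ s², where s are simulated scores and o observed proportions; leaving the intercept free corresponds to the general procedure. The function name `project` is our own, and the scores and proportions below are hypothetical numbers chosen so that the fit is exact; they are not Gluck and Bower's data.

```python
import numpy as np

def project(sim, obs, intercept=None):
    """Least-squares projection of simulated scores onto observed data (sketch).

    intercept=None estimates slope and intercept; a number fixes the intercept
    (e.g. .50, so that a simulated score of 0 maps onto equal preference).
    Returns the fitted values and the slope.
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    if intercept is None:
        slope, intercept = np.polyfit(sim, obs, 1)
    else:
        slope = np.sum(sim * (obs - intercept)) / np.sum(sim ** 2)
    return intercept + slope * sim, slope

# Hypothetical act(Flemish) - act(Walloon) scores for four primed features
sim = np.array([0.8, 0.2, -0.1, -0.7])
obs = np.array([0.74, 0.56, 0.47, 0.29])    # hypothetical proportions of Flemish choices
fitted, slope = project(sim, obs, intercept=0.50)
print(fitted.round(2), round(slope, 3))
```

Fixing the intercept keeps a simulated score of zero (no preference) at exactly 50%, so fitted values above or below .50 can be read directly as a Flemish or Walloon preference.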
As one would expect, Figure 4 (top panel) reveals that a perfect Flemish feature (e.g., Dutch) gives rise to the highest activation of the Flemish category. Similarly, a perfect Walloon feature (e.g., French) gives rise to the highest activation of the Walloon category (as indicated by the lowest score in Figure 4). Imperfect features produce activations that lie between these two extremes, as they are ambivalent predictors of category membership. The simulations fit the findings of Gluck and Bower (1988, Experiment 1) nicely, and better than the probabilistic predictions do. The bottom panel of Figure 4 illustrates the prototype of each category. To measure a category prototype, the category node is primed (see bottom panel of Table 2). This activation automatically spreads downward to the feature nodes, and the resulting activation pattern of the features reflects the prototype of the primed category. (These downward connections are not shown in Figure 3, but are roughly equivalent to the upward connections.) Thus, for instance, to measure the prototypical Flemish features, the Flemish category is primed and the resulting activation of the features reflects the prototype. As one might expect, Figure 4 (bottom panel) reveals that the prototype consists predominantly of the category's perfect feature and less so of its imperfect feature. Because features of the other category were also present in our simulation, they are also part of the prototype, although to a much weaker degree. Thus, the prototype is quite flexible, as it may include features that are relatively rare, although these features are clearly less relevant to or typical of the prototype.

Base-Rate Neglect in Categorization

One of the reasons why we chose the Brussels example is that the distribution of the social categories is unequal: many more Walloons than Flemish live in Brussels.
Gluck and Bower's (1988) original study had a similar imbalance between common and rare disease categories. Given such an unequal distribution, the connectionist approach makes a prediction that is intuitively plausible but difficult to explain by other approaches: base-rate neglect. Perceivers often place more emphasis on the diagnosticity or similarity of features and neglect the normative probabilities of the features' occurrence when making categorical judgments (Gluck & Bower, 1988; Kruschke, 1996). For instance, although Table 2 shows that the probability that a Dutch-speaking Brussels inhabitant is Flemish is equal to the probability that he or she is Walloon (i.e., 30 cases in each category), people tend to rely more on Dutch as an informative cue for a Flemish categorization. Intuitively, this makes sense. Dutch is a better predictor of being Flemish, because it often predicts the Flemish category on its own. In contrast, Dutch is not a good predictor of being Walloon, as it is always accompanied by other Walloon features. People's reliance on the most predictive feature regardless of normative probability has been explained in the past by the operation of the representativeness heuristic, or by people's use of similarity rather than probability to categorize. In the social domain, the preference for diagnostic information is revealed in trait inferences, where people rely more on some types of behaviors than on others (Reeder & Brewer, 1979). For morality traits, people draw inferences more readily from negative behavior (e.g., lying), whereas for ability traits, people draw inferences more readily from positive (e.g., successful) behavior.
This has been explained by the fact that immoral behaviors are rarer and more distinctive, and thus more informative for morality judgments, while high-ability behaviors are more distinctive and thus more informative for ability inferences.

A connectionist network predicts this base-rate neglect in a straightforward manner. Consider the results of the simulation in the top panel of Figure 4. As noted earlier, a score above .50 indicates a preference for the Flemish category, while a score below .50 indicates a preference for the Walloon category. Of particular interest is that the perfect feature “Dutch-speaking” exceeds the normative probabilistic prediction of .50. This reveals base-rate neglect. From a connectionist perspective, this is due to the competition principle. For a Flemish inhabitant, the Dutch-speaking feature is the best predictor available, so the other features are discounted. In contrast, because a Walloon most often possesses features that predict the category better than speaking Dutch, this feature must compete against these better predictors and is discounted. As a result, the connection weight of the Dutch feature is stronger with the Flemish category (.21) than with the Walloon category (.05), resulting in a proportion substantially above .50 in favor of the Flemish category.

Limitations and Future Directions

Although the present recurrent model is capable of explaining base-rate neglect, it is not able to account for the inverse base-rate effect (Medin & Edelson, 1988; Kruschke, 1996; Shanks, 1992). Whereas base-rate neglect reflects the tendency to select a rare category (e.g., Flemish) when tested with a single feature (e.g., speaking Dutch) for which the objective probability was equal for all categories, the inverse base-rate effect reflects the tendency to select a rare category when tested with a combination of conflicting features (e.g., speaking Dutch and French).
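Returning to the competition account of ordinary base-rate neglect given above, a small Python sketch can show how cue competition alone favors the rare category. It is a hypothetical, simplified feedforward version of the idea, not the paper's simulation: Dutch occurs 30 times with each category (so the normative probability of Flemish given Dutch is .50), but on Walloon trials it always co-occurs with French, a better Walloon predictor. The trial mix, the learning rate and the absence of a bias or context node are assumptions made for illustration.

```python
# Hypothetical sketch of base-rate neglect through cue competition (delta rule).
# Inputs: [Dutch, French]; outputs: [Flemish, Walloon].
# Per 120 trials: 30 Flemish with Dutch only, 30 Walloon with Dutch + French,
# 60 Walloon with French only. The four trials below keep that ratio, interleaved.
CYCLE = [
    ([1.0, 0.0], [1.0, 0.0]),  # Dutch only -> Flemish
    ([1.0, 1.0], [0.0, 1.0]),  # Dutch + French -> Walloon
    ([0.0, 1.0], [0.0, 1.0]),  # French only -> Walloon
    ([0.0, 1.0], [0.0, 1.0]),  # French only -> Walloon
]


def train(cycle, lr=0.05, cycles=2000):
    w = [[0.0, 0.0], [0.0, 0.0]]  # w[input][output]
    for _ in range(cycles):
        for x, t in cycle:
            for o in range(2):
                out = x[0] * w[0][o] + x[1] * w[1][o]
                err = t[o] - out  # one shared error for all active cues: they compete
                for i in range(2):
                    w[i][o] += lr * err * x[i]
    return w


w = train(CYCLE)
dutch_to_flemish, dutch_to_walloon = w[0][0], w[0][1]
# Normative P(Flemish | Dutch) = 30 / (30 + 30) = .50, yet the learned weights
# favor Flemish because French already predicts Walloon on the Dutch + French trials.
print(round(dutch_to_flemish, 2), round(dutch_to_walloon, 2))
```

The exact weights differ from the .21 and .05 reported for the paper's simulation because this learning history is hypothetical; only the direction of the asymmetry is the point.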
One explanation for the inverse base-rate effect is that the combination of both perfect features is quite distinctive and rare, and is therefore more indicative of a rare category (Shanks, 1992). A number of authors have developed connectionist models to account for this effect. Gluck (1992) claimed that a network like the present recurrent one, using distributed representations, could explain inverse base-rate effects. However, our simulations showed that this claim is incorrect, as such a network cannot explain all the data. Other proposals are more effective and account for the inverse base-rate effect by increasing the attention given to uncommon features. Shanks (1992) developed a simple extension of the standard delta algorithm that gives more attention to uncommon features, and Kruschke (1996; Kruschke & Johansen, 1999) developed a network model that learns to attend more to features that distinguish the rare category from the already learned (frequent) category.

Impression Formation

In a social context, getting to know others often involves drawing inferences about the characteristics and traits of individuals and groups. This process of impression formation, we will argue, obeys connectionist principles similar to those underlying categorization in general. In a typical impression formation experiment, participants receive a series of trait adjectives about a person and are asked to make overall trait or likability judgments (categorization) of that person (e.g., Asch, 1946; Anderson, 1981; Kashima & Kerekes, 1994). Sometimes the adjectives are close synonyms that imply one or more specific traits; sometimes they are very diverse and imply an overall likability impression. Anderson (1981) argued that impressions of a person are abstracted from trait adjectives as if people average these adjectives, and proposed a weighted average model to explain person impression judgments.
Although his claim was supported by an impressive amount of research, the model was criticized on the grounds that it seems unlikely that people perform all the necessary weighting and averaging calculations in their mind to arrive at an impression, and many researchers abandoned Anderson's model for this reason. However, the connectionist metaphor used here can revive Anderson's model. The weighted averaging principle can easily be implemented by implicit and automatic connectionist processes based on the delta algorithm, without recourse to explicit arithmetical calculations (see Appendix B for an algebraic proof). We will illustrate how a recurrent network can model impression formation with two typical findings from person impression research.

Simulation 2: On-line Integration and Recency

First, consider an experiment by Stewart (1965) in which adjectives describing a high trait (e.g., talkative) were followed by opposite (or low) trait adjectives (e.g., reticent). This experimental manipulation was modeled using a network architecture consisting of a person node and a task context node (which reflects instructions and other experimental context variables) connected to a trait node. The simulations start from the assumption that the trait implied by an adjective has already been learned and is recruited from semantic and social knowledge. Specifically, we assume here that adjectives associated with the trait are denoted by an activation value of +1 for that trait, whereas adjectives associated with the opposite trait have an activation value of -1 (this is equivalent to Anderson's scale values). What is of interest here is how this trait-implying information is applied to build up an impression of a specific person by changing the weights linking the person with the trait.
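The link between the delta algorithm and Anderson's averaging can be illustrated with a small Python sketch. This is not the paper's Appendix B proof, and the single person-to-trait weight (no context node, no recurrence) is a simplifying assumption. If the learning rate on trial n equals the item's importance weight divided by the summed importance weights so far, the delta update reproduces a weighted average of the scale values exactly; with a constant learning rate, recent items dominate instead, which gives the recency pattern discussed for Stewart's (1965) high-low and low-high orders. The learning rate of 0.3 and the 5 + 5 trial series are illustrative choices.

```python
def delta_average(values, importance=None):
    """Delta rule with lr_n = s_n / (s_1 + ... + s_n): yields the weighted average."""
    if importance is None:
        importance = [1.0] * len(values)
    w, total = 0.0, 0.0
    for v, s in zip(values, importance):
        total += s
        w += (s / total) * (v - w)  # delta update toward the current scale value
    return w


def delta_trajectory(values, lr=0.3):
    """Delta rule with a constant learning rate; returns the weight after each trial."""
    w, history = 0.0, []
    for v in values:
        w += lr * (v - w)
        history.append(w)
    return history


high_low = [+1] * 5 + [-1] * 5  # high trait adjectives first, then low ones
low_high = [-1] * 5 + [+1] * 5
print(delta_average([1, 1, -1, 1]))               # plain average of the scale values
print(delta_average([1, -1], importance=[3, 1]))  # importance-weighted average
print(delta_trajectory(high_low)[-1], delta_trajectory(low_high)[-1])
```

With the constant learning rate, the final impression has the sign of the last block of adjectives in both orders, so the two conditions cross over, as in the recency pattern described for Stewart's data.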
Table 3 depicts a schematic list of the information given in Stewart's (1965) experiment, where some subjects received high trait information about a person in the first half of the experiment and low trait information in the second half, and other subjects received the reverse low-high order. When the person is described by a high trait, the connection weight is increased according to the acquisition principle of the delta algorithm. In contrast, when the person is described by a low trait, the weight is decreased, again according to the acquisition principle. After training, the person node in the network is primed, and the resulting activation of the trait node indicates what trait the person conveys (see bottom panel of Table 3). The results of a recurrent simulation are shown in Figure 5. As can be seen, there is a close fit between the simulation and the data. Of particular interest is the crossover at the end of training, as the last presented adjectives win over the earlier presented adjectives in both data and simulations. This reflects a recency effect and suggests that the revision and adjustment of person impressions is an on-line acquisition process in which novel information often "overwrites" older information previously stored in the connection weights.

Simulation 3: Recency in Concurrent Judgments, Primacy at Final Judgments

As a second example, consider research in which disconfirmatory information is given at a single specific position in a series of trials. By comparing the effect of disconfirmatory with confirmatory information at the same position in the trial series (denoted as serial position), one can estimate the weight each trait takes at a given position (Anderson, 1979; Anderson & Farkas, 1973; Busemeyer & Myung, 1988; Dreben, Fiske & Hastie, 1979; Kashima & Kerekes, 1994).
Early disconfirmatory trait information might be important in crystallizing an impression (primacy effect), while late information might be influential because it sheds new light on traits presented earlier (recency effect). Research uniformly suggests that when participants give their trait ratings continuously after each adjective is presented, item weights are relatively equal in all but the last position, at which point they rise sharply. This reflects a recency effect. Importantly, however, this recency effect attenuates when more trait information is given. Thus, when only a few pieces of trait-implying information are given, disconfirmatory information has a stronger effect than when more trait-implying information is given. It is as if increasing the amount of confirmatory information shields the perceiver from the disconfirmatory information. To simulate this result, we used the same recurrent architecture as before. A simulation of the experiment by Dreben, Fiske and Hastie (1979) is schematically listed in Table 4 for the case in which four adjectives are given (the logic is similar for other numbers of adjectives). The simulation results are shown in the top panel of Figure 6, where the dotted line depicts the attenuation of recency. The recurrent network clearly reproduced the predicted attenuation, although attenuation was somewhat less steep in the simulations than in the data of Dreben, Fiske and Hastie (1979). How did the recurrent model attain attenuation of recency? One possible interpretation, suggested by an analysis of the simulation, is that the person node in the recurrent network receives internal activation from the trait node (e.g., “When someone is talking that loud, it must be John”).
Because the trait node becomes positively linked with the person node after confirmatory trials, it compensates for the disconfirmatory information, and it does so increasingly well with more trials. Stated differently, a robust impression resulting from earlier confirmatory information makes the perceiver more resistant to changing his or her impression in response to a single disconfirmatory item. This explanation differs from Anderson's reasoning based on a distinction between item-specific and abstract aspects of impression formation (Anderson & Farkas, 1973; Dreben et al., 1979). Similar recurrent simulations also fit well with recent serial position data from Kerekes (1991, cited in Kashima & Kerekes, 1994). In contrast to the previous findings, trait weights show a typical primacy effect when impression judgments are given at the end of the series of trials rather than continuously (Anderson, 1979). This primacy effect can also be simulated by the same recurrent network, as shown in the bottom panel of Figure 6. Of specific interest is the much greater learning rate for this simulation, which suggests that primacy might be a consequence of building up a prediction of the trait very quickly in only a few trials, so that later information has little effect on the impression. This interpretation shares with Anderson's (1981) attention decrement hypothesis the idea that the most attention is paid to, and the most information is taken up during, the earliest trials, leaving little impact for information presented later. This hypothesis was also incorporated in an alternative connectionist model, the tensor product model, developed for impression formation by Kashima and Kerekes (1994). In sum, based on our connectionist simulations, we can explain the different effects of continuous and final judgments by differences in learning rate.
This seems plausible, as information uptake and processing are probably less interrupted by final trait judgments than by continuous judgments, resulting in a faster learning rate for final judgments and hence primacy rather than recency.

Limitations and Future Research

Although Kashima and Kerekes (1994; see also Busemeyer & Myung, 1988) correctly pointed out that attenuation of recency cannot be simulated with a feedforward network, we have demonstrated here that it can easily be reproduced with a recurrent model. This contradicts the claim made by Kashima, Woolcock and Kashima (2000, p. 924) that this effect cannot be obtained with a recurrent model. Moreover, unlike the tensor product model proposed by Kashima and Kerekes (1994), our simulations do not require additional ad hoc assumptions, such as a changing context after each judgment, to obtain attenuation of recency. That our recurrent model can reproduce both recency and primacy effects is encouraging, but as long as we cannot verify which independent conditions actually determine both effects, the idea that they are driven by different learning rates remains at best suggestive. The novel hypothesis that grew from the simulations is that people build more robust impressions of a person, either through a growing positive expectancy that shields them from disconfirming information (attenuation of recency) or by building an impression very quickly and disregarding subsequent information (primacy). However, we are not aware of any research that has explored this potential explanation in depth.

Simulation 4: Higher Recall for Inconsistent Information

In the previous research paradigms, participants received trait adjectives and were instructed to form an impression of a person. This seems to reflect the manner in which we routinely communicate about others. However, when we learn about others from our own observations, we do not see traits but rather the behaviors that are associated with them.
An intriguing finding given this type of learning is that inconsistent or unexpected behavioral information is often better recalled than information that is consistent with the dominant trait expectation (for a review, see Stangor & McMillan, 1992). Thus, we better recall a hooligan helping an older lady cross the street than a nurse performing the same act. Hastie (1980; Hastie & Kumar, 1979) reasoned that inconsistent information requires extra cognitive effort to explain and make sense of, and is therefore elaborated more deeply. This leads to extra links between the inconsistent information and other locations in memory and, thereby, to better recall. Hastie (1980) supported this interpretation with research indicating that inconsistent information leads to more causal elaborations of the behavioral sentences. However, these elaborations were explicitly requested from the participants after the initial phase of impression formation was over. It is thus not clear whether they were generated spontaneously during initial encoding or only constructed after the request (cf. Nisbett & Wilson, 1977). Can connectionist principles account for the enhanced memory of inconsistent information without recourse to explicit elaborative processes? Yes, and to illustrate this, we simulated a well-known experiment by Hamilton, Katz and Leirer (1980, Experiment 3). Participants read information concerning several fictional persons. For each person, they read a list of 10 consistent and 1 inconsistent behavioral descriptions, after which they had to recall as many behavioral sentences as possible. Half of the participants were instructed to form an impression of the person, whereas the other half were instructed to memorize the behavioral information.
Under impression formation instructions, participants were more likely to recall inconsistent items, whereas this difference disappeared under memory instructions. To understand enhanced memory for inconsistent behavioral information, consider a network architecture with a person node and a trait node, as in the previous simulations, as well as separate nodes for each behavioral sentence. Thus, both the categorical trait information implied by the behaviors and the individual behavioral exemplars are represented in a semi-distributed manner. Table 5 provides a simplified simulation of Hamilton et al.'s experiment with 4 consistent behaviors and 1 inconsistent behavior. To simulate impression formation, each behavior was activated together with the associated trait and the person node (see Table 5). As predicted by the diffusion principle, however, each time a behavior is expected because the trait is present but is itself absent, the trait-behavior connection weakens. Thus, the more behaviors confirm the expected trait, the less indicative each behavior becomes of that trait or person. This is especially true for consistent behaviors: each of them is absent on many more trials with the trait than it is present. As a result, the behavioral links will be weaker for consistent than for inconsistent behaviors. In contrast, in the memorizing condition, subjects are not motivated to form a unified trait impression of the person. We assumed that this would result in a much shallower encoding of person and trait information, which was simulated by setting the activation of these nodes to 0.10 instead of the typical 1. As a result, all links between the person or trait and the behaviors are sharply reduced. Figure 7 shows the results of the recurrent simulation. It was assumed that the person or the traits would serve as a cue to recall the specific behavioral episodes (see bottom panel of Table 5).
The simulations give very similar results when only the person or only the traits are primed to retrieve the behavioral information. As can be seen, the simulations replicated the basic finding that inconsistent information was better recalled than consistent information under impression formation instructions, whereas under memorizing instructions this enhanced memory disappeared. It is important to note that the same simulation was able to replicate the well-known finding that recognition measures produce the opposite tendency to report more consistent information (Stangor & McMillan, 1992). This was accomplished by running the same simulation followed by a recognition test that was biased by searching only for behaviors congruent with the consistent trait (see bottom panel of Table 5). This reflects the idea that consistent traits guide recognition when the perceiver relies on guessing. However, if this bias was removed (by deleting ? for the "common" trait), inconsistent behaviors were better recognized than consistent behaviors, in line with the improved recognition sensitivity measures reported by Stangor and McMillan (1992).

Limitations and Future Research

The simulation of higher recall of inconsistent behavioral information suggests that this effect may be due to relatively stronger direct links to unique behavioral information. Thus, the present connectionist account emphasizes the direct connections from a particular trait or person to behavioral exemplars, while Hastie (1980; see also Srull, 1981) argued that stronger associations between consistent and inconsistent behaviors after resolving the inconsistency were what produced this higher recall. Our simulation does not rule out other processes, such as deeper and more elaborated processing (Hastie, 1980), that may contribute to better recall of incongruent information. But is this more elaborated processing necessary?
Some support for the effortful generation of elaborations comes from studies that found decreased recall of inconsistent behaviors when mental resources were limited by reducing answering time, by making the task more complex, or by adding distracter tasks (Bargh & Thein, 1985; Hamilton, Driscoll & Worth, 1989; Macrae, Hewstone & Griffiths, 1993; Stangor & Duan, 1991). However, these results can easily be simulated with our connectionist network by simply assuming that load decreased the encoding of the behavioral episodes, or even of all information (e.g., with an activation of 0.10). This suggests that poorer encoding of information, rather than less inconsistency resolution and elaboration, might have reduced recall of inconsistent information. Hence, there seems to be no need to postulate explicit elaborations to explain higher recall of inconsistent behavior. The present perspective is also consistent with other findings of less enhanced recall for inconsistent information: when an impression is formed of a non-meaningful group of individuals, which can be simulated by a decreased activation of the person and trait nodes because perceivers are less willing to invest cognitive effort in encoding an overall impression (Srull et al., 1985, experiment 7); for behavioral items at the beginning of a list compared to the end (Srull et al., 1985, experiments 5 & 6; Hastie & Kumar, 1979, experiment 3); and when the number of inconsistent items increases, thus making them less unique and unexpected (Hastie & Kumar, 1979, experiment 3; Srull, 1981, experiments 1-3; Srull, Lichtenstein & Rothbart, 1985, experiment 3). Overall, the proposed model appears broadly consistent with a relatively large spectrum of research findings. This suggests that the diffusion principle provides an interesting alternative explanation for increased recall of inconsistent information.
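The diffusion account of better recall for inconsistent behaviors can be illustrated with a simplified Python sketch. It is a feedforward pattern associator rather than the recurrent network used in the simulations, and the encoding is an assumption: the inputs are a person node, the dominant trait and the opposite trait; the outputs are four consistent behaviors and one inconsistent behavior, each presented once per epoch with its own trait, following the simplified Table 5 design. Recall is read out by priming the person node alone; the memorizing condition is not modeled here.

```python
# Inputs: [person, dominant trait, opposite trait]
# Outputs: behaviors b0..b3 (consistent with the dominant trait), b4 (inconsistent)
TRIALS = [
    ([1.0, 1.0, 0.0], 0),
    ([1.0, 1.0, 0.0], 1),
    ([1.0, 1.0, 0.0], 2),
    ([1.0, 1.0, 0.0], 3),
    ([1.0, 0.0, 1.0], 4),  # the single inconsistent behavior, with the opposite trait
]
N_INPUTS, N_BEHAVIORS = 3, 5


def train(trials, lr=0.01, epochs=3000):
    w = [[0.0] * N_INPUTS for _ in range(N_BEHAVIORS)]  # w[behavior][input]
    for _ in range(epochs):
        for x, present in trials:
            for b in range(N_BEHAVIORS):
                # expected-but-absent behaviors get target 0, so their links weaken
                target = 1.0 if b == present else 0.0
                err = target - sum(wi * xi for wi, xi in zip(w[b], x))
                for i in range(N_INPUTS):
                    w[b][i] += lr * err * x[i]
    return w


w = train(TRIALS)
recall = [w[b][0] for b in range(N_BEHAVIORS)]  # prime the person node only
print([round(r, 2) for r in recall])
```

Each consistent behavior is absent on three of the four trials that share its input pattern, so its person link stays weak, whereas the inconsistent behavior is the only behavior ever paired with the opposite trait and keeps a stronger link.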
Assimilation and Contrast in Person and Group Perception

An important feature of recurrent models is their capacity to generalize. A trained network exposed to an incomplete pattern of information will fill in the missing information on the basis of the complete pattern learned previously. This generalization process can be seen as a type of assimilation, in that past experiences influence how we perceive and interpret novel information that is similar or closely related to them. For instance, when seeing a photo of Hitler, we might immediately complete this image with activated memories of his aggressive wars, the mass annihilation of Jews and so on. There is abundant evidence that accessible knowledge such as traits, stereotypes, moods, emotions and attitudes is likely to result in generalization to unobserved features. In the next simulations, we explore some applications of this capacity to generalize, as well as the opposite capacity to generate contrast effects in person perception (Anderson & Cole, 1990; Smith & DeCoster, 1998).

Simulation 5: Assimilation of Unobserved Attributes

To demonstrate generalization in a recurrent network, imagine that the network learns that Hitler was a cruel German Nazi leader who was responsible for the mass annihilation of Jews. When the network is then tested with a Hitler probe and a few related attributes (e.g., German, Nazi, cruel), would it use this knowledge to activate the missing feature of mass annihilation? Recall that activation in a recurrent network is determined not only by external input, but also by internal input coming from related nodes in proportion to their connection weights. This implies that although the missing annihilation node receives no external activation, it does receive internal activation through its links with the Hitler node and other related nodes.
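Pattern completion through internal activation can be sketched with a linear autoassociator trained by the delta rule. This Python sketch is a simplification under stated assumptions: localist nodes (no micro-features or noise), a single synchronous computation of internal activation, no self-connections, and two hypothetical patterns, one for Hitler and one for an unrelated exemplar (Gandhi). The node names and parameter values are illustrative only.

```python
NODES = ["Hitler", "German", "Nazi", "cruel", "annihilation", "Gandhi", "Indian", "peaceful"]
HITLER = [1, 1, 1, 1, 1, 0, 0, 0]
GANDHI = [0, 0, 0, 0, 0, 1, 1, 1]


def internal_activation(w, ext):
    """Input each node receives from the other nodes (no self-connections)."""
    n = len(ext)
    return [sum(w[j][k] * ext[k] for k in range(n) if k != j) for j in range(n)]


def train(patterns, lr=0.1, epochs=200):
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]  # w[j][k]: connection from node k to node j
    for _ in range(epochs):
        for ext in patterns:
            internal = internal_activation(w, ext)
            for j in range(n):
                err = ext[j] - internal[j]  # external input serves as the target
                for k in range(n):
                    if k != j:
                        w[j][k] += lr * err * ext[k]
    return w


w = train([HITLER, GANDHI])
probe = list(HITLER)
probe[NODES.index("annihilation")] = 0  # test without the annihilation feature
completed = internal_activation(w, probe)
print(round(completed[NODES.index("annihilation")], 3))
```

The missing node receives no external input in the probe, yet its internal activation from the four remaining Hitler nodes restores it, while nodes of the unrelated pattern stay silent.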
Table 6 shows a schematic description of a simulation that combines several simulations by Smith and DeCoster (1998). In this simulation, we presented information on three individual exemplars, such as Hitler, Goebbels and Himmler, each defined by five features (labeled E1-E5, E6-E10 and E11-E15), as well as on their group (e.g., Nazi), which was characterized by three features (labeled G1-G3). To increase the realism of the simulation, we follow Smith and DeCoster in representing features in a distributed manner; that is, each feature is represented by a set of micro-features (unlike our previous simulations, in which a localist coding scheme was used with each feature represented by a single node). Distributed representations are more realistic because we know that symbolic concepts are not represented by single neurons but rather by assemblies of neurons. Specifically, each feature was represented by 5 micro-features or nodes. For instance, Hitler was represented not by five nodes but by a series of 25 nodes that reflected several micro-features of his physical appearance, character and so on. In addition, we added random noise to the presentation of background context and features to simulate the imperfect conditions of perception. Although these latter aspects appear in our simulation mainly for purposes of comparison, they are not essential to understanding the generalization process or to producing the results (e.g., the noise cancels out given enough simulation runs). After going through the learning history of Table 6, all but one feature of the individual exemplars or group were primed (see bottom panel of Table 6). Figure 8 depicts the resulting activation of the remaining feature (represented by five nodes).
As can be seen, the internal activation of the other nodes allows the network to reconstruct the missing information of the originally learned pattern almost perfectly, for both the individual exemplar and the group. This indicates that the recurrent network is capable of integrating and utilizing both individualized and schematic (i.e., group) information.

Further Research

Smith and DeCoster (1998) demonstrated that a recurrent network can reproduce other very interesting phenomena of social cognition. Perhaps one of the most intriguing is the creation of new emergent attributes by combining parts of existing attributes (see Smith & DeCoster, 1998, simulation 3). Traditional theories of categorization assume that people use a single schema, stereotype or knowledge structure to make inferences about a target person or a group. Even if multiple schemas are relevant, each of them is independently activated and applied. However, people can combine many sources of knowledge to construct new emergent properties that describe subtypes or subgroups of people. For instance, a militant feminist who is also a bank teller may become subtyped as a feminist bank teller with specific idiosyncratic attributes (Smith & DeCoster, 1998; Asch & Zukier, 1984). Previous connectionist models such as ECHO (Thagard, 1989) were unable to model this process.

Simulation 6: Assimilation with Traits, Contrast with Exemplars

The abundance of assimilation effects in social cognition research may suggest that filling in unobserved characteristics is the default or most natural process. Thus, when primed with “violent,” we judge a nondescript or ambiguous target person as more hostile, and when primed with “nice,” we judge that same target as less hostile. However, under some circumstances, the opposite effect may occur: primed features may lead to contrast rather than assimilation.
For instance, when primed with the exemplar Gandhi, people may judge a target person as relatively more hostile, whereas when primed with Hitler, they may judge the same target as relatively less hostile. Under these conditions, the exemplars Gandhi and Hitler serve as anchors against which the target is judged, and so lead to contrast effects. In sum, contextually (or chronically) primed information may serve not only as an interpretation frame, but also as a comparison standard during impression formation. What produces assimilation or contrast? According to Stapel, Koomen and Van der Pligt (1997), trait concepts are more likely to be used to interpret an ambiguous person description (assimilation), because traits carry only conceptual meaning. On the other hand, exemplars, if sufficiently extreme, will be used as a comparison standard (contrast), because both the exemplar and the target are persons that can be compared with each other. An experiment by Stapel et al. (1997) confirmed this proposition. Participants were asked to form an impression of an ambiguous friendly or hostile target person. Before they were exposed to the description of the target, they were primed with names of traits (e.g., violent or nice) or with names of extreme exemplars (e.g., Hitler or Gandhi). Assimilation was found in the trait priming condition, whereas contrast was found in the person priming condition. A recurrent network can simulate this combination of assimilation and contrast. As listed in Table 7, the network first builds up background knowledge about average persons (who are at times more or less violent or nice), about extreme exemplars like Hitler and Gandhi, and about the relationships between traits (e.g., nice is the opposite of hostile and violent). The essential idea of the simulation is that during priming, the primed stimulus and the target description are temporarily activated together.
This is represented by a single learning trial for each priming condition (see Table 7). Because a trait category is tested through the connections from persons to traits, exemplars compete with each other, but traits and exemplars do not. Hence, when a trait concept is primed, the usual assimilation of the trait impression follows from the acquisition principle. In contrast, when an exemplar such as Hitler is primed, competition arises between this exemplar and the target exemplar (which is so nondescript that it is assumed to be taken as an instance of an average person). This competition arises when both exemplars predict hostility and their summed activation overestimates the observed degree of hostility. This error leads to a decrease in the connection weight between the target and hostility, and results in a contrast effect. The full learning history of this simulation is listed in Table 7. Distributed coding and noise were used in the priming trials to implement the idea that slightly different instances of traits and exemplars were used in the priming and prior knowledge phases. As can be seen in Figure 9, the simulation replicated the assimilation and contrast effects reported by Stapel, Koomen and Van der Pligt (1997).

Limitations and Future Research

The exemplars that serve as a comparison standard need to be sufficiently extreme, because otherwise little overestimation would occur in the network, and thus little contrast. This prediction is supported by a recent study by Moskowitz and Skurnik (1999). In two experiments, they found that moderate exemplars (e.g., Kissinger) lead to less contrast than extreme exemplars (e.g., Hitler). As one might expect from the recurrent network's generalization property, they also found that moderate trait primes lead to less assimilation than extreme primes.
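The competition account above can be illustrated with a minimal sketch. It is not the recurrent network of Table 7: it assumes a plain feedforward delta rule with a single hostility output node, hostility coded relative to an average person (0 = average), and hypothetical prior weights and learning rate. The names `PRIOR_WEIGHTS`, `prime_trial` and `target_hostility_after` are illustrative. A trait prime is treated as an output category that drives the hostility node, whereas an exemplar prime is an additional input that competes with the target.

```python
# Hypothetical sketch: assimilation to traits, contrast with exemplars.
# Feedforward delta rule with one output node ("hostile"); all numbers are
# illustrative assumptions, not the values used in Table 7.

LEARNING_RATE = 0.2

# Hypothetical prior person -> hostile weights (0 = average hostility).
PRIOR_WEIGHTS = {
    "target": 0.0,     # nondescript target, taken as an average person
    "hitler": 1.0,     # extreme hostile exemplar
    "gandhi": -1.0,    # extreme nonhostile exemplar
    "kissinger": 0.4,  # moderate exemplar
}


def prime_trial(weights, inputs, hostile_activation, lr=LEARNING_RATE):
    """Apply one delta-rule trial and return a new weight dictionary.

    Every name in `inputs` is an active person node (activation 1);
    `hostile_activation` is the activation of the hostility node.
    """
    prediction = sum(weights[name] for name in inputs)
    error = hostile_activation - prediction
    updated = dict(weights)
    for name in inputs:
        updated[name] += lr * error  # all active inputs share the same error
    return updated


def target_hostility_after(prime):
    """Return the target -> hostile weight after a single priming trial."""
    if prime in ("violent", "nice"):
        # Trait prime: the trait is an output category, so only the target
        # is an input; the primed trait pushes the hostility node up or down.
        hostile = 1.0 if prime == "violent" else -1.0
        inputs = ["target"]
    else:
        # Exemplar prime: exemplar and target are both inputs and compete to
        # explain the ambiguous (average, 0) hostility of the description.
        hostile = 0.0
        inputs = ["target", prime]
    return prime_trial(PRIOR_WEIGHTS, inputs, hostile)["target"]


for prime in ("violent", "nice", "hitler", "gandhi", "kissinger"):
    print(f"{prime:>9}: {target_hostility_after(prime):+.2f}")
```

Under these assumptions the target moves toward hostility after "violent" or Gandhi and away from it after "nice" or Hitler, and the moderate exemplar Kissinger produces a smaller contrast than Hitler because the summed prediction overshoots the observed hostility by less.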
The present recurrent network was able to reproduce these findings of Moskowitz and Skurnik (1999). However, it cannot, at present, reproduce the effect of cognitive load on assimilation and contrast. Moskowitz and Skurnik (1999) showed that cognitive interference (i.e., increasing task load or interrupting the current task) minimized trait assimilation but left exemplar contrast relatively untouched. If we simulate decreased resources during priming by decreasing node activation, however, we would expect the opposite effect. Future research is necessary to establish whether Moskowitz and Skurnik's findings are robust and, if so, how task load can be implemented in a recurrent network so that it can approximate them.

Causal Judgments and Attitudes

In this section we discuss causal judgments and attitude formation from an attributional perspective. A first question is how causes, attitudes and effects are represented in a connectionist network. As in the recurrent networks of social judgment described earlier, we represent causes and attitude-objects as features, and outcomes or behaviors as categories. However, whereas noncausal features in social inference are rather passive descriptors or predictors of category membership, causes and attitude-objects intuitively have a more active role, in that they also tend to bring about the outcomes they predict. For instance, an angry face not only tells us something about the person (social inference), but also warns the observer to defend him- or herself against possible attacks (causal inference). Likewise, an attitude-object such as a toy may not only look attractive (trait inference), but may also increase approach behavior (causal inference).
This difference between the descriptive nature of social inference and the more active role, or power, of causes and attitudes is not explicitly modeled in connectionist networks, but it is evident from the typical kind of categories involved, which reflect social events (behaviors, outcomes) rather than social entities (traits, groups, families, etc.).

Causal Attributions

Recent research has demonstrated many parallels between models of human causal attribution and animal conditioning models (for overviews, see Allen, 1993; Shanks, 1995; Read & Montoya, 1999). To cite a few important parallels, one of the most popular models in animal learning, the Rescorla-Wagner (1972) model, is identical to the delta learning algorithm (implemented in a feedforward network), and this model has been shown to converge asymptotically to another popular model of human causality based on probabilistic principles (Cheng & Novick, 1992; see Chapman & Robbins, 1990; Van Overwalle, 1996). In a recent article, Read and Montoya (1999) successfully simulated a number of phenomena from the animal learning literature with a recurrent network (see their Table 2, p. 735). This work demonstrated that a recurrent model can reproduce competition between alternative causes, such as discounting (where one cause blocks the causal influence of an alternative cause), augmentation (where an inhibitory cause increases the influence of an alternative cause that facilitates the outcome), inhibition (where an alternative cause develops inhibitory effects that prevent the outcome from occurring), and overshadowing (where each of two causes presented together acquires less causal strength than a single cause predicting the same outcome on its own).
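The asymptotic link between the delta rule and probabilistic contrast mentioned above (Chapman & Robbins, 1990) can be checked numerically. The sketch below is a minimal illustration, assuming a single-layer linear network with an always-present context cue, equal learning rates for all cues, and a hypothetical contingency in which the outcome follows the cause on 3 of 4 trials and occurs without it on 1 of 4 trials, so that ΔP = P(O|C) - P(O|not C) = .75 - .25 = .50. The function name `train_delta_rule` and all parameters are illustrative.

```python
def train_delta_rule(trials, lr=0.002, epochs=10000):
    """Cycle through (inputs, outcome) trials, applying the delta rule.

    This is the Rescorla-Wagner rule with equal salience for every cue:
    each weight moves by lr * (outcome - prediction) * input.
    """
    weights = [0.0] * len(trials[0][0])
    for _ in range(epochs):
        for inputs, outcome in trials:
            prediction = sum(w * x for w, x in zip(weights, inputs))
            error = outcome - prediction
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return weights


# Each trial is ([context, cause], outcome); the context cue is always on.
TRIALS = (
    [([1, 1], 1)] * 3 + [([1, 1], 0)]      # cause present: outcome on 3 of 4
    + [([1, 0], 1)] + [([1, 0], 0)] * 3    # cause absent: outcome on 1 of 4
)

w_context, w_cause = train_delta_rule(TRIALS)
delta_p = 3 / 4 - 1 / 4  # P(O | C) - P(O | not C)

print(f"learned cause weight: {w_cause:.3f}")
print(f"probabilistic contrast (Delta-P): {delta_p:.3f}")
```

With a small learning rate the cause weight settles close to ΔP, while the context weight absorbs the base rate of the outcome in the absence of the cause (about .25).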
Like many researchers in the social and animal learning domains, we apply the terms discounting and augmentation quite broadly to denote causal competition both during and after causal learning, that is, while or after novel, causally relevant information is received and processed. Thus, competition may occur between pieces of information taken in at any time, whether novel information (novel causes) or earlier material reactivated from memory (known causes). This differs from the position taken by other authors (Morris & Larrick, 1995; Read & Montoya, 1999), who reserve the terms discounting and augmentation exclusively for reasoning processes based on prior causal learning, in the original sense of Kelley (1972). Because several authors (Van Overwalle, 1998; Van Overwalle & Van Rooy, 2001a, 2001b; Read & Montoya, 1999) have already provided many illustrations of connectionist modeling of causal attribution, we present only a single simulation of this phenomenon.

Simulation 7: Forward Discounting

A common finding in the animal and human literature is that when a particular cause has already explained an outcome, an alternative cause is discounted. This is called forward discounting. The idea of forward discounting is largely consistent with the anchoring explanation in social psychology, which assumes that people anchor on the first presented explanation or the first dominant explanation that comes to mind (e.g., the actor's disposition) and tend to ignore novel information implicating alternative explanations (Shaklee & Fischhoff, 1982; Gilbert & Malone, 1995). For instance, Van Overwalle, Drenth and Marsman (1999) found that spontaneous trait inferences were moderated by covariation information only when this information was presented before a description of the actor's behavior, not when it was presented after it.
Thus, when known personality traits or situational pressures provide a ready explanation for someone's behavior, people tend to disregard novel information about additional factors. In a series of experiments, Van Overwalle and Van Rooy (1998, 2001b) combined forward discounting with sample size: by increasing the sample size, and thus the strength, of a known cause, they made discounting of a novel cause stronger. In this way, they combined the emergent principles of acquisition and competition. Participants read stories in which several causes could explain an outcome. For instance, in one of the stories they were first told that Ann won several singles tennis games, and then that Ann (now a known cause) also won several doubles games with Troy (a novel cause). As expected, given Ann's previous successes, the contribution of Troy was discounted. The crucial manipulation, however, was how often Ann won her singles games. When Ann won her singles games only once, she acquired little causal strength, and discounting of Troy was much weaker than when she won several times and thus acquired more causal strength. Hence, discounting of Troy was indirectly influenced by the weakening or strengthening of Ann. Table 8 shows the design of this experiment, and Figure 10 depicts the simulated and observed results. As can be seen, the recurrent network conforms nicely to the observed data. Note that current statistical models of causality (Cheng & Novick, 1992; Försterling, 1989), which are insensitive to the number of observations, are unable to account for these results. Van Overwalle and Van Rooy (1998, 2001b) performed similar experiments involving augmentation and found parallel results, consistent with the combined predictions of sample size and competition.

Limitations and Future Research

Discounting can occur not only when the alternative explanation is novel, as in the simulation, but also when competing causes are processed simultaneously.
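The forward-discounting design, and the simultaneous case just mentioned, can be sketched with a few lines of delta-rule learning. This is a minimal illustration, not the recurrent simulation of Table 8: it assumes a feedforward delta rule, a hypothetical learning rate of .3, an outcome value of 1 for every win, and three doubles games. The names `train`, `ANN_ALONE` and `ANN_AND_TROY` are illustrative.

```python
LEARNING_RATE = 0.3  # hypothetical


def train(trials, lr=LEARNING_RATE):
    """Apply the delta rule to (present_causes, outcome) trials in order."""
    weights = {"ann": 0.0, "troy": 0.0}
    for causes, outcome in trials:
        prediction = sum(weights[c] for c in causes)
        error = outcome - prediction
        for c in causes:
            weights[c] += lr * error  # every present cause shares the error
    return weights


ANN_ALONE = ({"ann"}, 1.0)             # Ann wins a singles game
ANN_AND_TROY = ({"ann", "troy"}, 1.0)  # Ann and Troy win a doubles game

# Control: Troy is only ever seen together with Ann.
troy_no_singles = train([ANN_AND_TROY] * 3)["troy"]

# Forward discounting: singles first (once or five times), then doubles.
troy_one_single = train([ANN_ALONE] + [ANN_AND_TROY] * 3)["troy"]
troy_five_singles = train([ANN_ALONE] * 5 + [ANN_AND_TROY] * 3)["troy"]

# Simultaneous competition: the same eight games, interleaved.
interleaved = [ANN_ALONE, ANN_AND_TROY] * 3 + [ANN_ALONE] * 2
troy_interleaved = train(interleaved)["troy"]

for label, value in [
    ("no singles (control)", troy_no_singles),
    ("1 singles win, then doubles", troy_one_single),
    ("5 singles wins, then doubles", troy_five_singles),
    ("5 singles + 3 doubles interleaved", troy_interleaved),
]:
    print(f"{label:<34} Troy = {value:.3f}")
```

With these assumptions Troy's weight falls from about .47 (no singles games) to about .33 after one singles win by Ann and to about .08 after five, mirroring the sample-size effect on forward discounting. Interleaving the same five singles and three doubles games still lowers Troy's weight (to about .24), though less than the strictly sequential order does in this toy setup.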
Thus, competition effects do not require a fixed processing sequence for causal information, as assumed in phaselike models of dispositional attribution (e.g., Gilbert, 1989). This implication is in line with recent research suggesting that the weighting of competing person and environmental attributions "involves an iterative or even simultaneous evaluation of the various hypotheses before reaching a conclusion" (Trope & Gaunt, 2000, p. 353). But what happens when competition arises after the competing causes have already gained causal strength? For instance, if Troy and Ann always won their doubles games, and we now learn that Ann alone wins all her singles games, what do we think about Troy? This involves backward revaluation. According to Dickinson and Burke (1996), backward revaluation depends on the relationship between the two causes. They found that when the causes are positively related, discounting takes place; when they are independent, there is no discounting (see also Van Overwalle & Timmermans, 2000). These results cannot be simulated with the present standard recurrent network. They require a modification in which absent causes that are expected (because of prior compound presentation) receive a negative activation, rather than the standard "filling up" of activation from related nodes (for more details, see Van Overwalle & Timmermans, 2000, 2001; Graham, 1999).

Attitudes

The most influential and popular model of attitude formation is the theory of reasoned action developed by Fishbein and Ajzen (1975), later refined and relabeled as the theory of planned behavior (Ajzen, 1991; Ajzen & Madden, 1986). According to this model, an attitude is a function of the expectation or belief that a behavior will lead to certain consequences or outcomes (e.g., various means of transportation, such as cars and buses, pollute; bicycles don't), and of the person's evaluation of these outcomes (e.g., pollution is bad).
An attitude is determined by multiplying the expectancy and value components associated with each outcome and summing these products. The theory of planned behavior has received considerable empirical support (see Fishbein & Ajzen, 1975; Ajzen, 1991; Ajzen & Madden, 1986), although other factors besides attitudes have been found to influence behavior. A major criticism leveled against the theory, however, is its assumption that people make rational decisions, carefully elaborating and comparing alternative behavioral options before engaging in a particular behavior. It seems unlikely that people engage in extensive processing of the pros and cons of specific behavioral alternatives for every opinion or attitude they hold (Fazio, 1990). Although Ajzen and Fishbein (1980) acknowledged that people may simply reactivate and use previously formed attitudes, they still assumed that these prior attitudes had been formed explicitly.

Simulation 8: Attitude Formation

We propose, however, that attitudes may also develop implicitly. Recent research by Betsch et al. (2001) indicates that the encoding of value-charged stimuli is sufficient to prompt an on-line process by which values are implicitly summed and stored in memory. Such implicit attitude formation, representation and retrieval, without deliberative processing, can be modeled by a connectionist implementation of the theory of planned behavior. To illustrate, let us return to the transportation example. The first attitude component, the belief that one's choice will result in certain outcomes, can be represented as causal expectations linking the means of transportation with likely outcomes, such as how fast, how dry, and how polluting the trip will be. The likelihood of these outcomes is expressed in the weights of the connections, acquired during prior experiences.
Thus, the more often a particular consequence is observed, the stronger the weight becomes; conversely, the less often it is observed, the weaker the weight. The second attitude component, the value, can be represented by concurrent emotional or evaluative responses to these outcomes (e.g., how much the person likes or dislikes traveling fast, staying dry, or polluting), again acquired during prior experiences. In line with Ajzen (1991, p. 191), we assume that the outcomes linked with a behavior are "valued positively or negatively", and that these values are further modified during actual experiences. In a connectionist network, the activation sent out by each means of transportation is multiplied by the weights of its connections to the outcomes, including the value node. We suggest that a person's attitude is reflected in the activation of this value node after the relevant attitude-object (i.e., the means of transportation) has been activated. This proposition is mathematically very similar to the multiplicative function in Ajzen's (1991) theory of planned behavior, with expectancies replaced by connection weights and values replaced by value node activations (see Appendix C for a formal proof). However, it does not require the less plausible assumption that all alternatives are deliberately weighed, because only the beliefs and evaluations that are accessible in memory at the time of judgment determine the attitude. Note that outcomes other than the value (fast, dry and polluting) are not taken into account when measuring an attitude, because they reflect cognitions about the attitude-objects rather than evaluations. This is consistent with the dominant view in the attitude literature, which treats attitudes primarily as evaluative responses. Table 9 depicts a recurrent simulation of this example.
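The claim that a delta-rule weight behaves like the expectancy-value sum, A = Σ b_i e_i, can be illustrated with a deliberately reduced sketch. It assumes that only the direct connection from an attitude-object node to the value node is learned, that the value node's activation on each trip is the sum of the evaluations of the outcomes experienced on that trip, and that the beliefs b_i are outcome frequencies. The outcome counts, evaluations and function names are hypothetical; the sketch reproduces neither Table 9 nor the proof in Appendix C.

```python
# Hypothetical evaluations of each outcome (+1 liked, -1 disliked).
VALUES = {"fast": 1.0, "dry": 1.0, "polluting": -1.0}

# Hypothetical number of trips (out of N_TRIPS) on which each outcome occurred.
N_TRIPS = 10
COUNTS = {
    "car": {"fast": 8, "dry": 10, "polluting": 10},
    "bicycle": {"fast": 2, "dry": 4, "polluting": 0},
}


def trip_values(obj):
    """Value-node activation on each trip: summed evaluations of its outcomes."""
    values = []
    for trip in range(N_TRIPS):
        present = [o for o, n in COUNTS[obj].items() if trip < n]
        values.append(sum(VALUES[o] for o in present))
    return values


def learned_attitude(obj, lr=0.01, epochs=500):
    """Delta-rule weight from the attitude-object node (activation 1) to the value node."""
    w = 0.0
    for _ in range(epochs):
        for value in trip_values(obj):
            w += lr * (value - w)
    return w


def expectancy_value(obj):
    """Expectancy-value sum: belief (outcome frequency) times evaluation."""
    return sum(COUNTS[obj][o] / N_TRIPS * VALUES[o] for o in VALUES)


for obj in COUNTS:
    print(f"{obj:>7}: learned {learned_attitude(obj):.2f}, "
          f"expectancy-value {expectancy_value(obj):.2f}")
```

Because each update moves the weight toward the value experienced on a trip, the weight settles near the average experienced value, which by linearity equals Σ b_i e_i (.80 for the car and .60 for the bicycle with these numbers). No explicit multiplication or summation of beliefs and evaluations is performed at judgment time.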
The likelihood of the outcomes is determined by the frequency with which a causal factor co-occurs with an outcome, and the value is determined by the degree of satisfaction or dissatisfaction experienced with this outcome. Although we used extreme evaluative values of +1 and -1 for simplicity, moderate values are also possible. In Figure 11, the simulated values are compared with the predictions of the theory of planned behavior. As can be seen, the simulated and predicted values match almost perfectly.

Simulation 9: Dual-Process Models of Persuasion

The theory of planned behavior (Ajzen, 1991; Ajzen & Madden, 1986) assumes that people systematically scrutinize all relevant information when making an attitude judgment. Although this might be the preferred approach when forming an initial opinion about an important issue (Gollwitzer, 1990), in many cases attitudes are created or changed in a more shallow or heuristic manner. This distinction has been captured in the heuristic-systematic model of Chaiken (1980, 1987; Chen & Chaiken, 1999) and the elaboration likelihood model (Petty & Cacioppo, 1986; Petty & Wegener, 1999). According to these dual-process models, systematic processing implies that people form or update their attitudes by actively attending to and reflecting upon persuasive arguments. In contrast, heuristic processing implies that people form or change their attitudes by using heuristic cues that give rise, automatically, to stored decision rules such as "experts can be trusted", "majority opinion is correct", and "long messages are valid messages". Dual-process theories regard systematic processing as requiring more effort and cognitive capacity than heuristic processing.
Hence, when motivation or capacity for systematic scrutiny is low, such as when the issue is of low personal relevance or time is limited, people rely on heuristics such as source credibility, other people's attitudes, or the length and number of arguments. The two processing modes are not necessarily mutually exclusive, however. For instance, systematic and heuristic processing may co-occur when the arguments are too ambiguous to form an opinion through extensive processing alone; heuristic cues may then help to form an opinion by biasing the selection of ambiguous information (Chen & Chaiken, 1999). Such an interaction between systematic and heuristic processing was investigated by Chaiken and Maheswaran (1994). They presented a message about a fictitious answering machine in which features of varying importance were described. This information was ostensibly published either in a highly regarded magazine specializing in the scientific testing of new products or in a promotional pamphlet prepared by sales personnel. With low task importance (i.e., respondents' opinions would have little bearing on the manufacturer's product distribution), source credibility was the only determinant of people's attitudes. With high task importance (i.e., respondents' opinions would count heavily), in contrast, the quality of the machine's features was the only determining factor, except when the message was ambiguous, in which case source credibility alone influenced the attitude. This study is important because it confirms several predictions of dual-process models. It documents how heuristic cues can bias the processing of ambiguous message arguments (the biasing hypothesis; Chen & Chaiken, 1999), and how systematic processing can override heuristic cues when the arguments are unambiguous (the attenuation hypothesis; Chen & Chaiken, 1999).
We simulated the interaction between systematic and heuristic processes investigated by Chaiken and Maheswaran (1994) with our recurrent model (see Table 10). According to Bohner, Ruder and Erb (1999), heuristic cues such as source credibility may lead people to form expectations about message valence or strength. We assume that these expectations are driven by prior experiences with good and bad argumentation from the same or similar sources, or by communicated opinions about such experiences. As can be seen in the top panel of Table 10, this prior knowledge about the quality of argumentation was incorporated by setting the value node to +1 (high credibility) or .10 (low credibility) during an initial prior learning phase. Next, we ran one of three message types, involving strong, weak or ambiguous arguments, which were simulated by different activation levels of the value node. That is, the strength and direction of the arguments were determined by setting the activation of the value node to either a positive value or zero. (No negative activation was used, because the weak arguments actually described features that the product did offer but on which other products were superior.) Table 10 depicts a simplified version of the actual design used by Chaiken and Maheswaran (1994). More importantly, in line with the basic assumptions of dual-process models, we assumed that under heuristic processing these arguments would not be encoded or elaborated sufficiently (i.e., we ran only one trial, with all activation levels divided by 10). In contrast, under systematic processing we assumed that the arguments would be processed more extensively (i.e., two trials, as shown in Table 10). The results depicted in Figure 12 reveal that our simulation reproduced the pattern observed by Chaiken and Maheswaran (1994).
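The manipulation described above can be sketched with a minimal network. The sketch is an assumption-laden simplification rather than the simulation of Table 10: it uses a feedforward delta rule with two input nodes (a source node and a product node) feeding one value node, 100 prior-learning trials, a hypothetical learning rate of .3, argument values of 1 (strong) and 0 (weak), and it reads the attitude off the value node when source and product are activated together. Ambiguous messages are not modeled, and the names `delta_trial` and `attitude` are illustrative.

```python
def delta_trial(weights, inputs, value, lr):
    """Apply one delta-rule update in place; `inputs` maps node -> activation."""
    prediction = sum(weights[node] * act for node, act in inputs.items())
    error = value - prediction
    for node, act in inputs.items():
        weights[node] += lr * error * act


def attitude(credibility, argument_value, systematic, lr=0.3, prior_trials=100):
    """Value-node activation when source and product are activated at test.

    credibility: value-node activation paired with the source during prior
        learning (1.0 for a credible source, .10 for a noncredible one).
    argument_value: value-node activation for the message (1.0 strong, 0.0 weak).
    systematic: True for two full-strength message trials, False for one
        trial with every activation divided by 10 (heuristic processing).
    """
    weights = {"source": 0.0, "product": 0.0}
    for _ in range(prior_trials):
        delta_trial(weights, {"source": 1.0}, credibility, lr)

    message = {"source": 1.0, "product": 1.0}
    if systematic:
        for _ in range(2):
            delta_trial(weights, message, argument_value, lr)
    else:
        faint = {node: act / 10 for node, act in message.items()}
        delta_trial(weights, faint, argument_value / 10, lr)

    return weights["source"] + weights["product"]


for systematic in (False, True):
    mode = "systematic" if systematic else "heuristic"
    for credibility in (1.0, 0.10):
        for argument_value, label in ((1.0, "strong"), (0.0, "weak")):
            result = attitude(credibility, argument_value, systematic)
            print(f"{mode:>10}, credibility {credibility:.2f}, "
                  f"{label:>6} arguments: {result:.2f}")
```

With these settings the heuristic attitude stays close to the prior credibility value (about .99 for the credible source even with weak arguments), whereas two systematic trials let argument quality dominate (about .86 for strong arguments from the noncredible source). The balance is sensitive to `lr`: at .5, two systematic trials erase the credibility expectation completely, while at .05 most of it survives.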
Thus, the simulation reproduced heuristic and systematic processing, as well as the predicted interaction between them (i.e., biasing and attenuation effects; Chen & Chaiken, 1999). Siebler, Bohner and Weinerth (1998) proposed an alternative connectionist constraint satisfaction model (i.e., ECHO; Thagard, 1989) to account for the same data. This type of connectionist model suffers from shortcomings (e.g., no weight adjustments and hence no permanent attitude change), which are discussed later. It should be noted, however, that in the present simulation small deviations in the learning rate destroyed the predicted interaction between systematic and heuristic processing of the ambiguous message. Specifically, a higher learning rate caused the novel arguments to overwrite all memory of the source's credibility, whereas a lower learning rate made the source's credibility the only determinant of the attitude. This suggests that the interaction between systematic and heuristic processing depends on a critical balance between source credibility and (the rate of) systematic elaboration of argument quality, which is in line with the sparse reports of this interaction in the attitude literature (Chaiken, Liberman & Eagly, 1989, p. 233; but see Bohner, Moskowitz & Chaiken, 1995).

Simulation 10: Cognitive Dissonance

Sometimes our attitudes are driven not so much by immediate evaluations of attitude-objects as by reactions to our own behaviors, especially when these behaviors go against our initial preferences. This phenomenon is captured in Festinger's (1957) theory of cognitive dissonance, which predicts that discrepant behavior generates dissonance, or uneasiness, that "will exert pressures in the direction of bringing the appropriate cognitive elements into correspondence" (p. 11).
For instance, when induced to write an essay that runs counter to one's initial attitude (e.g., a student defending stricter exam criteria), an individual will tend to reduce dissonance by changing his or her attitude in the direction of the position taken in the essay. This tendency is stronger when alternative explanatory factors or justifications, such as high payment or social pressure, are absent. In contrast, when external demands (e.g., payment or pressure by the experimenter) provide sufficient justification for engaging in the dissonant behavior, dissonance reduction does not occur (e.g., Linder, Cooper & Jones, 1967). Cooper and Fazio (1984) proposed an attributional analysis of the process of cognitive dissonance reduction. They suggested that individuals attempt to understand and justify their discrepant behavior ("Why did I behave this way?"). When alternative causal explanations for the discrepant behavior are absent, participants conclude that they must have liked writing the essay more than they initially thought, and this results in attitude change. Conversely, when sufficient external explanations are available, no dissonance is experienced and no attitude change occurs. Thus, for instance, more attitude change is expected given a low rather than a high monetary reward. However, the reverse effect has been observed when individuals are forced to engage in discrepant behavior: in this case, there is more attitude change with a high rather than a low monetary reward (Linder, Cooper & Jones, 1967). To explain this opposite effect, Van Overwalle and Jordens (2001) extended the attributional analysis by assuming that individuals try to understand not only their discrepant behavior, but also their concurrent feelings ("Why do I feel this way?").
In the case of high external constraints, such as strong pressure towards discrepant behavior combined with low payment, the experimental situation will be experienced as particularly unpleasant. According to Van Overwalle and Jordens (2001), these negative feelings counteract and reduce the attitude discrepancy, as if a person concludes that, although having done something wrong, he or she has already been sufficiently punished for it by feeling very bad about it. We implemented the original experiment by Linder et al. (1967) in a connectionist simulation very similar to that of Van Overwalle and Jordens (2001). The learning history is shown in Table 11. To simulate the idea that prior experiences are only roughly similar to the experimental manipulations, we used a distributed representation with added noise (see top panel). The experimental manipulation was implemented as a single trial, reflecting the assumption that attributional thoughts were raised at least once during the experiment (see middle panel). The attitude toward the essay was measured by priming the attitude-object (i.e., the essay) and reading off the activation of both the behavioral and the affective outcomes. Thus, an attitude is seen here not only as an affective response, but also as a behavioral approach-avoidance response. How was attitude change under induced choice simulated? Because pressure from the experimenter is absent, only the attitude-object (the essay) and payment serve as potential causes explaining the discrepant behavior and concurrent feelings. A lowered reward is simulated by decreasing the activation of the payment node to .20. Because the weakly activated payment node then explains less of the discrepant behavior and feelings, the connections from the essay node to the behavioral and affective outcomes undergo compensatory augmentation (i.e., the competition principle), resulting in a more positive attitude toward the essay. How was the no-choice condition simulated?
In addition to the influence of payment, the negative feelings arising from low payment combined with experimenter pressure drive the connection between the essay and the evaluative outcome downward, resulting in a less favorable attitude toward the essay. The results of this simulation are shown in Figure 13 and compared with the empirical data of Linder et al. (1967). As can be seen, the fit between simulated and observed data is excellent.

Limitations and Further Research

The present simulations encompass a wide variety of models and data in the attitude literature. We are only beginning to uncover the implications of connectionist modeling for this area of social cognition. However, this initial sketch suggests that seemingly different modes of processing, types of persuasion and kinds of information may all be driven by the same underlying connectionist mechanisms. In addition, our analysis paints a somewhat different picture of heuristic cues, in attitude formation in particular and in social cognition in general. It is to this discussion that we now turn.

A Note on Judgmental Heuristics

The mainstream theoretical approach to judgment in social psychology holds that information processing is rarely exhaustive or guided by logical norms, but rather reflects a compromise between rationality and economy. In this approach, effortless judgments are typified by judgmental heuristics: rules of thumb that enable individuals to make rapid and easy judgments with little explicit thinking and that, overall, provide adequate responses most of the time. According to this view, the price of such rapid judgments can be observed in a series of biases. Rather than viewing heuristics and biases as exceptions to the rules of logical thinking, we would like to argue that they actually reflect how the brain, as a connectionist device, works.
Take, for example, the heuristics assumed to influence judgments under uncertainty (Kahneman, Slovic & Tversky, 1982), or the heuristic rules of dual-process models of persuasion (Chaiken, 1980; Petty & Cacioppo, 1987).

Heuristics under Uncertainty

We consider three major heuristics used in judgment under uncertainty.

Availability. The availability heuristic reflects the finding that many judgments are biased by facts and arguments available in memory, owing either to frequent (chronic) use in the past or to recent priming. This is exactly what a recurrent network would predict. Information that has recently been primed or activated spreads to other related concepts, influencing judgments about them, as we have seen in the earlier assimilation examples (Simulation 5). A dramatic demonstration of chronic accessibility was given by Smith and DeCoster (1998, Simulation 5). They demonstrated that people who frequently used a particular concept in the past might lose this information when it is "overwritten" by novel information, but that the concept could be quickly recovered after a few presentations of the original information.

Representativeness. The representativeness heuristic has been invoked to explain the finding that categorization is often guided by resemblance between concepts rather than by statistical base rates. As we have seen in the section on categorization, this is exactly what one would expect from a connectionist view. In Simulation 1, we demonstrated that a category is chosen on the basis of its most unique (diagnostic) feature, even if that feature has the same base rate as another, less unique feature.

Anchoring. The anchoring and adjustment heuristic has been proposed to explain why judgments are often biased toward an initial anchor, and it has been taken as evidence that judgments are often made and adjusted on-line.
Again, anchoring can simply be taken as a consequence of on-line or incremental connectionist learning. According to the delta learning algorithm, weight adjustments are typically larger initially, because the error in the network is greater, and become increasingly smaller later on as the error is reduced. It is interesting to ask why adjustments in later phases of learning are often insufficient to produce a change of opinion. For instance, why are situational constraints and pressures seldom taken into account when making dispositional inferences about an actor? The answer may be found in the backward revaluation hypothesis proposed by Dickinson and Burke (1996). As noted in the causal attribution section, this hypothesis posits that backward discounting of causal factors occurs only when there is a strong, unique relationship between these causes. Because the relationship between an actor and a situation is seldom unique (e.g., many different actors may appear in the same situation), this hypothesis predicts that correction of personal judgments by situational information will often be insufficient (see also Van Overwalle & Timmermans, 2000, 2001).

Heuristics in Persuasion

Dual-process models of persuasion (Chaiken, 1987; Petty & Cacioppo, 1986) posit that people revert to heuristic processing when their motivation or capacity to analyze message content in detail is minimal. Heuristic cues are characterized as salient, easily processed pieces of stimulus information that give rise, automatically, to the activation of a stored decision rule (Chaiken, Duckworth & Darke, 1999). These heuristic rules are developed through past experiences and observations and include, for example, the beliefs that "experts can be trusted", "multiple arguments are stronger", "high consensus implies correctness", and "things I like are good".
Heuristic processing involves automatic application of these rules with little awareness of their occurrence and of their impact on judgments. Although we agree that heuristic processing is automatic and often beyond awareness, we would argue that it does not necessarily involve the application of well-learned rules. We do not exclude this possibility, but we propose that connectionist principles provide a much more convincing and parsimonious account of the implicit nature of heuristic processes. To support this contention, let us review a number of these heuristic rules and see how connectionist processes can explain them. As we shall see, the proposed connectionist mechanisms differ markedly from the original hypotheses in the literature on how heuristics are "activated" and "applied" to influence attitude judgments (Chen & Chaiken, 1999).

Expertise. This heuristic was already simulated in the previous section (Chaiken & Maheswaran, 1994; Petty, Cacioppo & Goldman, 1981). It was shown that expertise involves an expectation about argument quality and value, based on prior learning from the same or similar sources. During prior learning, the activation of the value node is high when the source is an expert (with a standard activation level of 1) and low when the source is not an expert (e.g., 0.10). Most importantly, we contend that the knowledge about source quality acquired through prior learning is naturally integrated with novel information about an attitude-object through the principle of acquisition. Hence, this heuristic does not require activation of any explicit rule or belief.

Consensus. This heuristic (Maheswaran & Chaiken, 1991) functions much like the expertise heuristic. Consensus information involves an expectation about the positive or negative value of features (i.e., implemented by a positive or negative activation of the value node), based on prior learning from other sources.
In a connectionist framework, this expectation or prior knowledge is naturally integrated with novel information, without any recourse to additional rules or beliefs.

Length. Lengthy messages tend to contain more arguments (Petty & Cacioppo, 1984) or tend to repeat the same arguments in different words with more detail (Wood, Kallgren & Preisler, 1985). According to the principle of acquisition, a greater sample size of arguments should result in stronger effects on attitude judgments. Thus, the more often an argument that an attitude-object possesses a particular feature is repeated, the stronger the connection will grow between the attitude-object and this feature (and its associated value). Nevertheless, people may also be misled by the sheer length of a message (e.g., by the use of larger fonts), even if it does not include more persuasive arguments (Wood et al., 1985). This suggests that superficial characteristics of a message, rather than the arguments themselves, can influence processing. A connectionist network can account for this effect by assuming that such superficial characteristics of length are often correlated with actual differences in message length, and so may influence attitudes indirectly. More generally, heuristic processing may sometimes be influenced by issue-irrelevant aspects of the information, and so reflect qualitative rather than merely quantitative variations in processing (Petty & Wegener, 1999).

Mood. As is evident from the network architecture used to simulate attitude formation, in our view, mood is just another outcome component that determines attitudes together with other behavioral approach or avoidance outcomes (i.e., discrepant behaviors). In our simulations, we tended to equate mood with evaluation (i.e., the value node), but admittedly, this might prove to be an oversimplification (e.g., Perugini & Conner, 2000), useful only as a first approximation.
Nevertheless, we do believe that mood and affect in general are outcomes or pieces of information that determine one's attitude and other judgments, although we did not differentiate strongly between implicit mood priming (Bower, 1981) and explicit processing of mood as part of the relevant information (Schwartz, 1990). We return to this issue of implicit and explicit information processing in the general discussion.

Fit and Model Comparisons

The simulations that we have reported all replicate the empirical data or theoretical predictions reasonably well. However, this fit might be due to some procedural choices in the simulations rather than to their conceptual validity. The aim of this section is to demonstrate that changes in these choices generally do not invalidate our simulations. To this end, we explore a number of issues, including the localist versus distributed encoding of concepts and the specific recurrent network used. We address each issue in turn.

Distributed Coding

The first issue is whether the nodes in the auto-associative architecture encode localist or distributed features. Localist features reflect “symbolic” pieces of information; that is, each node represents a concrete concept. In contrast, in a distributed encoding, a concept is represented by a pattern of activation across an array of nodes, none of which reflects a symbolic concept but rather some sub-symbolic micro-feature of it (Thorpe, 1994). Although we most often used a localist encoding scheme to facilitate this introduction to the most important processing mechanisms underlying connectionism, we admit that localist encoding is far from realistic. Unlike distributed coding, it implies that each concept is stored in a single processing unit and, except for differing levels of activation, is always perceived in the same manner.
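One practical consequence of the localist/distributed contrast can be illustrated with a small sketch. It assumes a single output node trained with the delta rule (Equation 4 of Appendix A) and uses hypothetical micro-feature patterns, chosen only so that concepts A and B share two of their three active features; it is not one of the reported simulations. Learning about A transfers to B only under distributed coding.

```python
# Sketch: localist vs. distributed encoding of two related concepts.
# The micro-feature patterns are hypothetical, chosen only so that
# concepts A and B share two of their three active micro-features.

def train_delta(pattern, target, epsilon=0.1, trials=50):
    """Delta rule (Eq. 4, Appendix A) for a single output node."""
    w = [0.0] * len(pattern)
    for _ in range(trials):
        internal = sum(wi * ai for wi, ai in zip(w, pattern))
        w = [wi + epsilon * (target - internal) * ai for wi, ai in zip(w, pattern)]
    return w

def respond(w, pattern):
    """Internal input that a test pattern sends to the output node."""
    return sum(wi * ai for wi, ai in zip(w, pattern))

# Localist: one node per concept, so A and B share nothing.
local_a, local_b = [1, 0, 0, 0], [0, 1, 0, 0]
# Distributed: each concept is a pattern over four micro-feature nodes.
dist_a, dist_b = [1, 1, 1, 0], [0, 1, 1, 1]

w_local = train_delta(local_a, target=1.0)
w_dist = train_delta(dist_a, target=1.0)
print(respond(w_local, local_b))           # 0.0: nothing transfers to B
print(round(respond(w_dist, dist_b), 2))   # 0.67: B benefits via shared features
```

Under distributed coding each active micro-feature weight settles near 1/3, so B, which shares two of them, receives about 2/3 of A's learned response.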
Given the advantages of distributed coding, is it possible to replicate our localist simulations with a distributed representation? To address this question, we reran all localist simulations with a distributed encoding scheme much like the previous distributed simulations (see Table 12 for details). As can be seen, all distributed simulations attained a good fit to the data and, in all cases, the pattern of results from the localist simulations was reproduced. This suggests that the principles and mechanisms that we put forward as responsible for the major simulation results hold not only in the more contrived context of a localist encoding, but also in the more realistic context of a distributed encoding.

Feedforward Model

We claimed earlier that feedforward connections were responsible for replicating most of the phenomena of interest, with the exception of serial position in impression formation (Simulation 3) and generalization (Simulation 5). To substantiate this claim, we ran all simulations with a feedforward pattern associator (McClelland & Rumelhart, 1988) that consists only of feedforward connections. As can be seen in Table 12, for all simulations except those just mentioned, a feedforward architecture did almost as well as the original simulations, as predicted. The only exception was the interaction between heuristic and central processing of attitude information (Simulation 9), which was less robust, as noted earlier. This suggests that for most phenomena in social cognition, the feedforward connections in the network were the most crucial. Only for serial position, generalization, and the interaction of heuristic and central processing (Simulations 3, 5, and 9) were the lateral or backward connections also important for obtaining the predicted effects.
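What lateral connections add can be sketched in a three-node toy network, which is not the authors' Simulation 5. The assumptions are labeled here and in the comments: hypothetical nodes (feature A, feature B, outcome), delta-rule learning (Equation 4) in which the sending activation a_j equals the external input, and a single linear activation cycle (Equation 3'). After A and B co-occur, presenting A alone reactivates B only when feature-to-feature connections exist.

```python
# Sketch: pattern completion with and without lateral connections.
# Hypothetical nodes: 0 = feature A, 1 = feature B, 2 = outcome.
# Assumption: during learning, the sending activation a_j equals the
# external input ext_j (a single linear cycle, as in Appendix A).

def train(patterns, allowed, epsilon=0.1, epochs=50):
    """Delta rule (Eq. 4) restricted to the connections j -> i that `allowed` permits."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]   # w[i][j]: weight from node j to node i
    for _ in range(epochs):
        for ext in patterns:
            internal = [sum(ext[j] * w[i][j] for j in range(n) if j != i)
                        for i in range(n)]
            for i in range(n):
                for j in range(n):
                    if i != j and allowed(i, j):
                        w[i][j] += epsilon * (ext[i] - internal[i]) * ext[j]
    return w

def activations(w, ext):
    """Eq. 3': a_i = ext_i + int_i after one linear cycle."""
    n = len(ext)
    return [ext[i] + sum(ext[j] * w[i][j] for j in range(n) if j != i)
            for i in range(n)]

training = [[1.0, 1.0, 1.0]]        # A and B co-occur with the outcome

def recurrent(i, j):
    return True                     # every node connects to every other node

def feedforward(i, j):
    return i == 2 and j in (0, 1)   # only feature -> outcome connections

probe = [1.0, 0.0, 0.0]             # present feature A alone
rec = activations(train(training, recurrent), probe)
ff = activations(train(training, feedforward), probe)
print([round(x, 2) for x in rec])   # [1.0, 0.5, 0.5]: B is completed from A
print([round(x, 2) for x in ff])    # [1.0, 0.0, 0.5]: no route from A to B
```

Both variants predict the outcome equally well here, which mirrors the observation that feedforward connections carry most of the load; only the recurrent variant fills in the missing feature.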
Non-linear Recurrent Model

We also claimed earlier that a recurrent model with a linear activation update algorithm and a single internal updating cycle (for collecting the internal activation from related nodes) was sufficient for reproducing the social phenomena of interest. This contrasts with the models of other social researchers, who used a non-linear activation update algorithm and many more internal cycles (Smith & DeCoster, 1998; Read & Montoya, 1999). Are these model features necessary? To answer this question, we ran all our simulations with a non-linear activation algorithm and 10 internal cycles. As can be seen from Table 12, although the non-linear model yielded an adequate fit, most simulations did not improve substantially compared to the original simulations. This suggests that the present linear activation update algorithm with a single internal cycle is sufficient for simulating many phenomena in social cognition. This should not come as a surprise. In recurrent simulations of other issues, such as the formation of semantic concepts, multiple internal cycles were useful to perform "cleanup" in the network, so that activation passed between, for instance, a perceptual and a conceptual level of representation was forced to eventually settle into representations that had preestablished conceptual meaning (e.g., Sitton, Mozer & Farah, 2000). Such a distinction between perceptual and conceptual levels was not made here, and, as a result, multiple internal cycles had no real function.

The Tensor Product Model

Kashima and his colleagues recently presented a tensor product model, an alternative connectionist model of person and group impression formation and change (Kashima & Kerekes, 1994; Kashima, Woolcock, & Kashima, 2000). As noted earlier, contrary to their claims, the present recurrent network was able to successfully reproduce the phenomena of impression formation simulated with their model, including recency and primacy effects.
A major difference from the present recurrent approach, however, is that the tensor product model uses a Hebbian learning algorithm. Even though this type of learning has the advantage of neurobiological plausibility, it has the significant disadvantage that it does not reproduce the competition property. Hence, social cognition phenomena explained by this property, such as base-rate neglect, discounting, and cognitive dissonance, presumably cannot be simulated with this model, at least not without additional assumptions. Indeed, to simulate, for instance, attenuation of recency in impression formation, this model requires the ad hoc assumption of different context presentations before and after a judgment (Kashima & Kerekes, 1994). This assumption was not required in the present simulations.

General Discussion

In this article, we have presented an overview of a number of major findings in social cognition and have shown how they might be accounted for within a connectionist framework. This connectionist perspective offers a novel view on how information could be encoded in the brain, how it might be structured and activated, and how it could be retrieved and used for social judgment. This view differs from earlier theories in social cognition, which have relied on metaphors such as associative networks or constraint satisfaction networks with fixed weights (Kunda & Thagard, 1996; Read & Marcus-Newhall, 1993; Shultz & Lepper, 1996), phaselike integration of information (Gilbert, 1989; but see Trope & Gaunt, 2000), or a formulation in algebraic or probabilistic terms (Cheng & Novick, 1992; Anderson, 1981; Ajzen, 1991). The problem is that these various metaphors give a fragmentary account of social cognitive mechanisms.
In contrast, the connectionist approach proposed in this paper relies on the same general auto-associative architecture and processing algorithms throughout, yet applies to a wide range of phenomena in social cognition. Moreover, we have shown that this model provides an alternative interpretation of earlier algebraic models in social psychology (Cheng & Novick, 1992; Anderson, 1981; Ajzen, 1991). In addition, this model can also account for the learning of social knowledge structures. Hence, this approach could potentially be used to investigate how the structures underlying social cognition develop among infants and children. We have focused to a large extent on the model as a learning device, that is, as a mechanism for associating patterns that reflect social concepts by means of very elementary learning processes. One major advantage of a connectionist perspective is that complex social reasoning and learning can be accomplished by putting together an array of simple interconnected elements, which greatly enhances the network's computational power, and by incrementally adjusting the weights of the connections with the delta learning algorithm. We have demonstrated that this learning algorithm gives rise to a number of novel properties, among them the acquisition property, which accounts for sample size effects; the competition property, which accounts for discounting and augmentation; and the diffusion principle, which accounts for higher recall of inconsistent information. These properties are able to explain most of our simulations of social judgment and behavior. In contrast, introductory textbooks on the auto-associator (e.g., Fausett, 1994; McClelland & Rumelhart, 1988) emphasize other capacities of the auto-associator, including its content-addressable memory, its ability to do pattern completion (see also Simulation 5), and its fault and noise tolerance.
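The acquisition and competition properties, and the contrast with Hebbian learning noted for the tensor product model, can be illustrated with a minimal sketch. It trains only cue-to-outcome weights (a feedforward slice of the auto-associator) under a blocking-style design with hypothetical cues "situation" and "actor"; the design, cue names, and learning rate are illustrative assumptions, not the paper's simulations. With the delta rule, prior learning about the situation leaves little weight for the actor (discounting), and each repetition of an argument adds less than the previous one; the error-free Hebbian rule shows neither effect.

```python
# Sketch: delta-rule vs. Hebbian learning of cue -> outcome weights.
# Hypothetical cues "situation" and "actor"; the outcome is the observed behavior.

def train(trials, rule, epsilon=0.2):
    """Train weights from cue nodes to a single outcome node.

    trials: list of (cues, ext) pairs, where cues maps a cue name to its
    activation a_j and ext is the outcome's external input.
    rule "delta": Delta w_j = epsilon * (ext - int) * a_j   (Eq. 4)
    rule "hebb":  Delta w_j = epsilon * ext * a_j           (no error term)
    """
    w = {}
    for cues, ext in trials:
        internal = sum(w.get(c, 0.0) * a for c, a in cues.items())
        signal = ext - internal if rule == "delta" else ext
        for c, a in cues.items():
            w[c] = w.get(c, 0.0) + epsilon * signal * a
    return w

situation_only = [({"situation": 1.0}, 1.0)] * 10
both_present = [({"situation": 1.0, "actor": 1.0}, 1.0)] * 10

# Competition: does prior situational learning reduce the actor's weight?
for rule in ("delta", "hebb"):
    discounted = train(situation_only + both_present, rule)["actor"]
    control = train(both_present, rule)["actor"]
    print(rule, round(discounted, 2), round(control, 2))
# delta 0.05 0.5  -> the actor is discounted
# hebb 2.0 2.0    -> no competition

# Acquisition: with the delta rule each repetition adds less than the last.
for n in (2, 6):
    print(n, round(train([({"argument": 1.0}, 1.0)] * n, "delta")["argument"], 2))
# 2 0.36
# 6 0.74
```

The error term (ext - int) is what produces both effects: once the outcome is already predicted, little error remains to be distributed to a new cue or to a repeated argument.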
Implications

What are the implications of the present work for social cognitive theories? The key contribution of this paper is that a wide range of social cognitive phenomena was simulated with the same overall network model, suggesting that these phenomena are based, at least during early processing, on the same fundamental information processing principles. Providing a common framework for these different phenomena will hopefully generate further research and extend to new areas of social psychology usually seen as too different to be brought under a single theoretical heading. Beyond accounting for empirical data, the present model can generate new hypotheses that can be tested in a classical experimental setting. We briefly discuss some potential questions that emerge from this model.

Knowledge Acquisition

To what extent is the learning history assumed in our simulations correct? What mechanisms and architectural considerations are necessary to preserve the network's knowledge base? Perhaps these questions can in part be answered by laboratory replications of the assumed learning histories, which should reveal equivalences with the (prior) knowledge of participants.

Heuristic versus Central Processing

Our approach does not make a principled distinction between heuristic and central processing. Quite often, setting activation to a lower or higher (default) level made it possible to simulate this distinction, suggesting that heuristic and central processing are mainly a matter of shallow versus focused attention to information. This differential attention gives rise to a differential emphasis on, for example, prior information versus novel information, and may result in different judgments. In previous sections, we explained in detail how several reasoning heuristics could be viewed from this connectionist perspective, and these suggestions are immediately open to empirical tests.
Automatic versus Conscious Reasoning

As noted in the introduction, the present model does not draw a sharp distinction between automatic and conscious processing, implicit and explicit processing, or associative and symbolic information processing. While some may view this as a disadvantage of the model, recent research has revealed that this distinction is far from clear-cut, as unconscious intuitions and insights may underlie conscious decisions. To resolve this quandary, some researchers (e.g., Smith & DeCoster, 2000) have proposed a distinction between two processing modes: a slow-learning (connectionist) pattern-completion mode and a more effortful (symbolic) mode that involves explicit, symbolically represented rules and inferences. The present approach suggests that such a sharp distinction is perhaps not necessary.

The Role of Affect

In the final section on attitudes, the role of affect and evaluation was prominent in our simulations. As noted above, our model makes no distinction between affect as unconscious priming or as explicit information (Schwartz, 1990), although this distinction is clearly crucial to understanding people's reactions to mood changes. Assuming that evaluation and affect play a crucial role in attitude judgment, simply inducing a positive or negative mood unobtrusively should change these judgments in the direction predicted by the model. Recent research findings seem to support these predictions (Jordens & Van Overwalle, 2001).

Limitations and Future Directions

Given the breadth of social cognition, we inevitably could not include many other interesting findings and phenomena. Perhaps the most interesting area omitted involves group processes. Connectionist modeling may well help to explain how group identity is created, how perceptions of group homogeneity are changed, how accentuation of correlated features is enhanced, and how illusory correlation and unrealistic negative stereotypes of minority groups develop.
These questions are addressed in Van Rooy and Van Overwalle (2001c). However, other phenomena, such as motivation, love, and violence, remain far beyond the current scope of connectionist modeling.

While we believe we have shown that a connectionist framework can potentially provide a parsimonious account of a number of disparate phenomena in social psychology, we are not suggesting that this is the only valid means of modeling social cognitive phenomena. On the contrary, we defend a multiple-view position in which connectionism would play a key role but would coexist alongside other viewpoints. We think that a strict neurological reductionism is untenable, especially in personality and social psychology, where it is difficult to see how one could develop a connectionist model of such high-level abstract concepts as “need for closure”, “prejudice”, and the like. There are other limitations to connectionist models. Researchers who may agree with our overall auto-associative approach may remain unpersuaded by a specific application of the model to a particular phenomenon. These applications merely reflect our current thinking and will almost certainly be replaced by improved models in the future. We believe, however, that the essence of the approach proposed here will survive.

Our model suggests a number of possible directions for further investigation. Even though the simple auto-associative model presented here does, indeed, apply to a wide range of social phenomena, it would be ridiculous to assume that the whole of higher cognition could be modeled by auto-association alone. This simple paradigm quickly reveals its limits when we try to apply the results obtained to other mechanisms. First, given the importance of attention and motivation in social perception and cognition, it will ultimately be necessary to incorporate these factors into an improved model.
For the time being, attentional aspects of human information processing are not part of the dynamics of our network, which focuses almost exclusively on learning and pattern association (variations in attention were simply hand-coded as differences in activation states). Another issue that remains to be resolved is how concepts are initially represented when presented to the network. This was not modeled in the present simulations, but it is certainly critical for context-dependent learning and judgment, which involve combinations of features and context. Second, another improvement to the present recurrent network might be the inclusion of an array of hidden nodes (McClelland & Rumelhart, 1988, pp. 121-126) or exemplar nodes (e.g., Kruschke & Johansen, 1999) that may increase its power and capacity, for instance, to process nonlinear interactions. Although non-linearity was not an issue in the present simulations, it may become more critical for combinations of features (e.g., when only a combination of causes produces an effect), or of features and context. Third, a more modular architecture will almost certainly be necessary to produce a better fit of the model to empirical data. For example, one severe limitation of most connectionist models is known as “catastrophic interference” (McCloskey & Cohen, 1989; Ratcliff, 1990): the tendency of neural networks to abruptly and completely forget previously learned information in the presence of new input. Such forgetting is untenable for a realistic model of social cognitive processes in general, and for a model of the formation and use of stereotypes in particular, since one of the basic properties of stereotypes is their resistance to change in the presence of new information.
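Catastrophic interference can be reproduced even in a tiny delta-rule pattern associator. In the sketch below, the two four-feature patterns and their targets are hypothetical and chosen only because they overlap on two features. Training on pattern B after pattern A has been learned pushes A's output past zero, whereas interleaving A and B lets the same weights fit both patterns.

```python
# Sketch: catastrophic interference in a delta-rule pattern associator.
# Patterns and targets are hypothetical; A and B overlap on two features.

def output(w, x):
    """Internal input to the output node for input pattern x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train(w, schedule, epsilon=0.1):
    """Apply the delta rule (Eq. 4) once per (pattern, target) in schedule."""
    for x, target in schedule:
        error = target - output(w, x)
        w = [wi + epsilon * error * xi for wi, xi in zip(w, x)]
    return w

A = ([1, 1, 1, 0], 1.0)
B = ([0, 1, 1, 1], -1.0)

# Sequential: learn A thoroughly, then train on B alone.
sequential = train(train([0.0] * 4, [A] * 100), [B] * 100)
# Interleaved: alternate A and B throughout training.
interleaved = train([0.0] * 4, [A, B] * 200)

print(round(output(sequential, A[0]), 2))   # -0.11: A's learned response is lost
print(round(output(interleaved, A[0]), 2))  # 1.0: A is preserved
```

Because B's training adjusts the weights that A shares with it, the old association is overwritten rather than supplemented; the interleaved schedule avoids this only because A keeps being rehearsed.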
In response to such observations, it has been suggested that, to overcome this problem, the brain developed a dual hippocampal-neocortical memory system in which new information is processed in the hippocampus and old information is stored and consolidated in the neocortex (McClelland, McNaughton, & O’Reilly, 1995; Smith & DeCoster, 2000). Various modelers (French, 1997; Ans & Rousset, 1997) have proposed modular connectionist architectures mimicking this dual-memory system, with one subsystem dedicated to the rapid learning of unexpected and novel information and the building of episodic memory traces, and the other subsystem responsible for slow incremental learning of the statistical regularities of the environment and gradual consolidation of information learned in the first subsystem. There is considerable evidence for the modular nature of the brain, in particular for the complementary learning roles of hippocampal and neocortical structures (McClelland, McNaughton & O’Reilly, 1995), the predominant role of the amygdala in social judgment and perception of emotions (Adolphs, Tranel & Damasio, 1998), and so forth. It strikes us that the next step in connectionist modeling of social cognition will involve exploring connectionist architectures built from separate but complementary systems.

Conclusion

Connectionist modeling of social cognition fits seamlessly into a multilevel integrative analysis of human behavior (Cacioppo et al., 2000). Given that cognition is intrinsically social, connectionism will ultimately have to incorporate social constraints into its models. On the other hand, social psychology will need to be more attentive to the biological underpinnings of social behavior. Social and biological approaches to cognition can therefore be seen as complementary endeavors with the common goal of achieving a clearer and deeper understanding of human behavior.
We hope that connectionist accounts of social cognition will provide the common ground for this exploration.

Appendix

A. The Linear Auto-Associative Model

In an auto-associative network, features and categories, or causes and outcomes, are represented in nodes that are all interconnected. Processing information in this model takes place in two phases. In the first phase, the activation of the nodes is computed, and in the second phase, the weights of the connections are updated (see also McClelland & Rumelhart, 1988).

Node Activation

During the first phase of information processing, each node in the network receives activation from external sources. Because the nodes are all interconnected, this activation is then spread throughout the network, where it influences all other nodes. The activation coming from the other nodes is called the internal input. Together with the external input, this internal input determines the final pattern of activation of the nodes, which reflects the short-term memory of the network.

In mathematical terms, every node i in the network receives external input, termed ext_i. In the auto-associative model, every node i also receives internal input int_i, which is the sum of the activation from the other nodes j (denoted by a_j) in proportion to the weight of their connection, or

$$\mathrm{int}_i = \sum_{j \neq i} a_j \, w_{ij} \qquad (1)$$

Typically, activations and weights range from -1 to +1. The external input and internal input are then summed to the net input, or

$$\mathrm{net}_i = E \cdot \mathrm{ext}_i + I \cdot \mathrm{int}_i \qquad (2)$$

where E and I reflect the degree to which the net input is determined by the external and internal input, respectively. Typically, in a recurrent network, the activation of each node i is updated during a number of cycles until it eventually converges to a stable pattern that reflects the network's short-term memory.
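Equations 1 and 2 can be written out directly as a minimal sketch. It assumes that weights are stored as a nested list with w[i][j] holding the weight from node j to node i, and that the activations a_j entering Equation 1 are simply the external inputs (the situation in a single updating cycle); the three-node network is hypothetical.

```python
# Sketch of Equations 1 and 2 (Appendix A). Assumptions: w[i][j] is the
# weight of the connection from node j to node i, and the activations a_j
# used in Eq. 1 equal the external inputs (a single updating cycle).

def internal_input(a, w, i):
    """Eq. 1: int_i = sum over j != i of a_j * w_ij."""
    return sum(a[j] * w[i][j] for j in range(len(a)) if j != i)

def net_input(ext, a, w, E=1.0, I=1.0):
    """Eq. 2: net_i = E * ext_i + I * int_i for every node i."""
    return [E * ext[i] + I * internal_input(a, w, i) for i in range(len(ext))]

# Hypothetical three-node network: node 0 feeds node 2 with weight 0.5.
w = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0]]
ext = [1.0, 0.0, 0.0]          # only node 0 receives external input
a = ext[:]                     # activations before internal spreading
print(net_input(ext, a, w))    # [1.0, 0.0, 0.5]
```

With one cycle and D = I = E = 1, as used in the present simulations, these net inputs are also the final activations, so node 2 becomes active without receiving any external input of its own.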
According to the linear activation algorithm, the updating of activation is governed by the following equation:

$$\Delta a_i = \mathrm{net}_i - D \cdot a_i \qquad (3)$$

where D reflects a memory decay term. In the present simulations, we used only one internal updating cycle and the parameter values D = I = E = 1. Given these simplifying assumptions, the final activation of node i reduces simply to the sum of the external and internal input, or:

$$a_i = \mathrm{net}_i = \mathrm{ext}_i + \mathrm{int}_i \qquad (3')$$

Weight Updating

After this first phase, the auto-associative model enters its second, learning phase, in which the short-term activation is consolidated into long-term weight changes that better represent and anticipate future external input. Basically, weight changes are driven by the discrepancy between the internal input from the last but one updating cycle of the network and the external input received from outside sources, formally expressed in the delta algorithm (McClelland & Rumelhart, 1988, p. 166):

$$\Delta w_{ij} = \varepsilon \, (\mathrm{ext}_i - \mathrm{int}_i) \, a_j \qquad (4)$$

where w_ij is the weight of the connection from node j to node i, and ε is a learning rate that determines how fast the network learns. The presence of a feature or a category was typically encoded by setting the external input to +1, and -1 for opposite features or categories (lower values were also used; see the appropriate tables); otherwise the external activation remained at the resting level of 0. The weights of the connections were updated after each trial. At the end of each simulation, the judgment of interest was tested by turning on the external input of the appropriate nodes and reading off the resulting activation of the nodes that represent the judgment of interest (see also the appropriate tables).

B.
Anderson's Averaging Rule and the Delta algorithm

This appendix demonstrates that the delta algorithm converges at asymptote to Anderson's (1981) averaging rule of impression formation, which expresses a rating about a person as:

$$\text{rating} = \frac{\sum_i w_i s_i}{\sum_i w_i} \qquad (5)$$

where w_i represents the weights and s_i the scale values of the traits. This proof uses the same logic as Chapman and Robbins (1990) in their demonstration that the delta algorithm converges to the probabilistic expression of covariation.

In line with the conventional representation of covariation information, person impression information can be represented in a contingency table with two cells. Cell a represents all cases where the actor is ascribed a focal trait, while cell b represents all cases where the actor is ascribed the opposite trait. For simplicity, we use only two trait categories, although this proof can easily be extended to more categories.

In a recurrent connectionist architecture with localist encoding as used in the text, the target person j and the trait categories i are each represented by a node, and these nodes are connected by adjustable weights w_ij. When the target person is present, its corresponding node receives external activation, and this activation is spread to each trait node. We assume that the overall activation received at the trait nodes i (or internal activation) after priming the person node reflects the impression of the person. According to the delta algorithm in Equation 4, the weights w_ij are adjusted in proportion to the error between the actual trait category (represented by its external activation ext) and the trait category as predicted by the network (represented by its internal activation int).
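The asymptotic result derived in this appendix can be previewed numerically. The sketch below is an illustration under stated assumptions, not part of the proof: the person node has activation a_j = 1, the a-cell and b-cell cases are presented in a fixed interleaved order, and the scale values and frequencies are hypothetical. With a small learning rate, the single person-to-trait weight settles close to the frequency-weighted average of the scale values.

```python
# Numerical preview of Appendix B: with a_j = 1, the delta rule drives the
# person -> trait weight toward the frequency-weighted average of the scale
# values. Scale values, frequencies, and presentation order are hypothetical.

def delta_impression(s1, a, s2, b, epsilon=0.01, epochs=2000):
    """Each epoch presents a cases with ext = s1, then b cases with ext = s2.

    Because the person node has activation 1, the internal input int equals w.
    """
    w = 0.0
    for _ in range(epochs):
        for ext in [s1] * a + [s2] * b:
            w += epsilon * (ext - w)   # Eq. 4: Delta w = eps * (ext - int) * a_j
    return w

def averaging_rule(s1, a, s2, b):
    """Eq. 9: frequency-weighted average of the scale values."""
    return (a * s1 + b * s2) / (a + b)

print(averaging_rule(1.0, 3, -1.0, 1))             # 0.5
print(round(delta_impression(1.0, 3, -1.0, 1), 2)) # 0.49
```

The small gap from 0.5 reflects trial-by-trial fluctuation with a fixed learning rate (the last trial of each epoch is a b-cell case); it shrinks as epsilon decreases.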
If we substitute in Equation 4 ext by Anderson's scale values (s1 for the focal trait, and s2 for the opposite trait) and if we take the default activation for aj (which is 1), then the following equations can be constructed for the two cells in the contingency table: For the a cell: wi = (s1 - int), (6) For the b cell: wi = (s2 - int). (7) The change in overall impression is the sum of Equations 6 and 7, weighted for the corresponding frequencies a and b, in the two cells, or: wi = a[(s1 - int)] + b[(s2 - int)] (8) These adjustments will continue until asymptote, that is, until the error between actual and expected category is zero. This implies that at asymptote, the changes will become zero, or wi = 0. Consequently, Equation 8 becomes: 0 = a[(s1 - int)] + b[(s2 - int)] = a[s1 - int] + b[s2 - int] = [a * s1 + b * s2] – [a + b]int so that: int = [a * s1 + b * s2] / [a + b], Because the internal activation of the trait nodes reflects the trait impression on the person, this can be rewritten in Anderson's terms as: impression = fisi / fi (9) where f represents frequencies with which a person and the traits co-occur. As can be seen, Social Cognition and Connectionism 52 Equation 6 has the same format as Equation 1. This demonstrates that the delta algorithm predicts a weighted averaging function at asymptote for making overall impression judgments, where Anderson's weights are determined by the frequencies by which person and traits are presented together. C. Ajzen's Expectancy-Value model and the Delta algorithm This appendix demonstrates that the delta algorithm converges at asymptote to the expectancy-value model of attitude formation by Ajzen (1991). According to Ajzen's (1991) expectancy-value model, an attitude is formed by summing the multiplicative combination of (a) the strength of a salient belief that a behavior will produce a given outcome and (b) the subjective evaluation of this outcome, or (Ajzen, 1991, p. 
191): attitude biei, (10) were bi represents the strength of the belief and ei the evaluation. Beliefs and evaluations are typically scored on 7- point scales. However, because there is "no rational a priori criterion we can use to decide how the belief and evaluation scales should be scored" (Ajzen, 1991, p. 193), the preceding formula can be normalized by diving it by the mean belief strengths, or: attitude biei / bi (11) Using the same logic of the proof above, it can be shown that the delta algorithm results in asymptote in an equation similar to Equation 9, or: attitude = fiei / fi (12) where f represent the frequencies that the attitude-object leads to a given outcome (which we assume determine the belief strengths b), and where e represents the activation values +1 (desirable outcomes), -1 (undesirable outcomes) or 0 (neutral). The equivalence between Equations 4 and 12 demonstrates that the delta algorithm predicts a (normalized) multiplicative function at asymptote for making attitude judgments, where the strength of the beliefs are determined by the frequencies by which the attitude-object and outcomes co-occur. Social Cognition and Connectionism 53 References Adolphs, R., & Damasio, A. (2001). The interaction of affect and cognition: A neurobiological perspective. In J.P. Forgas (Ed.). Handbook of affect and social cognition (pp. 27-49). Mahwah, NJ: Lawrence Erlbaum Associates. Adolphs, R., Tranel, D., & Damasio, A. (1998). The human amygdala in social judgment. Nature, 393, 470-474. Ajzen, I. & Madden, T.J. (1986). Prediction of goal-directed behavior: Attitudes, intentions, and perceived behavioral control. Journal of Experimental Social Psychology, 22, 453—474. Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179—211. Ajzen, I., & Fishbein, M. (1980). Understanding attitudes and predicting social behavior. Englewood Cliffs, NJ: Prentice-Hall. Allan, L. G. (1993). 
Human contingency judgments: Rule based or associative? Psychological Bulletin, 114, 435–448.
Allison, T., Puce, A., & McCarthy, G. (2000). Social perception from visual cues: Role of the STS region. Trends in Cognitive Sciences, 4, 267–278.
Anderson, J. R. (1976). Language, memory, and thought. Hillsdale, NJ: Erlbaum.
Anderson, N. H. (1967). Averaging model analysis of set size effect in impression formation. Journal of Experimental Psychology, 75, 158–165.
Anderson, N. H. (1979). Serial position curves in impression formation. Journal of Experimental Psychology, 97, 8–12.
Anderson, N. H. (1981). Foundations of information integration theory. New York: Academic Press.
Anderson, N. H., & Farkas, A. J. (1973). New light on order effect in attitude change. Journal of Personality and Social Psychology, 28, 88–93.
Anderson, S. M., & Cole, S. W. (1990). "Do I know you?": The role of significant others in general social perception. Journal of Personality and Social Psychology, 59, 384–399.
Ans, B., & Rousset, S. (1997). Avoiding catastrophic forgetting by coupling two reverberating neural networks. Comptes Rendus de l'Académie des Sciences, Sciences de la Vie, 320, 989–997.
Asch, S. E. (1946). Forming impressions of personality. Journal of Abnormal and Social Psychology, 41, 258–290.
Asch, S. E., & Zukier, H. (1984). Thinking about persons. Journal of Personality and Social Psychology, 46, 1230–1240.
Baker, A. G., Berbier, M. W., & Vallée-Tourangeau, F. (1989). Judgments of a 2 x 2 contingency table: Sequential processing and the learning curve. The Quarterly Journal of Experimental Psychology, 41B, 65–97.
Bargh, J. A., & Thein, R. D. (1985). Individual construct accessibility, person memory, and the recall-judgment link: The case of information overload. Journal of Personality and Social Psychology, 49, 1129–1146.
Betsch, T., Plessner, H., Schwieren, C., & Gütig, R. (2001). Personality and Social Psychology Bulletin, 27, 242–253.
Bohner, G., Moskowitz, G.
B., & Chaiken, S. (1995). The interplay between heuristic and systematic processing of social information. European Review of Social Psychology, 6, 33–68.
Bohner, G., Ruder, M., & Erb, H.-P. (1999). When expertise backfires: Contrast versus assimilation in the interplay of heuristic and systematic processing. Unpublished manuscript.
Bower, G. H. (1981). Emotional mood and memory. American Psychologist, 36, 129–148.
Busemeyer, J. R., & Myung, I. J. (1988). A new method for investigating prototype learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 3–1.
Cacioppo, J. T., Berntson, G. G., Sheridan, J. F., & McClintock, M. K. (2000). Multilevel integrative analyses of human behavior: Social neuroscience and the complementing nature of social and biological approaches. Psychological Bulletin, 126, 829–843.
Chaiken, S. (1980). Heuristic versus systematic information processing and the use of source versus message cues in persuasion. Journal of Personality and Social Psychology, 39, 752–766.
Chaiken, S. (1987). The heuristic model of persuasion. In M. P. Zanna, J. M. Olson, & C. P. Herman (Eds.), Social influence: The Ontario Symposium (Vol. 5, pp. 3–39). Hillsdale, NJ: Erlbaum.
Chaiken, S., Duckworth, K. L., & Darke, P. (1999). When parsimony fails… Psychological Inquiry, 10, 118–123.
Chaiken, S., Liberman, A., & Eagly, A. H. (1989). Heuristic and systematic information processing within and beyond the persuasion context. In J. S. Uleman & J. A. Bargh (Eds.), Unintended thought (pp. 212–252). New York, NY: Guilford.
Chaiken, S., & Maheswaran, D. (1994). Heuristic processing can bias systematic processing: Effects of source credibility, argument ambiguity, and task importance on attitude judgment. Journal of Personality and Social Psychology, 66, 460–473.
Chapman, G. B., & Robbins, S. J. (1990). Cue interaction in human contingency judgment. Memory & Cognition, 18, 537–545.
Chen, S., & Chaiken, S. (1999).
The heuristic-systematic model in its broader context. In S. Chaiken & Y. Trope (Eds.), Dual-process theories in social psychology (pp. 73–96). New York, NY: Guilford Press.
Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological Review, 99, 365–382.
Cooper, J., & Fazio, R. H. (1984). A new look at dissonance theory. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 17, pp. 229–266). New York: Academic Press.
Dickinson, A., & Burke, J. (1996). Within-compound associations mediate the retrospective revaluation of causality judgments. Quarterly Journal of Experimental Psychology, 49B, 60–80.
Dreben, E. K., Fiske, S. T., & Hastie, R. (1979). The independence of evaluative and item information: Impression and recall order effects in behavior-based impression formation. Journal of Personality and Social Psychology, 37, 1758–1768.
Eagly, A. H., & Chaiken, S. (1993). The psychology of attitudes. San Diego, CA: Harcourt Brace.
Ebbesen, E. B., & Bowers, R. J. (1974). Proportion of risky to conservative arguments in a group discussion and choice shifts. Journal of Personality and Social Psychology, 29, 316–327.
Fausett, L. (1994). Fundamentals of neural networks: Architectures, algorithms and applications. Englewood Cliffs, NJ: Prentice-Hall.
Fazio, R. H. (1990). Multiple processes by which attitudes guide behavior: The MODE model as an integrative framework. In M. P. Zanna (Ed.), Advances in experimental social psychology (Vol. 23, pp. 75–109). San Diego, CA: Academic Press.
Festinger, L. (1957). A theory of cognitive dissonance. Evanston, IL: Row, Peterson.
Fiedler, K. (1996). Explaining and simulating judgment biases as an aggregation phenomenon in probabilistic, multiple-cue environments. Psychological Review, 103, 193–214.
Fiedler, K., Walther, E., & Nickel, S. (1999). The auto-verification of social hypotheses: Stereotyping and the power of sample size.
Journal of Personality and Social Psychology, 77, 5–18.
Fishbein, M., & Ajzen, I. (1975). Belief, attitude, intention, and behavior: An introduction to theory and research. London, UK: Addison-Wesley.
Försterling, F. (1989). Models of covariation and attribution: How do they relate to the analogy of analysis of variance? Journal of Personality and Social Psychology, 57, 615–625.
Försterling, F. (1992). The Kelley model as an analysis of variance analogy: How far can it be taken? Journal of Experimental Social Psychology, 28, 475–490.
French, R. (1997). Pseudo-recurrent connectionist networks: An approach to the “sensitivity-stability” dilemma. Connection Science, 9, 353–379.
Gilbert, D. T. (1989). Thinking lightly about others: Automatic components of the social inference process. In J. S. Uleman & J. A. Bargh (Eds.), Unintended thought: Limits of awareness, intention, and control (pp. 189–211). New York: Guilford.
Gilbert, D. T., & Malone, P. S. (1995). The correspondence bias. Psychological Bulletin, 117, 21–38.
Gluck, M. A. (1992). Stimulus sampling and distributed representations in adaptive network theories of learning. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.), From learning theory to connectionist theory: Essays in honor of William K. Estes (pp. 169–199). Hillsdale, NJ: Erlbaum.
Gluck, M. A., & Bower, G. H. (1988). From conditioning to category learning: An adaptive network model. Journal of Experimental Psychology: General, 117, 227–247.
Gollwitzer, P. M. (1990). Action phases and mind-sets. In E. T. Higgins & R. M. Sorrentino (Eds.), Handbook of motivation and cognition: Foundations of social behavior (Vol. 2, pp. 53–92). New York: Guilford Press.
Graham, S. (1999). Retrospective revaluation and inhibitory associations: Does perceptual learning modulate our perceptions of the contingencies between events? Quarterly Journal of Experimental Psychology, 52B, 159–185.
Hamilton, D. L., Driscoll, D. M., & Worth, L. T.
(1989). Cognitive organization of impressions: Effects of incongruency in complex representations. Journal of Personality and Social Psychology, 56, 925–939.
Hamilton, D. L., Dugan, P. M., & Trolier, T. K. (1985). The formation of stereotypic beliefs: Further evidence for distinctiveness-based illusory correlation. Journal of Personality and Social Psychology, 48, 5–17.
Hamilton, D. L., Katz, L. B., & Leirer, V. O. (1980). Cognitive representation of personality impressions: Organizational processes in first impression formation. Journal of Personality and Social Psychology, 39, 1050–1063.
Hansen, R. D., & Hall, C. A. (1985). Discounting and augmenting facilitative and inhibitory forces: The winner takes all. Journal of Personality and Social Psychology, 49, 1482–1493.
Hastie, R. (1980). Memory for behavioral information that confirms or contradicts a personality impression. In R. Hastie, T. M. Ostrom, E. B. Ebbesen, R. S. Wyer, D. L. Hamilton, & D. E. Carlston (Eds.), Person memory: The cognitive basis of social perception (pp. 155–177). Hillsdale, NJ: Erlbaum.
Hastie, R., & Kumar, P. A. (1979). Person memory: Personality traits as organizing principles in memory for behaviors. Journal of Personality and Social Psychology, 37, 25–38.
Hintzman, D. L. (1986). "Schema abstraction" in a multiple-trace memory model. Psychological Review, 93, 411–428.
Ito, T. A., & Cacioppo, J. T. (2001). Affect and attitudes: A social neuroscience approach. In J. P. Forgas (Ed.), Handbook of affect and social cognition (pp. 50–74). Mahwah, NJ: Lawrence Erlbaum Associates.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. Cambridge, England: Cambridge University Press.
Kashima, Y., & Kerekes, A. R. Z. (1994). A distributed memory model of averaging phenomena in person impression formation. Journal of Experimental Social Psychology, 30, 407–455.
Kashima, Y., Woolcock, J., & Kashima, E. S. (2000).
Group impression as dynamic configurations: The tensor product model of group impression formation and change. Psychological Review, 107, 914–942.
Kelley, H. H. (1972). Causal schemata and the attribution process. In E. E. Jones, D. E. Kanouse, H. H. Kelley, R. E. Nisbett, S. Valins, & B. Weiner (Eds.), Attribution: Perceiving the causes of behavior (pp. 151–174). Morristown, NJ: General Learning Press.
Kruglanski, A. W., Schwartz, S. M., Maides, S., & Hamel, I. Z. (1978). Covariation, discounting, and augmentation: Towards a clarification of attributional principles. Journal of Personality, 46, 176–189.
Kruschke, J. K. (1996). Base rates in category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 3–26.
Kruschke, J. K., & Johansen, M. K. (1999). A model of probabilistic category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1083–1119.
Kunda, Z., & Thagard, P. (1996). Forming impressions from stereotypes, traits, and behaviors: A parallel-constraint-satisfaction theory. Psychological Review, 103, 284–308.
Labiouse, C. L., & French, R. M. (2001). A connectionist model of person perception and stereotype formation. In R. French & J. Sougné (Eds.), Proceedings of the sixth Neural Computation and Psychology Workshop: Learning, Development, and Evolution (pp. 209–218). London: Springer Verlag.
Linder, D. E., Cooper, J., & Jones, E. E. (1967). Decision freedom as a determinant of the role of incentive magnitude in attitude change. Journal of Personality and Social Psychology, 6, 245–254.
Macrae, C. N., Hewstone, M., & Griffiths, R. J. (1993). Processing load and memory for stereotype-based information. European Journal of Social Psychology, 23, 77–87.
Maheswaran, D., & Chaiken, S. (1991). Promoting systematic processing in low-motivation settings: Effect of incongruent information on processing and judgment. Journal of Personality and Social Psychology, 61, 13–25.
Manis, M., Dovalina, I., Avis, N. E., & Cardoze, S. (1980). Base rates can affect individual predictions. Journal of Personality and Social Psychology, 38, 231–248.
McClelland, J. L., & Rumelhart, D. E. (1985). Distributed memory and the representation of general and specific information. Journal of Experimental Psychology: General, 114, 159–188.
McClelland, J. L., & Rumelhart, D. E. (1988). Explorations in parallel distributed processing: A handbook of models, programs, and exercises. Cambridge, MA: Bradford.
McClelland, J., McNaughton, B., & O'Reilly, R. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and the failures of connectionist models of learning and memory. Psychological Review, 102, 419–457.
McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. The Psychology of Learning and Motivation, 24, 109–165.
Medin, D. L., & Edelson, S. M. (1988). Problem structure and the use of base-rate information from experience. Journal of Experimental Psychology: General, 117, 68–85.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207–238.
Morris, M. W., & Larrick, R. P. (1995). When one cause casts doubt on another: A normative analysis of discounting in causal attribution. Psychological Review, 102, 331–355.
Moskowitz, G. B., & Skurnik, I. W. (1999). Contrast effects as determined by the type of prime: Trait versus exemplar primes initiate processing strategies that differ in how accessible constructs are used. Journal of Personality and Social Psychology, 76, 911–927.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231–259.
Nosofsky, R. M., Kruschke, J. K., & McKinley, S. C. (1992). Combining exemplar-based category representations and connectionist learning rules.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 211–233.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
Perugini, M., & Conner, M. (2000). Predicting and understanding behavioral volitions: The interplay between goals and behaviors. European Journal of Social Psychology, 30, 705–731.
Petty, R. E., & Cacioppo, J. T. (1984). The effects of involvement on responses to argument quantity and quality: Central and peripheral routes to persuasion. Journal of Personality and Social Psychology, 46, 69–81.
Petty, R. E., & Cacioppo, J. T. (1986). The elaboration likelihood model of persuasion. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 19, pp. 123–205). San Diego, CA: Academic Press.
Petty, R. E., Cacioppo, J. T., & Goldman, R. (1981). Personal involvement as a determinant of argument-based persuasion. Journal of Personality and Social Psychology, 41, 847–855.
Petty, R. E., & Wegener, D. T. (1999). The elaboration likelihood model: Current status and controversies. In S. Chaiken & Y. Trope (Eds.), Dual-process theories in social psychology (pp. 41–72). New York, NY: Guilford Press.
Phelps, E. A., O'Connor, K. J., Cunningham, W. A., Funayama, S., Gatenby, C., Gore, J. C., & Banaji, M. R. (2000). Performance on indirect measures of race evaluation predicts amygdala activation. Journal of Cognitive Neuroscience, 12, 729–738.
Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97, 285–308.
Read, S. J., & Marcus-Newhall, A. (1993). Explanatory coherence in social explanations: A parallel distributed processing account. Journal of Personality and Social Psychology, 65, 429–447.
Read, S. J., & Montoya, J. A. (1999).
An autoassociative model of causal reasoning and causal learning: Reply to Van Overwalle's critique of Read and Marcus-Newhall (1993). Journal of Personality and Social Psychology, 76, 728–742.
Reeder, G. D., & Brewer, M. B. (1979). A schematic model of dispositional attribution in interpersonal perception. Psychological Review, 86, 61–79.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–98). New York: Appleton-Century-Crofts.
Rosch, E. H. (1978). Principles of categorization. In E. H. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 27–48). Hillsdale, NJ: Erlbaum.
Rosenfield, D., & Stephan, W. G. (1977). When discounting fails: An unexpected finding. Memory & Cognition, 5, 97–102.
Rumelhart, D. E., & McClelland, J. L. (1996). Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 1. Foundations. Cambridge, MA: MIT Press.
Sarle, W. S. (1994). Neural networks and statistical models. Proceedings of the nineteenth annual SAS Users Group International Conference.
Schwarz, N. (1990). Feelings as information: Informational and motivational functions of affective states. In E. T. Higgins & R. Sorrentino (Eds.), Handbook of motivation and cognition: Foundations of social behavior (Vol. 2). New York: Guilford.
Shaklee, H., & Fischhoff, B. (1982). Strategies of information search in causal analysis. Memory & Cognition, 10, 520–530.
Shanks, D. (1992). Connectionist accounts of the inverse base-rate effect in categorization. Connection Science, 4, 3–18.
Shanks, D. R. (1985). Forward and backward blocking in human contingency judgment. Quarterly Journal of Experimental Psychology, 37B, 1–21.
Shanks, D. R. (1987). Acquisition functions in contingency judgment.
Learning and Motivation, 18, 147–166.
Shanks, D. R. (1995). Is human learning rational? Quarterly Journal of Experimental Psychology, 48A, 257–279.
Shanks, D. R., Lopez, F. J., Darby, R. J., & Dickinson, A. (1996). Distinguishing associative and probabilistic contrast theories of human contingency judgment. In D. R. Shanks, K. J. Holyoak, & D. L. Medin (Eds.), The psychology of learning and motivation (Vol. 34, pp. 265–311). New York, NY: Academic Press.
Shultz, T., & Lepper, M. (1996). Cognitive dissonance reduction as constraint satisfaction. Psychological Review, 103, 219–240.
Siebler, F., Bohner, G., & Weinerth, T. (1998). Simulation of implicit and explicit processes in parallel-constraint-satisfaction networks. Unpublished manuscript.
Sitton, M., Mozer, M. C., & Farah, M. J. (2000). Superadditive effects of multiple lesions in a connectionist architecture: Implications for the neuropsychology of optic aphasia. Psychological Review, 107, 709–734.
Smith, E. R. (1996). What do connectionism and social psychology offer each other? Journal of Personality and Social Psychology, 70, 893–912.
Smith, E. R., & DeCoster, J. (1998). Knowledge acquisition, accessibility, and use in person perception and stereotyping: Simulation with a recurrent connectionist network. Journal of Personality and Social Psychology, 74, 21–35.
Smith, E. R., & DeCoster, J. (2000). Associative and rule-based processing: A connectionist interpretation of dual-process models. In S. Chaiken & Y. Trope (Eds.), Dual-process theories in social psychology (pp. 323–338). New York, NY: Guilford.
Smith, E. R., & Zárate, M. A. (1992). Exemplar-based model of social judgment. Psychological Review, 99, 3–21.
Srull, T. K. (1981). Person memory: Some tests of associative storage and retrieval models. Journal of Experimental Psychology: Human Learning and Memory, 7, 440–463.
Srull, T. K., Lichtenstein, M., & Rothbart, M. (1985). Associative storage and retrieval processes in person memory.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 316–345.
Stangor, C., & Duan, C. (1991). Effects of multiple task demands upon memory for information about social groups. Journal of Experimental Social Psychology, 27, 357–378.
Stangor, C., & McMillan, D. (1992). Memory for expectancy-congruent and expectancy-incongruent information: A review of the social and social developmental literatures. Psychological Bulletin, 111, 42–61.
Stapel, D. A., Koomen, W., & van der Pligt, J. (1997). Categories of category accessibility: The impact of trait concept versus exemplar priming on person judgments. Journal of Experimental Social Psychology, 33, 47–76.
Stewart, R. H. (1965). Effect of continuous responding on the order effect in personality impression formation. Journal of Personality and Social Psychology, 1, 161–165.
Thagard, P. (1989). Explanatory coherence. Behavioral and Brain Sciences, 12, 435–467.
Thorpe (1994). Localized versus distributed representations. In M. A. Arbib (Ed.), Handbook of brain theory and neural networks (pp. 949–952). Cambridge, MA: MIT Press.
Tobena, A., Marks, I., & Dar, R. (1999). Advantages of bias and prejudice: An exploration of their neurocognitive templates. Neuroscience and Biobehavioral Reviews, 23, 1047–1058.
Trope, Y., & Gaunt, R. (2000). Processing alternative explanations of behavior: Correction or integration? Journal of Personality and Social Psychology, 79, 344–354.
Van Overwalle, F., & Timmermans, B. (2000). Discounting and augmentation in attribution: The role of connections between causes. Manuscript submitted for publication.
Van Overwalle, F., & Timmermans, B. (2001). Learning about an absent cause: Discounting and augmentation of positively and independently related causes. In R. French & J. Sougné (Eds.), Proceedings of the sixth Neural Computation and Psychology Workshop: Evolution, Learning, and Development. London: Springer Verlag.
Van Overwalle, F. (1996).
The relationship between the Rescorla-Wagner associative model and the probabilistic joint model of causality. Psychologica Belgica, 36, 171–192.
Van Overwalle, F. (1998). Causal explanation as constraint satisfaction: A critique and a feedforward connectionist alternative. Journal of Personality and Social Psychology, 74, 312–328.
Van Overwalle, F., Drenth, T., & Marsman, G. (1999). Spontaneous trait inferences: Are they linked to the actor or to the action? Personality and Social Psychology Bulletin, 25, 450–462.
Van Overwalle, F., & Jordens, K. (2001). A feedforward connectionist model of cognitive dissonance: An alternative to Shultz and Lepper (1996). Manuscript submitted for publication.
Van Overwalle, F., & Van Rooy, D. (1998). A connectionist approach to causal attribution. In S. J. Read & L. C. Miller (Eds.), Connectionist models of social reasoning and social behavior. New York: Erlbaum.
Van Overwalle, F., & Van Rooy, D. (2001a). When more observations are better than less: A connectionist account of the acquisition of causal strength. European Journal of Social Psychology, 31, 155–175.
Van Overwalle, F., & Van Rooy, D. (2001b). How one cause discounts or augments another: A connectionist account of causal competition. Personality and Social Psychology Bulletin, in press.
Van Overwalle, F., & Van Rooy, D. (2001c). A recurrent connectionist model of biases in group judgments. Manuscript submitted for publication.
Wells, G. L., & Ronis, D. L. (1982). Discounting and augmentation: Is there something special about the number of causes? Personality and Social Psychology Bulletin, 8, 566–572.
Wood, W., Kallgren, C. A., & Preisler, R. M. (1985). Access to attitude-relevant information in memory as a determinant of persuasion: The role of message attributes. Journal of Experimental Social Psychology, 21, 73–85.
Table 1
Overview of the Simulations
Nr. | Topic | Evidence / Prediction | Major Processing Principle
1 | Categorization | Gluck & Bower, 1988, exp. 1 | Competition
2 | Impression formation | Stewart, 1965 | Acquisition
3 | Serial position | Dreben, Fiske & Hastie, 1979 | Acquisition
4 | Inconsistent information | Hamilton, Katz, & Leirer, 1980, exp. 3 | Diffusion
5 | Generalization | Smith & DeCoster, 1998, sim. 1 & 2 | Spreading of Internal Activation
6 | Assimilation & Contrast | Stapel, Koomen & van der Pligt, 1997, exp. 3 | Acquisition, Competition
7 | Causal Attribution | Van Overwalle & Van Rooy, 2001, exp. 1 | Acquisition, Competition
8 | Attitude Formation | Ajzen, 1991 | Acquisition
9 | Dual-Process Models | Chaiken & Maheswaran, 1994 | Acquisition
10 | Cognitive Dissonance | Linder, Cooper & Jones, 1967 | Competition
Table 2
Learning Experiences in Categorization (Simulation 1)
Features Dutch less-sophisticated refined French Categories Flemish Walloon Flemish (Rare) Category #10 #5 #5 #10 1 1 1 1 0 1 1 1 R 0 0 1 1 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 Walloon (Common) Category #30 #15 #15 #30 1 0 0 0 1 1 0 0 1 1 1 0 Test Features Dutch less-sophisticated refined French 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 ? ? ? ? –? –? –? –? ? ? ? ? ? ? ? ? 1 0 0 1 Prototype Flemish Walloon
Note. Simplified version of the experimental design of Gluck & Bower (1988); Cell entries denote external activation; R=Randomized order; #=frequency of trial; the feature patterns were generated according to the following probabilities: given a category, the category's own perfect feature was present 100% of the time and its imperfect feature 67% of the time, the other category's imperfect feature 50% of the time and its perfect feature 33% of the time.
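As a rough illustration of the learning history summarized in Table 2, the following is a minimal sketch (not the simulation reported here) that samples feature patterns with the probabilities given in the table note and adjusts feature-to-category weights with the delta rule. Several assumptions are ours: it uses a single-layer linear pattern associator (features to categories) rather than the full auto-associative network; it treats Dutch and less-sophisticated as the Flemish perfect and imperfect features and French and refined as the Walloon ones; it keeps the 30 to 90 rare-to-common trial ratio of Table 2; and it borrows the learning rate of .01 listed for Simulation 1 in Table 12. The names `make_block`, `train` and `probe_feature` are hypothetical.

```python
import random

# Feature order follows the Table 2 header. Treating Dutch/less-sophisticated
# as the Flemish perfect/imperfect features and French/refined as the Walloon
# ones is an assumption of this sketch.
FEATURES = ["Dutch", "less-sophisticated", "refined", "French"]
CATEGORIES = ["Flemish", "Walloon"]

# P(feature present | category), following the Table 2 note: own perfect
# feature 1.00, own imperfect .67, other category's imperfect .50,
# other category's perfect .33.
P_PRESENT = {
    "Flemish": [1.00, 0.67, 0.50, 0.33],
    "Walloon": [0.33, 0.50, 0.67, 1.00],
}

# Trials per block: 10+5+5+10 = 30 rare (Flemish) vs 30+15+15+30 = 90 common.
N_TRIALS = {"Flemish": 30, "Walloon": 90}


def make_block(rng):
    """Return one randomized block of (feature vector, category target) trials."""
    block = []
    for cat, n in N_TRIALS.items():
        target = [1.0 if c == cat else 0.0 for c in CATEGORIES]
        for _ in range(n):
            x = [1.0 if rng.random() < p else 0.0 for p in P_PRESENT[cat]]
            block.append((x, target))
    rng.shuffle(block)  # R = randomized order
    return block


def train(blocks=20, lr=0.01, seed=0):
    """Delta-rule learning of feature -> category weights with linear outputs."""
    rng = random.Random(seed)
    w = [[0.0 for _ in CATEGORIES] for _ in FEATURES]
    for _ in range(blocks):
        for x, target in make_block(rng):
            for k in range(len(CATEGORIES)):
                internal = sum(w[j][k] * x[j] for j in range(len(FEATURES)))
                error = target[k] - internal  # ext - int
                for j in range(len(FEATURES)):
                    # every feature present on this trial shares the same error
                    w[j][k] += lr * error * x[j]
    return w


def probe_feature(w, feature):
    """Category activations when only `feature` is switched on (activation 1)."""
    j = FEATURES.index(feature)
    return dict(zip(CATEGORIES, w[j]))


if __name__ == "__main__":
    weights = train()
    for name in FEATURES:
        print(name, probe_feature(weights, name))
```

Running the script prints the category activations obtained when each feature is presented alone, which corresponds to the single-feature test rows of Table 2. Because each weight changes in proportion to an error that is shared by all features present on a trial, features that co-occur with the same category compete for that category's output, the principle listed for Simulation 1 in Table 1.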
Table 3
Impression Formation: Recency after Reversal of Trait-implying Information (Simulation 2)
Trials | person (feature) | context (feature) | trait (category)
High - Low Presentation Order
#4 High trait | 1 | 1 | +1
#4 Low trait | 1 | 1 | –1
Low - High Presentation Order
#4 Low trait | 1 | 1 | –1
#4 High trait | 1 | 1 | +1
Test | 1 | 0 | ?
Note. Schematic representation of the experimental design of Stewart (1965); High=adjective implies trait; Low=adjective implies opposite trait; Cell entries denote external activation; #=number of trials.
Table 4
Impression Formation: Recency and Primacy in Serial Position Weights (Simulation 3)
Trials | person (feature) | context (feature) | trait (category)
Confirmatory Information
#4 High | 1 | 1 | +1
Mixed Information
#3 High | 1 | 1 | +1
#1 Low (a) | 1 | 1 | –1
Test | 1 | 0 | ?
Note. Schematic representation of the experimental design of Dreben, Fiske & Hastie (1979), illustrated here for the four-trial condition; High=adjective implies trait; Low=adjective implies opposite trait; Cell entries denote external activation; Initial weights were set at .10; #=number of trials.
a This disconfirming trial is presented at position 1, 2, 3, or 4 of the four-trial series (here it is shown at position 4), and ratings are then compared with judgments from the confirmatory condition at the same serial position.
Table 5
Impression Formation: Memory for Inconsistent Information (Simulation 4)
person #1 #1 #1 #1 #1 consistent consistent inconsistent consistent consistent 1a 1a 1a 1a 1a Trait common violent 1a 1a 0 1a 1a 0 0 1a 0 0 Behavioral Exemplars consistent inconsistent 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 Test Recall consistent inconsistent 1 1 1 1 1 1 ? ? ? ? 0 0 0 0 0 ? Biased Recognition consistent inconsistent ? ? ? ? 0 0 1 1 1 1 0 0 0 0 0 1 Note.
Schematic representation of the experimental design of Hamilton, Katz & Leirer (1980, experiment 3), illustrated for a fictitious 4 / 1 distribution of consistent versus inconsistent behaviors. In the original experiment, the distribution was 10 / 1, and the inconsistent statement was given at position 2, 6 or 10 of the list (in most similar experiments, order was randomized). Cell entries denote external activation; #=number of trials.
a Activation set to 0.10 under memorizing instructions.
Table 6
Assimilation: Exemplar and Group Inferences (Simulation 5)
Exemplar Features E1 E2 E3 E4 E5 Group Features G1 G2 G3 Background Knowledge R #100 context #20 exemplar 0̃ ̃ 0̃ ̃ 0̃ ̃ 0̃ ̃ 0̃ ̃ 0̃ ̃ 0̃ –̃ 0̃ +̃ 0 ? 0 0 + 0 – 0 ? Test exemplar group 0 0 0
Note. Schematic representation of assimilation of exemplar and group stereotypes; Each feature E or G is represented by 5 nodes; Cell entries denote external activation; for the exemplar features, reflects a randomly drawn normally distributed pattern with M=0 & SD=.5 (identical across all trials); for the group features, + reflects M=.50 and – reflects M=-.5; R=Randomized order; #=number of trials. For reasons of clarity, the other exemplar features E6 to E15, representing two other exemplars (#20 trials each), are not shown.
~ Noise added randomly at each trial with a Normal distribution of M=0 & SD=.5.
Table 7
Assimilation and Contrast (Simulation 6)
Exemplars person Hitler Gandhi Trait Categories hostile/violent nice Background Knowledge #10 #10 R #10 #10 #10 Hitler Gandhi average person average person traits 0 0 + + 0 + 0 0 0 0 0 + 0 0 0 + 0 + 0 – 0 + 0 + + +̃ 0 0 0 0 +̃̃ 0 0 0 0 +̃ 0 0 0 0 +̃̃ 0 0 0 ? Priming (each condition only once) #1 #1 #1 #1 Hitler Gandhi hostile nice +̃ +̃ +̃ +̃ Test +
Note.
Schematic representation of prior knowledge acquisition and experimental design of Stapel, Koomen & van der Pligt (1997, Experiment 3); Each feature/category is represented by 5 nodes; Cell entries denote external activation, with + and – reflecting a randomly drawn normally distributed pattern with M=+.5/-.5 & SD=.5 (identical across all trials; the simulation was run for 5 such activation patterns and results were averaged); R=Randomized order; #=number of trials.
~ Noise added randomly at each trial with a Normal distribution of M=0 & SD=.5.
Table 8
Forward Discounting of a Novel Cause as a Function of the Sample Size of a Known Cause (Simulation 7)
Trials | known cause (Ann) | novel cause (Troy) | outcome
Small Sample Size
#1 | 1 | 0 | 1
#5 | 1 | 1 | 1
Large Sample Size
#5 | 1 | 0 | 1
#5 | 1 | 1 | 1
Test
Known (Ann) | 1 | 0 | ?
Novel (Troy) | 0 | 1 | ?
Note. Schematic representation of the experimental design of Van Overwalle & Van Rooy (2001); Cell entries denote external activation; #=number of trials.
Table 9
Attitude Formation (Simulation 8)
Causal Factors car bicycle bus Outcomes fast dry pollutes value Car #10 R #10 #10 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 1 –1 #5 #10 0 0 Bicycle 1 0 1 0 1 0 0 –1 0 0 1 –1 #10 R #5 #5 0 0 0 0 0 0 1 1 1 –1 0 0 0 1 0 0 0 1 –1 1 –1 0 0 1 0 0 0 0 0 0 0 0 0 ? ? ? R Bus Test attitude toward car attitude toward bicycle attitude toward bus 1 0 0 0 1 0
Note. Schematic representation of attitude formation on the basis of beliefs about outcome consequences and their value (likeability of consequences; cf. the theory of planned behavior); Cell entries denote external activation; R=Randomized order; #=number of trials.
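The asymptotic results derived in Appendices B and C (Equations 9 and 12) can be checked numerically. The following is a minimal sketch under the same simplifying assumption as those derivations: a single weight links the attitude object (or the person) to the value (or trait) node, with the sending activation fixed at 1. The connections from the outcome nodes to the value node, which a full auto-associative run would also learn and which could compete with the object's weight, are deliberately left out. The example values are our reading of the car rows of Table 9 (fast and dry with value +1, pollutes with value -1, 10 trials each); the function names and the burn-in and block settings are hypothetical.

```python
import random


def weighted_average(freqs, values):
    """Closed form of Equations 9 and 12: sum(f_i * v_i) / sum(f_i)."""
    return sum(f * v for f, v in zip(freqs, values)) / sum(freqs)


def batch_asymptote(freqs, values, lr=0.01, sweeps=2000):
    """Iterate the frequency-weighted delta-rule change (cf. Equation 8).

    A single weight w links the attitude object (activation 1) to the value
    node; outcome i occurs freqs[i] times with external activation values[i].
    Each sweep adds sum_i f_i * lr * (v_i - w).
    """
    if lr * sum(freqs) >= 2:
        raise ValueError("lr * sum(freqs) must be below 2 for this update to converge")
    w = 0.0
    for _ in range(sweeps):
        w += sum(f * lr * (v - w) for f, v in zip(freqs, values))
    return w


def trialwise_mean(freqs, values, lr=0.01, blocks=200, burn_in=20, seed=0):
    """Train the same weight trial by trial in randomized blocks.

    With a fixed learning rate, w keeps fluctuating around its asymptote,
    so the mean weight after a burn-in period is returned.
    """
    rng = random.Random(seed)
    trials = [v for f, v in zip(freqs, values) for _ in range(f)]
    w, total, count = 0.0, 0.0, 0
    for block in range(blocks):
        rng.shuffle(trials)
        for v in trials:
            w += lr * (v - w)  # delta rule with sending activation 1
            if block >= burn_in:
                total += w
                count += 1
    return total / count


if __name__ == "__main__":
    # Car rows of Table 9 (as read here): fast +1, dry +1, pollutes -1.
    car_freqs, car_values = [10, 10, 10], [1, 1, -1]
    print(weighted_average(car_freqs, car_values))
    print(batch_asymptote(car_freqs, car_values))
    print(trialwise_mean(car_freqs, car_values))
```

Both updates settle on the weighted average (10 + 10 - 10) / 30 = 1/3 for the car; the trial-by-trial weight keeps moving around that value, which is why the sketch averages it after a burn-in. Read with Anderson's scale values instead of evaluations, the same function illustrates Equation 9: ignoring order effects, three high-trait trials (+1) and one low-trait trial (-1), as in the mixed condition of Table 4, give (3 - 1) / 4 = .5 at asymptote.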
Table 10
Dual Processes in Attitude Formation (Simulation 9)
Causal Factors product source Outcomes features (a) value #20 Low credibility #20 High credibility Prior Knowledge on Source 1 1 1 1 0 0 0 0 0 0 0 0 .1 1 #2 R #2 #2 1 1 1 Strong arguments 1 1 1 1 0 0 0 0 1 0 0 0 0 1 0 1 1 0 #2 R #2 #2 1 1 1 Weak arguments 1 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 .1 #2 R #2 #2 #2 1 1 1 1 Ambiguous arguments 1 1 1 1 1 0 0 0 0 0 0 1 0 1 0 .1 0 0 0 0 ? 0 1 0 0 0 0 1 0 Test attitude toward product 1 0
Note. Simplified representation of attitude formation on the basis of heuristics (source credibility) and systematic processing (argument quality; Chaiken & Maheswaran, 1994); Cell entries denote external activation; R=Randomized order; #=number of trials (for heuristic processing, the number of trials for all arguments was set to 1, with all activation levels divided by 10).
a The first two features of the product are of high importance, the last two of low importance (as can be seen from the value component).
Table 11
Cognitive Dissonance following Induced Compliance (Simulation 10)
Causal factors essay topic payment forced Outcomes write essay value Background Knowledge #10 topic (T) #10 T + payment #10 T + 20% P R #10 T + forced (F) #10 T + P + F #10 T + 20% P + F +̃ +̃ +̃ +̃ +̃ +̃ 0 +̃ +̃a 0 +̃ +̃a 0 0 0 +̃ +̃ +̃ 0 +̃ +̃ +̃ +̃ +̃ 0 0 0 –̃̃ 0 –̃ 0 0 + + + + + + 0 0 – 0 0 ? ? Induced Compliance (each condition only once) #1 choice & low payment #1 choice & high payment #1 forced & low payment #1 forced & high payment + + + + +a + +a + Test attitude toward essay + 0
Note.
Schematic representation of the induced compliance experiment by Linder, Cooper & Jones (1967); each factor/outcome is represented by 5 nodes; cell entries denote external activation, with + and – reflecting a randomly drawn, normally distributed pattern with M = +1/–1 and SD = .2 (identical across all trials; the simulation was run for 5 such activation patterns and the results were averaged); R = randomized order; # = number of trials. a M = +.2 to reflect low payment; ~ Noise added randomly at each trial from a normal distribution with M = 0 and SD = .2.
Table 12
Fit and Robustness of the Simulations, including Alternative Encoding and Models
Columns: Nr; Original Simulation^a; Distributed; Feedforward; Non-linear Recurrent. Rows: 1; 2; 3 (contn, final); 4; 5 (person^e, group^e); 6^e; 7; 8; 9; 10^e.
.97 (.01) .98 (.32b) .98 .97 .99 .94 .97 .96 .94 (.40c) .99 (.89c) .90 — .86 .78 — .99 4 .97 (.28) .91 .97 .96 5 persone groupe .99 (.01) — — .99 .99 (.01) — — .99 6e — .73 .53 7 .86 (.05) .99 (.10d) .99 .99 .99 8 .99 (.20) .98 9 .94 (.30) .94 .99 .82x .97 .84x 10e .99 (.02) — .95 .66x
Note. Cell entries are correlations between mean simulated values (averaged across randomizations) and empirical data or theoretical predictions. For the distributed encoding, each concept was represented by 5 nodes and an activation pattern drawn from a normal distribution with M = the activation in the original simulation and SD = .20 (5 such random patterns were run and averaged), with additional noise at each trial drawn from a normal distribution with M = 0 and SD = .20. For the non-linear auto-associative model, the parameters were E = I = Decay = .15 and internal cycles = 9 (McClelland & Rumelhart, 1988). For all alternative models, we searched for the best-fitting learning parameter. a Learning rate in parentheses; b–d the contextual node's learning rate was (b) 25%, (c) 33% or (d) 166% of this learning rate; e distributed encoding; x predicted pattern was not reproduced.
Figure Captions
Figure 1.
(A) Architecture of an auto-associative recurrent network, applied to (B) structural relations and (C) causal relations.
Figure 2. Graphical illustration of the principles of (A) acquisition [with learning rate 0.20], (B) competition and (C) diffusion. O = outcome, C = consistent information, I = inconsistent information, T = trait. Filled nodes are activated on a given trial; empty nodes are not activated. Full lines denote strong connection weights, broken lines moderate weights, and dotted lines weak weights.
Figure 3. Categorization and prototype abstraction: Network architecture with 4 feature nodes connected to 2 category nodes (only the important connections are shown). Connection weights are shown after the learning history listed in Table 2; stronger weights are depicted by solid lines and weaker weights by broken lines.
Figure 4. Categorization and prototype abstraction: Observed data from Gluck and Bower (1988) and simulation results of categorization (top panel) and prototype abstraction (bottom panel; learning rate = .01). Note that the simulation results in the top panel were regressed onto the observed data with the intercept fixed at .50.
Figure 5. Impression formation: Observed data from Stewart (1965) and simulation results (learning rate for person = .32, for context = .08).
Figure 6. Impression formation: Observed serial position curves from Dreben, Fiske & Hastie (1979; left panels) and simulation (right panels) of attenuated recency given continuous responding (top; learning rate for person = .40, for context = .13) and primacy given final responding (bottom; learning rate for person = .89, for context = .29).
Figure 7. Higher recall of inconsistent behavioral information after impression formation and memory instructions: Observed data from Hamilton, Katz and Leirer (1980, exp. 3) and simulation results (learning rate = .28).
Figure 8.
Generalization: Simulation of exemplar and group stereotype assimilation (learning rate = .01). The original external activation of the 5 nodes (reflecting micro-features) is given by solid lines, while the reconstructed activation through internal input is given by broken lines.
Figure 9. Assimilation and contrast effects after priming with a trait or person: Observed data from Stapel, Koomen and Van der Pligt (1997, exp. 3) and simulation results (learning rate = .05).
Figure 10. Causal attribution and discounting: Observed data from Van Overwalle and Van Rooy (2000) and simulation results (learning rate = .10).
Figure 11. Attitude formation: Prediction from the theory of planned behavior by Ajzen (1991) and simulation results (learning rate = .20).
Figure 12. Dual processes of attitude formation: Observed data from Chaiken and Maheswaran (1994; top panel) and simulation results (learning rate = .30; bottom panel).
Figure 13. Cognitive dissonance: Observed data from Linder, Cooper and Jones (1967) and simulation results (learning rate = .02).
[Figure 1. A: Recurrent architecture (external input, internal input, output); B: Structural connections (category, feature, exemplar); C: Causal connections (cause, outcome, attitude-object)]
[Figure 2. A: Acquisition (weight of feature A over trials 1 to 9); B: Competition; C: Diffusion]
[Figure 3. Category nodes Flemish (rare) and Walloon (common); feature nodes Dutch, less refined, sophisticated, French]
[Figure 4. Top: Base-rate neglect, judged probability of Flemish by feature (Dutch, less refined, sophisticated, French); Gluck & Bower, probabilistic, simulation. Bottom: Simulated prototypes (Flemish, Walloon) by feature]
[Figure 5. Impression formation: mean impression over successive adjectives 1 to 8; High-Low, Low-High, simulation]
[Figure 6. Data and simulation: weight (impact) of serial positions 1 to 4 on the 1st to 4th ratings under continuous responding, and on the final rating under final responding]
[Figure 7. Recall of inconsistent information: % free recall of consistent and inconsistent information in the impression and memory conditions; simulation]
[Figure 8. Generalization of persons and groups: original and internal activation on distributed nodes 1 to 5 of the unobserved feature; person, group]
[Figure 9. Priming trait versus person exemplars: impression of ambiguous person by prime (trait, person); positive prime, negative prime, simulation]
[Figure 10. Discounting and sample size: causal rating of known and novel cause; small size, large size, simulation]
[Figure 11. Theory of Reasoned Action: attitude toward car, bicycle, and bus; prediction, simulation]
[Figure 12. Dual processes in attitude formation: observed data (top) and internal activation (bottom) for strong, ambiguous, and weak arguments by credibility (low, high) and importance (low, high)]
[Figure 13. Induced compliance: attitude toward essay topic by choice (yes, no); low payment, high payment, simulation]
[Figure 14]
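The acquisition principle of Figure 2 (A), in which repeated pairings strengthen a connection at learning rate 0.20, can be sketched numerically. This is a minimal sketch that assumes a single cue-outcome connection with cue activation 1 and outcome activation 1; the function name acquisition_curve is hypothetical and not part of the original simulations.

```python
# Minimal sketch of the acquisition principle (Figure 2A): one cue repeatedly
# paired with an outcome, updated by the delta rule with learning rate 0.20.
# Assumption: a single connection, cue activation 1, outcome activation 1.

LEARNING_RATE = 0.20


def acquisition_curve(n_trials, lr=LEARNING_RATE, target=1.0):
    """Return the connection weight after each of n_trials cue-outcome pairings."""
    weight = 0.0
    curve = []
    for _ in range(n_trials):
        # With cue activation 1, the internal input to the outcome equals the weight.
        error = target - weight
        weight += lr * error
        curve.append(weight)
    return curve


curve = acquisition_curve(9)
for trial, w in enumerate(curve, start=1):
    # For this setup the closed form is w_n = 1 - (1 - lr) ** n.
    print(f"trial {trial}: weight = {w:.3f}")
```

The weight climbs quickly on the first trials (.20, .36, .49) and then levels off toward the outcome activation of 1 (about .87 after nine trials), giving the negatively accelerated curve of panel A: each update is proportional to the error that remains.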