Working Memory, Delayed Feedback and Category Learning 1

Working Memory Mediates the Effects of Delayed Feedback on Decision Criterion Learning in Perceptual Categorization

W. Todd Maddox¹
A. David Ing
University of Texas, Austin

Shawn Ell
University of California, Berkeley

May 6, 2005

Four experiments tested the hypothesis that the working memory demand associated with accurately holding a stimulus/criterion representation in memory mediates the effects of delayed feedback on rule-based category learning. The interplay among the number of relevant criteria, the number of relevant stimulus dimensions and their relation to working memory demand was examined. Delayed feedback had no effect on rule-based category learning when two decision criteria along a single dimension had to be learned but did have an effect when three decision criteria along a single dimension had to be learned. Model-based analyses suggested that this deficit was due to sub-optimal decision criterion learning and more variable application of the learned decision criteria. A third experiment was conducted to determine whether the number of decision criteria or the working memory load associated with learning the task led to the delayed feedback deficit. The results suggested that three decision criteria can be learned under delayed feedback conditions if two are associated with one stimulus dimension and the third with a second dimension. A direct test of the hypothesis that learning three uni-dimensional decision criteria requires more working memory was offered and supported by the results from Experiment 4. These findings offer important insights into the qualitative properties of the explicit, hypothesis-testing system.

Quick and accurate categorization is fundamental to survival. If we incorrectly classify a poisonous food as healthful or an insurgent as a civilian we could die.
Categorization is a skill that we perform effortlessly numerous times daily and better than even the most sophisticated machines. Although our ability to learn categories is truly remarkable, more work is needed for a detailed understanding of the different category learning systems. In addition, each system's neurobiological underpinning has only recently become the focus of research. An understanding of the cognitive neuroscience of category learning is critical for a complete understanding of human cognition.

¹ This research was supported in part by National Institutes of Health Grant R01 MH59196, and by the Center for Perceptual Systems at the University of Texas, Austin. We thank Kelli Hejl, Scott Lauritzen and Mina Wilcox-Ghanoonparvar for help with data collection. Correspondence should be addressed to W. Todd Maddox, Department of Psychology, 1 University Station A8000, University of Texas, Austin, Texas, 78712 (e-mail: maddox@psy.utexas.edu).

Category learning involves laying down a memory trace that improves the efficiency (i.e., accuracy and speed) of responding. It is now widely accepted that mammals have multiple memory systems (Poldrack & Packard, 2003; Schacter, 1987; Squire, 1992), and there is a growing consensus that multiple category learning systems exist. Starting in the 1980s, a growing body of research suggested that participants have available multiple processing modes that can be used during categorization. Well established in the literature is a distinction between categorization according to a rule and categorization based on overall similarity (Allen & Brooks, 1991; Erickson & Kruschke, 1998; Folstein & Van Petten, 2004; Kemler-Nelson, 1984; Nosofsky, Palmeri, & McKinley, 1994; Regehr & Brooks, 1993; Smith & Shapiro, 1989).
Building upon this work is a large body of empirical data supporting the claim that different category learning systems exist and are mediated by unique (although often overlapping) neural systems. Empirical support comes from a wide range of research areas including animal learning (McDonald & White, 1993, 1994; Packard & McGaugh, 1992), neuropsychology (Filoteo, Maddox, & Davis, 2001a, 2001b; Maddox & Filoteo, 2001, in press; Myers, Shohamy, Gluck, Grossman, Onlaor, & Kapur, 2003), functional neuroimaging (Filoteo et al., in press; Poldrack, Prabhakaran, et al., 1999; Reber, Stark, & Squire, 1998; Seger & Cincotta, 2002, 2005; Smith, Patalano, & Jonides, 1998), and cognitive psychology (for reviews, see Keri, 2003; and Maddox & Ashby, 2004).

One of the most successful multiple systems models of category learning, and the only one that specifies the underlying neurobiology, is the COmpetition between Verbal and Implicit Systems model (COVIS; Ashby, Alfonso-Reese, Turken, & Waldron, 1998; Ashby & Waldron, 1999). COVIS postulates two systems that compete throughout learning: an explicit, hypothesis-testing system and an implicit, procedural-based learning system. COVIS assumes that, in humans, the two systems mediate the learning of different types of category structures. Briefly, COVIS assumes that rule-based category learning is dominated by the explicit hypothesis-testing system, which uses working memory and executive attention and is mediated by a circuit that includes the anterior cingulate, the prefrontal cortex, and the head of the caudate nucleus. Frequently, the rule that maximizes accuracy (i.e., the optimal rule) is easy to describe verbally. For example, Figure 1a presents a scatterplot of stimuli from a rule-based condition with three categories. Each point in the plot denotes the spatial frequency and spatial orientation of a Gabor patch stimulus (see Figure 1c for an example), with different symbols denoting different categories.
In this example, the rule is to give one response to low spatial frequency items, another response to intermediate spatial frequency items, and a third response to high spatial frequency items. To solve this task, the explicit hypothesis-testing system learns which dimension is relevant to solving the task, and learns the location of the associated decision criteria. In contrast, information-integration category learning is dominated by the implicit procedural-based learning system that depends on a reward signal to strengthen the appropriate (stimulus-category) associations in a relatively automatic fashion (Ashby et al., 1998; Ashby & Ell, 2001). This system is mediated largely within the tail of the caudate nucleus (with visual stimuli) and learning relies heavily on a dopamine-mediated reward signal.² To date, nearly a dozen experiments have been conducted that dissociate processing in these two systems by introducing experimental manipulations that, based on the proposed underlying neurobiology, should adversely affect processing in one system but not the other, and vice versa. A detailed review of this work is beyond the scope of this article but can be found in Maddox & Ashby (2004; see also Ashby & Spiering, 2004).

² In primates, all of extrastriate visual cortex projects directly to the tail of the caudate nucleus, with about 10,000 visual cortical synapses converging onto each medium spiny cell in the caudate (Wilson, 1995). These medium spiny cells then project to prefrontal and premotor cortex (via the globus pallidus and thalamus; e.g., Alexander, DeLong, & Strick, 1986). The idea is that an unexpected reward causes substantia nigra neurons to release dopamine from their terminals in the caudate nucleus (Hollerman & Schultz, 1997; Schultz, 1992), and that the presence of this dopamine strengthens recently active synapses (Arbuthnott, Ingham, & Wickens, 2000; Kerr & Wickens, 2001).
One of these dissociations involved manipulating the time between the participant’s response and the feedback. Because of the proposed underlying neurobiology, COVIS predicts different effects of delayed feedback on rule-based and information-integration learning. Since verbalizable rules are learned and stored in working memory by the hypothesis-testing system, rule-based learning should be unaffected by delayed feedback. On the other hand, since stimulus-to-category associations are learned by the procedural-based learning system, which requires a close temporal correspondence between the perceptual signal, the response and the feedback, delayed feedback should affect information-integration category learning. Maddox, Ashby and Bohil (2003; see also Maddox & Ing, 2005) examined the effects of delayed feedback on two-category uni-dimensional rule-based category learning and two-category information-integration category learning. On each trial in the immediate feedback condition a stimulus was presented, the participant generated a response, and feedback was presented 500 ms following the response. On each trial in the delayed feedback condition a stimulus was presented, the participant generated a response, and feedback was presented 5 sec following the response. Delayed feedback adversely affected information-integration, but not rule-based, category learning.

The goals of the current study are many. At the most general level, we wish to examine the qualitative properties of the two systems with an emphasis on testing the limits of the hypothesis-testing system. We test the hypothesis that the working memory demand associated with solving a rule-based task mediates the effect of delayed feedback. Specifically, as the working memory demand associated with solving the task increases, the adverse effect of delayed feedback on rule-based category learning should also increase.
In the process we provide insight into the relations among working memory demand, the number of decision criteria to be learned, and the number of stimulus dimensions relevant to solving the task. We show that although the working memory demand is often correlated with the number of decision criteria to be learned, this depends crucially on the dimensionality of the optimal decision rule.

To achieve these aims we conducted four experiments. Experiments 1 and 2 examined the effects of delayed feedback on uni-dimensional rule-based and information-integration category learning when three or four categories, respectively, had to be learned. Since the working memory demand increases with each additional uni-dimensional decision criterion, our working memory hypothesis predicts that the effect of delayed feedback should be larger in the four-category uni-dimensional task than in the three-category uni-dimensional task. This prediction was confirmed. Experiments 1 and 2 confound the number of decision criteria to be learned with the working memory demand associated with the task, making it impossible to determine which factor accounts for the change in performance. Experiment 3 dissociates these two factors by examining rule-based category learning when three decision criteria must be learned, but two are associated with one stimulus dimension and the third is associated with a second stimulus dimension (i.e., a conjunctive as opposed to a uni-dimensional rule). Work in absolute identification, a paradigm in which participants are asked to assign a unique response to each unique stimulus, shows that identification performance improves as the number of dimensions relevant to solving the task increases, presumably because the working memory demand is reduced in the multi-dimensional case (Attneave, 1959; Garner, 1962; Miller, 1956; Pollack, 1952).
For example, participants have great difficulty learning to identify nine lines of different lengths, but have little difficulty learning nine lines constructed from the factorial combination of three line lengths with three line orientations. If these results generalize to category learning, then the prediction is that delayed feedback should have less of an effect with two decision criteria along one dimension and a third along the second dimension (Experiment 3), than with three decision criteria along a single dimension (Experiment 2). This prediction was supported by the data.

Although the results from Experiment 3 support our working memory hypothesis, the test is rather indirect. Experiment 4 tests the working memory hypothesis more directly using a task originally proposed by Maddox, Ashby, Ing and Pickering (2004). On each trial in this task the participant categorizes, receives 500 ms of feedback and then completes a single four-item memory scanning trial. Maddox et al. (2004) showed that memory scanning disrupted feedback processing in the explicit hypothesis-testing system by demanding working memory capacity. Participants learned the Experiment 2 or Experiment 3 categories using the Maddox et al. procedure to determine which structure placed greater demand on working memory.

Experiment 1

Maddox et al. (2003) examined the effects of delayed feedback on two-category, uni-dimensional, rule-based category learning (in which a single decision criterion was required) and two-category information-integration category learning. They found no effect of delay on uni-dimensional rule-based category learning but a large effect on information-integration category learning. Experiment 1 extends Maddox et al. (2003) by including a third category and thus a second uni-dimensional decision criterion in the rule-based condition.
This should increase the working memory load associated with solving the uni-dimensional rule-based task, and thus, based on our working memory hypothesis, might lead to a delayed feedback effect. The logic is as follows. Suppose that the representation of the stimulus and the criteria are noisy and modeled by a zero-mean diffusion process (e.g., Ratcliff, 1978). This would imply that as feedback is delayed, the criterion/stimulus representation will naturally drift away from its mean position, and this would slow the learning rate. Working memory and attention processes can be invoked to slow the drifting rate, but because working memory has a limited capacity, the system’s efficiency will decrease as the number of decision criteria to be represented increases.

The rule-based and information-integration category structures used in Experiment 1 are depicted in Figure 1 along with the decision bounds that maximize accuracy. The stimulus on each trial was a Gabor patch with a spatial frequency and orientation that varied across trials. The distribution parameters are outlined in Table 1. Each symbol in Figure 1 denotes the spatial frequency and orientation of a single Gabor patch. In the rule-based task, hereafter referred to as the 3UD task because there are three categories and the decision rule is uni-dimensional, the optimal decision rule requires that two criteria be set along the spatial frequency dimension and that the following rule be used: “Respond A if the frequency is low; respond B if the frequency is intermediate; respond C if the frequency is high”. The stimuli in the three-category information-integration task (hereafter referred to as 3II) were constructed by rotating the 3UD distributions clockwise 45 degrees around the center of the frequency-orientation space, and then increasing pair-wise discriminability in order to equate performance in the immediate feedback condition.
This is why the stimuli in the 3II condition are farther apart than the stimuli in the 3UD condition (compare Figures 1a and 1b). It is important to equate performance across category structures in the immediate feedback condition so that alternative explanations of any delayed feedback effect based on difficulty can be ruled out. (A series of small-sample pilot studies was conducted to determine the 3II category distributions that met this criterion.) Unlike the 3UD condition, the optimal rule in the 3II condition has no simple verbal description.

INSERT FIGURE 1 AND TABLE 1 ABOUT HERE

Method

Participants and Design. One hundred participants were solicited from the University of Texas community and received course credit for participation. Twenty-four and 26 participated in the 3UD delay and immediate feedback conditions, respectively, whereas 26 and 24 participated in the 3II delay and immediate feedback conditions, respectively. No participant completed more than one experimental condition, and each session lasted approximately 60 minutes. All participants were tested for 20/20 vision using a Snellen eye chart. In nearly all of our current work with two categories we define a “learner” as a participant who achieves 65% accuracy during the final block of trials. Because Experiment 1 included three categories we lowered the criterion proportionally to 43% accuracy during the final block of trials. The data from participants who did not meet this criterion were excluded from all subsequent analyses. This criterion excluded 6 and 4 from the 3UD delay and immediate conditions, respectively, and 5 and 1 from the 3II delay and immediate conditions, respectively.³

Stimuli and Stimulus Generation. The experiment used the randomization technique introduced by Ashby and Gott (1988).
Eighty-one stimuli (27 from each of the three categories) from the 3UD categories were generated by sampling randomly from three bivariate normal distributions. The stimuli for the 3II categories were generated by rotating the 81 3UD stimuli clockwise by 45° around the center of the frequency-orientation space, and then shifting the stimuli away from the center of the space. Each set of 81 stimuli was displayed in a random order in each of four blocks of trials. The stimuli were computer generated and displayed on a 21” monitor with 1360 × 1024 resolution in a dimly lit room. Each Gabor patch was generated using Matlab routines from Brainard’s (1997) Psychophysics Toolbox. Each random sample (x1, x2) was converted to a stimulus by deriving the frequency, f = .25 + (x1/50), and orientation, o = x2(π/500). For example, the category A mean for the 3UD category structure was converted to a Gabor pattern with frequency, f = .25 + (255/50) = 5.35 cycles/degree and orientation, o = 125(π/500) = 0.785 radians counterclockwise from horizontal. The scaling factors were chosen in an attempt to equate the salience of frequency and orientation.

Procedure. The participants were informed that there were three categories and that each category was equally likely. They were informed that perfect performance was possible and were instructed to learn about the three categories. They were told to be as accurate as possible and not to worry about speed of responding. The procedure for a typical trial was as follows:

Immediate feedback condition: Response-terminated stimulus display – 500 ms mask – 750 ms feedback – 5-sec blank screen ITI

Delayed feedback condition: Response-terminated stimulus display – 5-sec mask – 750 ms feedback – 500 ms blank screen ITI

The mask was a Gabor pattern that subtended approximately 11 degrees of visual angle and was of a random frequency and orientation from within the range of stimulus values.
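The conversion from a sampled point (x1, x2) to Gabor parameters, and the clockwise rotation used to derive the 3II stimuli, can be illustrated with a minimal Python sketch. The function names and the `center` argument are hypothetical; the center of the frequency-orientation space is not given in this section, and the subsequent step of shifting stimuli away from the center is omitted.

```python
import math


def to_gabor(x1, x2):
    """Convert a sample (x1, x2) to (frequency in cycles/degree, orientation in radians)."""
    frequency = 0.25 + x1 / 50.0
    orientation = x2 * (math.pi / 500.0)
    return frequency, orientation


def rotate_clockwise(x1, x2, center, degrees=45.0):
    """Rotate (x1, x2) clockwise about `center` (a hypothetical argument) in the sampling space."""
    theta = math.radians(degrees)
    dx, dy = x1 - center[0], x2 - center[1]
    # A clockwise rotation by theta is a rotation by -theta.
    rx = dx * math.cos(theta) + dy * math.sin(theta)
    ry = -dx * math.sin(theta) + dy * math.cos(theta)
    return center[0] + rx, center[1] + ry


# Category A mean of the 3UD structure, as given in the text.
freq_a, orient_a = to_gabor(255, 125)
```

For the category A mean, `freq_a` and `orient_a` reproduce the values in the text (5.35 cycles/degree and π/4 ≈ 0.785 radians).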
Results and Theoretical Analysis

Analyses were performed separately on each block of data. In the first section (entitled ANOVA Results) we analyze the accuracy rates using ANOVA. In the second section (entitled Model Results) we introduce the model-based analyses.

ANOVA Results

As discussed above, it is important to equate 3UD and 3II category learning under control (immediate feedback) conditions to rule out task difficulty as an explanation for potential effects of delayed feedback.⁴ The accuracy rates for the immediate feedback conditions averaged across participants are displayed in Figure 2a. In every block, 3UD performance was worse than 3II performance, but this difference was only significant in the first block [t(43) = 2.93, p < .01].

³ Two comments are in order regarding this learning criterion. First, after data collection we examined other learning criterion values including one that excluded no data. The qualitative pattern of results was relatively unchanged across conditions. Second, to determine whether the “non-learners” displayed essentially random responding, or whether they displayed systematic responding, we fit the decision bound models described in the results section to the data and also models that assumed a fixed response probability for all stimuli (indicative of random or biased random responding). In all but a few cases, the models indicative of random or biased random responding provided the best account of the non-learners’ data. A similar result held in Experiments 2 and 3 described in this article.

⁴ Maddox et al. (2003) included conditions in which they equated on d’ and conditions in which they equated on accuracy and the results were unchanged. Even so, our position is that equating on accuracy is advantageous.
It is important to note that a “difficulty” explanation of any delayed feedback effect that we might observe would have to predict that delay should adversely affect 3UD performance more than 3II performance since 3UD immediate feedback performance is slightly worse. To determine whether delayed feedback affected three-category rule-based and information-integration category learning a 2 category structure (3UD vs. 3II) x 2 feedback (immediate vs. delay) x 4 block mixed design ANOVA was conducted on the accuracy rates. The accuracy rates averaged across participants for the immediate and delayed feedback 3UD conditions are displayed in Figure 2b, and for the 3II condition are displayed in Figure 2c. The main effect of block was significant [F(3, 240) = 115.79, p < .001, MSE = .008], whereas the main effect of category structure [F(1, 80) = .14, ns] and feedback condition were not [F(1, 80) = 1.72, ns]. Block interacted with category structure [F(3, 240) = 4.97, p < .01, MSE = .008], but not with any other factor [all F’s < 1]. Most importantly, there was an interaction between category structure and feedback condition [F(1, 80) = 3.87, p =.05, MSE = .069]. Post hoc analyses revealed a large decline in 3II performance under delayed feedback (59%) relative to immediate feedback (68%) conditions that was significant [t(42) = 2.25, p < .05], but no effect on 3UD performance of delayed feedback (63%) relative to immediate feedback (61%)(t<1). INSERT FIGURE 2 ABOUT HERE These results suggest that increasing the number of uni-dimensional rule-based decision criteria to two does not increase the working memory load enough to lead to a rule-based learning deficit under delayed feedback conditions. We turn now to the model-based analyses that shed some light on the locus of this performance deficit. Modeling Results Following Maddox et al. 
(2003), we applied a series of decision bound models to the data to determine the types of strategies that participants might use to solve these tasks. Maddox et al. found that many participants in the information-integration condition resorted to hypothesis-testing strategies when feedback was delayed because learning in the tail of the caudate nucleus was impaired. This reasoning maps onto a number of different decision bound models (Ashby, 1992a; Maddox & Ashby, 1993), which were used to test the hypothesis. Decision bound models were fit separately to the data from each participant and each block. When informative, we provide information about all blocks, but for brevity we focus on the results from the final block of data.

Decision bound models are derived from General Recognition Theory (GRT; Ashby & Townsend, 1986), which is a multivariate generalization of signal detection theory (e.g., Green & Swets, 1966). GRT assumes that there is trial-by-trial variability in the perceptual information obtained from every stimulus, no matter what the viewing conditions (Ashby & Lee, 1993). GRT assumes that each participant partitions the perceptual space into response regions by constructing decision boundaries to separate the regions. On each trial, the participant determines which region the percept is in, and then emits the associated response. Despite this deterministic decision rule, decision bound models predict probabilistic responding because of trial-by-trial variability that occurs as a result of perceptual and criterial noise.

Two different classes of decision bound models were fit to the data (see Ashby, 1992a; Maddox & Ashby, 1993, for a more formal treatment of these models). One type is compatible with the assumption that participants used an explicit hypothesis-testing strategy and one type assumes an information-integration strategy.
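To illustrate how a deterministic decision rule combined with noise yields probabilistic responding, the following minimal Python sketch computes response probabilities for a three-category rule with two criteria along one dimension. It assumes that perceptual and criterial noise can be pooled into a single zero-mean Gaussian with standard deviation `sigma`; the function names and the numerical values are hypothetical and are not taken from the fitted models.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def response_probabilities(x, c_low, c_high, sigma):
    """Return (P(A), P(B), P(C)) for stimulus value x on the relevant dimension.

    The percept is modeled as x plus Gaussian noise with standard deviation sigma.
    Rule: respond A if percept < c_low, C if percept > c_high, otherwise B.
    Assumes c_low < c_high.
    """
    p_a = normal_cdf((c_low - x) / sigma)
    p_c = 1.0 - normal_cdf((c_high - x) / sigma)
    p_b = 1.0 - p_a - p_c
    return p_a, p_b, p_c


# Hypothetical criteria and noise level, for illustration only.
probs = response_probabilities(x=3.0, c_low=2.0, c_high=4.0, sigma=0.5)
```

A stimulus lying exactly on the lower criterion is assigned to category A with probability .5 under these assumptions, and the probabilities approach 0 or 1 as the stimulus moves away from the criteria relative to the noise.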
The following models were fit to each information-integration participant’s responses since the effects of delayed feedback were restricted to this category structure.

Hypothesis-Testing Models. Two uni-dimensional rule-based models were applied. The uni-dimensional, spatial frequency model assumes that the participant sets two criteria along the spatial frequency dimension and uses the rule: Respond A if the spatial frequency is low; Respond B if the spatial frequency is intermediate; Respond C if the spatial frequency is high. The uni-dimensional, spatial orientation model assumes that the participant sets two criteria along the spatial orientation dimension and uses the rule: Respond A if the spatial orientation is high; Respond B if the spatial orientation is intermediate; Respond C if the spatial orientation is low. Both of these models have three free parameters: two decision criteria, and one noise variance.

Information-Integration Models. The optimal model assumes that the participant uses the optimal linear decision bounds. This model has only one free parameter (noise variance). The minimum distance classifier assumes that there are three “units”, one associated with each category, in the frequency-orientation space. On each trial the participant determines which unit is closest to the perceptual effect and gives the associated response. Because the location of one of the units can be fixed and since a uniform expansion or contraction of the space will not affect the location of the resulting (minimum distance) decision bounds, the model contains four free parameters (i.e., three that determine the location of the units, and one noise variance).
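The decision rule of a minimum distance classifier can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the fitted model: the unit locations, the noise level, and the function names are hypothetical, and the perceptual effect is modeled as the stimulus plus independent Gaussian noise on each dimension.

```python
import math
import random


def nearest_unit(percept, units):
    """Return the label of the unit with the smallest Euclidean distance to the percept."""
    return min(
        units,
        key=lambda label: math.hypot(
            percept[0] - units[label][0], percept[1] - units[label][1]
        ),
    )


def respond(stimulus, units, sigma, rng):
    """Add Gaussian perceptual noise to the stimulus, then choose the closest unit."""
    percept = (
        stimulus[0] + rng.gauss(0.0, sigma),
        stimulus[1] + rng.gauss(0.0, sigma),
    )
    return nearest_unit(percept, units)


# Hypothetical unit locations in an arbitrary two-dimensional space.
units = {"A": (0.0, 0.0), "B": (1.0, 1.0), "C": (2.0, 2.0)}
rng = random.Random(0)
response = respond((0.9, 1.2), units, sigma=0.1, rng=rng)
```

Because only relative distances matter, rescaling all unit coordinates by a common factor leaves the resulting decision bounds unchanged, which is the reason the text gives for the reduced parameter count.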
The minimum distance classifier has been found to provide a good computational model of participants’ response regions in previous information-integration category learning studies (e.g., Ashby & Waldron, 1999; Maddox et al., 2003, 2004; Waldron & Ashby, 2001; for applications to stimulus identification see Ashby, Waldron, Lee, & Berkman, 2001; Maddox, 2001, 2002). In addition, the assumptions of this model have strong neurobiological plausibility.

Model Fits. The model parameters were estimated using maximum likelihood (Ashby, 1992b; Wickens, 1982) and the goodness-of-fit statistic was AIC = 2r - 2lnL, where r is the number of free parameters and L is the likelihood of the model given the data (Akaike, 1974; Takane & Shibayama, 1992). The AIC statistic penalizes a model for extra free parameters in such a way that the smaller the AIC, the closer a model is to the “true model,” regardless of the number of free parameters. Thus, to find the best model among a given set of competitors, one simply computes an AIC value for each model, and chooses the model associated with the smallest AIC value.

Using AIC, we determined which model type, hypothesis-testing or information-integration, provided the best account of the data. The proportion of data sets best fit by each model type is displayed in the stacked bar chart in Figure 3 separately for the delayed and immediate feedback conditions. In addition, the percent correct for the participants classified as using a hypothesis-testing or information-integration strategy is displayed. Before discussing the Figure 3 data it is important to determine how well the models actually accounted for the data by computing the percent of responses accounted for by the best fitting model.
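The AIC-based model selection described above can be sketched as follows. The parameter counts (three, one, and four) follow the model descriptions in the text; the log-likelihood values and the function and model names are hypothetical and serve only to show the computation.

```python
def aic(num_params, log_likelihood):
    """AIC = 2r - 2 ln L, with r free parameters and maximized log-likelihood ln L."""
    return 2 * num_params - 2 * log_likelihood


def best_model(fits):
    """Return the name of the model with the smallest AIC.

    `fits` maps a model name to a (num_params, log_likelihood) pair.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical maximized log-likelihoods for one participant and block.
fits = {
    "unidimensional_frequency": (3, -70.0),
    "optimal": (1, -75.0),
    "minimum_distance": (4, -66.0),
}
winner = best_model(fits)
```

With these illustrative numbers the AIC values are 146, 152, and 140, so the minimum distance classifier is selected even though it has the most free parameters.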
In the delayed and immediate feedback conditions the hypothesis-testing models accounted (on average) for 65% (standard deviation = 14) and 65% (standard deviation = 18) of the responses, whereas the information-integration models accounted (on average) for 75% (standard deviation = 10) and 85% (standard deviation = 13) of the responses, suggesting that these model comparisons are valid.

Recall that information-integration category learning is assumed to depend on a procedural-based learning system that is mediated largely within the tail of the caudate nucleus (with visual stimuli). Learning in this system relies heavily on a dopamine-mediated reward signal and a close temporal correspondence between stimulus presentation, response and feedback so that the dopamine release due to the presentation of the feedback can strengthen recently active synapses (Arbuthnott, Ingham, & Wickens, 2000; Hollerman & Schultz, 1997; Kerr & Wickens, 2001; Schultz, 1992). Delayed feedback disrupts this temporal correspondence. During the delay period, synapses that were active because of the presentation of the stimulus are returning to baseline and thus dopamine release will not necessarily strengthen the appropriate synapses. Under these conditions, the procedural-based learning system will be operating at a sub-optimal level and many participants may abandon this system in favor of the less efficient but (under delayed feedback conditions) more reliable hypothesis-testing system. Maddox et al. (2003; Maddox & Ing, 2005) found support for this prediction in their model-based analyses. Specifically, they found a large increase in the proportion of information-integration participants’ data sets best fit by hypothesis-testing models in the delayed feedback relative to the immediate feedback condition.
The same pattern was found in the current data with participants being more likely to use a hypothesis-testing strategy to solve the information-integration task under delayed feedback conditions (.52) than under immediate feedback conditions (.22). In addition, accuracy rates for information-integration participants were worse under delayed (70%) than immediate feedback (82%) conditions. Interestingly, accuracy rates for hypothesis-testing participants were somewhat higher under delayed feedback conditions (62%) than under immediate feedback conditions (52%). Taken together these results suggest that the accuracy deficit observed in the delayed feedback condition resulted from an increase in the use of hypothesis-testing strategies and a decrease in the accuracy rate achieved by those participants who used an information-integration strategy. To gain further insight into the effects of delayed feedback on 3II category learning we examined the parameter values for the various models. Several comments are in order. First, in the delayed feedback condition, six of the 11 hypothesis-testing participants focused exclusively on the spatial frequency dimension whereas the remaining 5 focused on the spatial orientation dimension. In the immediate feedback condition, 4 of the 5 hypothesis-testing participants focused on frequency and the remaining participant focused on orientation. Second, the hypothesis-testing participants in the delayed feedback condition were biased to respond B, at the expense of accurate responding to A and C stimuli. In other words, these participants seemed to be most interested in responding correctly to the items of an intermediate frequency (or intermediate orientation, depending upon which model fit best). 
The best fitting decision criterion values reflected this bias, with the lower decision criterion being placed too low and the upper decision criterion being placed too high relative to the criteria associated with the optimal uni-dimensional model for these stimuli. Third, the poorer performance for the information-integration participants in the delayed feedback condition appeared to be due to the fact that the data from fewer of these participants were best fit by the optimal model (5 of 11 in the delayed feedback condition relative to 11 of 18 in the immediate feedback condition).

Although no effect of delay was observed in the 3UD condition, we applied the same models (except the uni-dimensional, spatial frequency model) to the final block of trials from the participants who learned the rule-based category structures. Again we computed the percent of responses accounted for and found relatively large values, suggesting that the model comparisons are valid [delay: hypothesis-testing = 73% (standard deviation = 15), information-integration = 72% (standard deviation = 17); immediate: hypothesis-testing = 72% (standard deviation = 15), information-integration = 82% (standard deviation = 4)]. As predicted, the use of hypothesis-testing strategies was high for both immediate (.82) and delayed feedback (.83) conditions, and the accuracy rate achieved by these participants was also high (73% and 75% for the immediate and delayed feedback conditions, respectively).

INSERT FIGURE 3 ABOUT HERE

Experiment 2

In Experiment 1, participants were able to learn a rule-based task that required placement of two uni-dimensional decision criteria equally well under delayed and immediate feedback conditions. To further tax the representational system and the associated working memory and attention load, Experiment 2 included a third decision criterion and hence a fourth category.
The category structures used in Experiment 2 are depicted in Figure 4 along with the decision bounds that maximize accuracy. The distribution parameters are outlined in Table 2. Categories A, B, and C for both the rule-based and information-integration conditions are identical to those from Experiment 1. In the four-category, uni-dimensional, rule-based task (hereafter referred to as 4UD), the optimal bounds require the participant to set three criteria on spatial frequency using the following rule: “Respond A if the frequency is very low; respond B if the frequency is low; respond C if the frequency is high; respond D if the frequency is very high”. As in Experiment 1, the optimal rule in the 4II condition has no simple verbal description, and in both conditions the stimuli were highly discriminable.

INSERT FIGURE 4 AND TABLE 2 ABOUT HERE

Method

Participants and Design. One hundred two participants were solicited from the University of Texas community and received course credit for participation. Twenty-four and 31 participated in the 4UD delay and immediate feedback conditions, respectively, whereas 25 and 22 participated in the 4II delay and immediate feedback conditions, respectively. No participant completed more than one experimental condition, and each session lasted approximately 60 minutes. All participants were tested for 20/20 vision using a Snellen eye chart. Prior to data collection, we defined the criterion for learning as 32.5% correct during the final block of trials. The data from participants who did not meet this criterion were excluded from all subsequent analyses. This criterion excluded 5 and 6 participants from the 4UD delay and immediate conditions, respectively, and 4 and 4 from the 4II delay and immediate conditions, respectively.

Stimuli and Stimulus Generation.
Stimulus generation and presentation were identical to those in Experiment 1 except that 20 stimuli were sampled from each category distribution.

Procedure. The procedure was identical to that from Experiment 1 except that the participants were informed that there were four equally-likely categories.

Results and Theoretical Analysis

ANOVA Results

Again, it was important to determine whether 4UD and 4II immediate feedback performance was equated. The accuracy rates averaged across participants are displayed in Figure 5a. Performance did not differ in any block (all t’s < 1.0). To determine whether delayed feedback affected four-category rule-based and information-integration category learning, a 2 category structure (4UD vs. 4II) x 2 feedback (immediate vs. delay) x 4 block mixed design ANOVA was conducted on the accuracy rates. The accuracy rates averaged across participants for the immediate and delayed feedback 4UD conditions are displayed in Figure 5b, and for the 4II condition are displayed in Figure 5c. The main effects of block [F(3, 237) = 116.04, p < .001, MSE = .007] and feedback type [F(1, 79) = 15.30, p < .001, MSE = .064] were significant, whereas the main effect of category structure [F(1, 79) = 2.45, p > .10] was not. Block interacted with category structure [F(3, 237) = 3.07, p < .05, MSE = .007], but not with feedback [F(3, 237) = 2.51, p > .05]; however, all of these effects were qualified by a significant three-way interaction [F(3, 237) = 3.69, p < .05, MSE = .007]. To determine the locus of the interaction we examined the effects of delayed feedback separately for each category structure and block. In the information-integration condition performance was significantly worse under delayed feedback in all blocks of trials (p < .01 in all blocks).
For the rule-based condition, the effect of delayed feedback was non-significant in block 1 [t(42) = 1.45, p > .05], although the difference was still large (delay = 35%; immediate = 42%; difference = 7%), but was significant in block 2 [t(42) = 2.44, p < .05] and block 3 [t(42) = 2.19, p < .05] with delayed feedback leading to worse performance (block 2 = 47%; block 3 = 54%) than immediate feedback (block 2 = 61%; block 3 = 65%). However, by block 4 there was no reliable difference [delay = 64%; immediate = 67%; t < 1]. These results suggest that the need to learn three uni-dimensional decision criteria does affect rule-based category learning at some stages of learning.

INSERT FIGURE 5 ABOUT HERE

Modeling Results

Because our focus is on rule-based category learning, we began by generalizing and fitting the hypothesis-testing and information-integration models outlined in Experiment 1 to each rule-based participant’s data on a block-by-block basis (see footnote 5). In the delay condition, the proportion of observers using a hypothesis-testing strategy was high in all four blocks (.90, .78, .79, and .78 in blocks 1 – 4, respectively). A similar pattern held in the immediate feedback condition (.84, .88, .84, and .80 in blocks 1 – 4, respectively). On the other hand, the accuracy rates for these participants were lower in the delay than in the immediate feedback conditions in blocks 1 – 3 (delay: .34, .47, and .52 in blocks 1 – 3, respectively; immediate: .43, .59, and .64 in blocks 1 – 3, respectively), but were equivalent in block 4 (67% and 67% for the immediate and delayed feedback conditions, respectively). Interestingly, the proportion of participants whose data was best fit by the optimal model was approximately the same in all blocks except the first block.
In block 1 more participants in the immediate feedback condition were best fit by the optimal model [proportion best fit by the optimal model: immediate = .36, .36, .48, and .44 in blocks 1 – 4, respectively; delayed = .16, .37, .47, and .47 in blocks 1 – 4, respectively]. To identify the locus of the delayed feedback effect on rule-based learning during the first three blocks of trials, we examined the parameter estimates from the sub-optimal spatial frequency model. We computed the absolute deviation between the observed decision criteria and the optimal criteria for each of the three criteria and averaged those absolute deviations. Large absolute deviations are associated with highly suboptimal decision criterion placements and thus poor learning whereas small absolute deviations are associated with more nearly optimal decision criterion placement and thus good learning. In addition, we examined the criterial noise standard deviation from the same model. A large criterial noise standard deviation is associated with poorer trial-by-trial memory and application of the learned decision criterion whereas a small criterial noise standard deviation is associated with good trial-by-trial memory and application of the learned decision criterion. Focusing only on participants whose data was best fit by a hypothesis-testing model, the median for the decision criterion measure was 35.2, 9.7, 12.9, and 5.1 for the delayed feedback condition in blocks 1 – 4, respectively and was 17.7, 5.7, 5.1, and 4.9 for the immediate feedback condition in blocks 1 – 4, respectively. The median for the noise measure was 10.6, 6.8, 5.7, and 4.5 for the delayed feedback condition in blocks 1 – 4, respectively and was 8.2, 4.7, 4.7, and 4.2 for the immediate feedback condition in blocks 1 – 4, respectively. 
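The criterion-deviation summary described above can be illustrated with a short sketch. It is only a minimal illustration of the arithmetic (average the absolute deviations across the three criteria for each participant, then take the group median); the function name and all numerical values below are hypothetical and are not taken from the fitted models or the data reported here.

```python
from statistics import median


def mean_abs_criterion_deviation(fitted, optimal):
    """Average absolute deviation between fitted and optimal decision criteria."""
    if len(fitted) != len(optimal):
        raise ValueError("fitted and optimal must list the same number of criteria")
    return sum(abs(f - o) for f, o in zip(fitted, optimal)) / len(optimal)


# Hypothetical criterion values in arbitrary stimulus units (illustration only).
optimal_criteria = [270.0, 300.0, 330.0]
fitted_by_participant = [
    [250.0, 305.0, 350.0],
    [268.0, 301.0, 333.0],
    [240.0, 300.0, 345.0],
]

# One deviation score per participant, then the median across participants.
deviations = [
    mean_abs_criterion_deviation(fitted, optimal_criteria)
    for fitted in fitted_by_participant
]
group_median = median(deviations)
```

A larger score indicates criteria placed farther from their optimal locations; the criterial noise measure is a separate model parameter and is not computed by this sketch.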
Thus the delayed feedback effect early in rule-based learning appears to be due primarily to poor learning of the optimal decision criteria and to a slight increase in the noise associated with the trial-by-trial memory and application of the learned criteria. Interestingly, hypothesis-testing participants in the delayed feedback condition also appeared to show a response bias similar to that found in the delayed feedback, information-integration condition from Experiment 1 favoring one or both of the intermediate categories.

For completeness, we also fit the models to the data from the information-integration condition. The results were similar across blocks so for brevity we focus only on the final block of trials. The proportion of data sets best fit by each model type and the percent correct for the participants classified as using each type of strategy are displayed in Figure 6. As predicted, and in line with the results from Experiment 1, participants were more likely to use a hypothesis-testing strategy to solve the information-integration task for delayed feedback conditions (.76) than for immediate feedback conditions (.39). In addition, accuracy rates for information-integration participants were worse for delayed (49%) than for immediate feedback (71%) conditions. Finally, accuracy rates for hypothesis-testing participants were worse for delayed (48%) than for immediate feedback (61%) conditions.

5 Percent of responses accounted for was again high, ranging from 70%-72% for the hypothesis-testing model and 66%-82% for the information-integration models when applied to the immediate and delayed feedback conditions from the 4UD and 4II conditions.

An examination of the model parameters from the 4II condition yielded similar results to those observed in Experiment 1.
First, but unlike the results from Experiment 1, all but one of the hypothesis-testing participants in the delayed and immediate feedback conditions focused exclusively on the spatial frequency dimension. Second, the hypothesis-testing participants in the delayed feedback condition showed one of three patterns that were approximately evenly distributed across these participants: a bias to respond B, a bias to respond C, or a bias toward B and C. In other words, and in line with the interpretation of Experiment 1’s results, these participants seemed to be most interested in responding correctly to items of one or both intermediate frequencies. The best fitting decision criterion values reflected these biases with the lower decision criterion being placed too low and the upper decision criterion being placed too high relative to the criteria associated with the optimal uni-dimensional model for these stimuli. Third, and again in line with Experiment 1, the poorer performance for the information-integration participants in the delayed feedback condition appeared to be due to the fact that fewer of these participants’ data was best fit by the optimal model (1 of 5 in the delayed feedback condition relative to 4 of 11 in the immediate feedback condition). In line with the results from Experiment 1, the accuracy deficit observed in the delayed feedback condition resulted from an increase in the use of hypothesis-testing strategies and a decrease in the accuracy rate achieved by those participants who used an information-integration strategy.

Brief Summary

Two findings are of importance from Experiments 1 and 2. First, and as predicted directly from COVIS, delayed feedback adversely affected information-integration category learning.
The model-based analyses suggested that delayed feedback (a) led many participants to abandon information-integration strategies in favor of hypothesis-testing strategies that were often biased toward one or two of the categories, and (b) resulted in fewer information-integration participants using the optimal decision rule. Second, and most importantly, when the working memory demand associated with learning the uni-dimensional rule-based task was increased (by requiring participants to learn three unique decision criteria) an adverse effect of delayed feedback emerged. This effect is not predicted directly from COVIS, but follows from the assumption that working memory and attention processes are necessary for optimal rule-based category learning. Model-based analyses indicate that the delayed feedback effect on uni-dimensional rule-based category learning in Experiment 2 was due to poor learning of the optimal decision criterion locations and an increase in perceptual/criterial noise. This finding would be expected if the representations of the stimulus and the criteria are noisy and are modeled by a zero-mean diffusion process (e.g., Ratcliff, 1978). As feedback is delayed the criterion/stimulus representation will naturally drift away from its mean position and learning could be slowed. Working memory and attention processes can be invoked to slow this drift, but this system will decrease in efficiency as the number of decision criteria to be represented increases, leading to poorer decision criterion learning.
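The drift argument in the preceding paragraph can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not the model fitted in this article: it approximates a zero-mean diffusion by a discrete Gaussian random walk, and the step size, number of steps, and function names are all hypothetical. It shows only that the spread of an unrehearsed representation around its starting position grows with the length of the delay (in proportion to the square root of the number of steps).

```python
import random
import statistics


def displacement_after_delay(n_steps, step_sd, rng):
    """Net displacement of a zero-mean Gaussian random walk after n_steps."""
    return sum(rng.gauss(0.0, step_sd) for _ in range(n_steps))


def displacement_sd(n_steps, step_sd=1.0, n_runs=2000, seed=0):
    """Standard deviation of the displacement across many simulated delays."""
    rng = random.Random(seed)
    samples = [displacement_after_delay(n_steps, step_sd, rng) for _ in range(n_runs)]
    return statistics.stdev(samples)


# Hypothetical delays: 5 time steps versus 50 time steps.
sd_short_delay = displacement_sd(n_steps=5)
sd_long_delay = displacement_sd(n_steps=50)
# In theory these approach sqrt(5) and sqrt(50) times the step size.
```

In this sketch nothing slows the drift; the working memory and attention processes discussed above would correspond to a mechanism that reduces the effective step size, which is not modeled here.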
Experiment 3

Although the results to this point support the hypothesis that a large increase in working memory demand associated with accurately solving the rule-based task leads to a decrement in rule-based category learning, this increase in working memory demand is perfectly correlated with an increase in the number of decision criteria to be learned. Thus a simple alternative hypothesis, that increasing the number of decision criteria leads to a delayed feedback effect, cannot be ruled out. Experiment 3 was conducted to tease apart these two hypotheses. In Experiment 3, participants were required to learn three decision criteria with two falling on the spatial frequency dimension and one falling on the spatial orientation dimension. [Analogous information-integration conditions were not included since we have already shown consistent delayed feedback effects across Experiments 1 and 2.] As stated earlier, the absolute identification literature suggests that identification performance improves as the number of dimensions relevant to solving the task increases, presumably because the working memory demand is reduced in the multi-dimensional case (Attneave, 1959; Garner, 1962; Miller, 1956; Pollack, 1952). Because our focus was on a direct comparison with the results from Experiment 2, we used four categories in this experiment. If the absolute identification results generalize to categorization, then delaying feedback should not have an effect in Experiment 3 (or the effect should be much smaller), even though three decision criteria are relevant, as in Experiment 2. On the other hand, if the absolute number of criteria to be learned mediates the delayed feedback effect, then delaying feedback should slow or disrupt learning in this experiment. The category structures used in Experiment 3 are depicted in Figure 7a along with the decision bounds that maximize accuracy.
The distribution parameters are outlined in Table 3. We refer to this as the 4CJ condition since four categories are relevant and the decision rule requires a conjunction of decisions across both dimensions. The optimal bounds require the participant to set two criteria on spatial frequency and one on spatial orientation using the following rule: “Respond A if the frequency is low; respond B if the frequency is intermediate and the orientation is shallow; respond C if the frequency is intermediate and the orientation is steep; respond D if the frequency is high”. Because we want to compare performance directly between the 4UD condition from Experiment 2 and the 4CJ condition from Experiment 3, we equated immediate feedback performance between the conditions. We did this by conducting a series of small pilot studies to determine the appropriate category distribution parameters to achieve this goal.

INSERT FIGURE 7 AND TABLE 3 ABOUT HERE

Method

Participants and Design. Fifty-three participants were solicited from the University of Texas community and received course credit for participation. Twenty-six and 31 participants completed delayed and immediate feedback conditions, respectively. No participant completed more than one experimental condition. All participants were tested for 20/20 vision using a Snellen eye chart. Each participant completed 1 session of approximately 60 minutes duration. A learning criterion of 32.5% correct during the final 80 trial block excluded 3 and 3 participants from the delayed and immediate feedback conditions, respectively.

Stimuli, Stimulus Generation, and Procedure. The stimuli, stimulus generation and experimental procedures were identical to those from Experiment 2.

Results

Because our focus was on a direct comparison with the results from Experiment 2, we attempted to equate category learning performance across the 4UD and 4CJ immediate feedback conditions.
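To make the structural difference between the two rule-based tasks explicit, the 4UD rule from Experiment 2 and the 4CJ rule quoted above can be written as two small functions. This is a sketch for illustration only: the function names, the criterion values, and the assumption that "shallow" means an orientation below the orientation criterion are hypothetical and are not the optimal bounds shown in Figures 4 and 7a.

```python
def classify_4ud(frequency, c1, c2, c3):
    """4UD rule: three ordered criteria (c1 < c2 < c3) on spatial frequency."""
    if frequency < c1:
        return "A"  # very low frequency
    if frequency < c2:
        return "B"  # low frequency
    if frequency < c3:
        return "C"  # high frequency
    return "D"      # very high frequency


def classify_4cj(frequency, orientation, f_low, f_high, o_crit):
    """4CJ rule: two criteria on frequency and one on orientation."""
    if frequency < f_low:
        return "A"  # low frequency, orientation ignored
    if frequency >= f_high:
        return "D"  # high frequency, orientation ignored
    # Intermediate frequency: the orientation criterion decides between B and C.
    return "B" if orientation < o_crit else "C"


# Hypothetical criterion values in arbitrary units (illustration only).
example_4ud = [classify_4ud(f, 270, 300, 330) for f in (250, 290, 310, 350)]
example_4cj = [
    classify_4cj(f, o, 280, 320, 45)
    for f, o in ((250, 10), (300, 20), (300, 70), (350, 80))
]
```

Both rules involve three decision criteria, but in the 4UD rule all three must be represented along a single dimension, whereas in the 4CJ rule only two share a dimension.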
To determine whether 4UD and 4CJ immediate feedback performance was equated we compared performance on a block-by-block basis. In all four blocks performance did not differ (all p’s > .05), suggesting that we were successful at equating immediate feedback performance across the two rule-based conditions. To determine whether delayed feedback affects 4CJ category learning, a 2 feedback condition (immediate vs. delayed) x 4 block mixed design ANOVA was conducted on the accuracy rates. The accuracy rates averaged across participants for the immediate and delayed feedback conditions are displayed in Figure 7b. Only the main effect of block [F(3, 87) = 61.51, p < .001, MSE = .009] was significant, with both the main effect of feedback condition and the interaction yielding F’s less than 1. As suggested by a visual examination of Figure 7b, there was clearly no effect of delay on rule-based learning when the participant was required to learn two decision criteria along the frequency dimension and one along the orientation dimension. Taken together with the results from Experiments 1 and 2 and in line with the uni-dimensional vs. multidimensional absolute identification literature, the absolute number of decision criteria (3 in Experiments 2 and 3) does not determine whether delay affects decision criterion learning, but rather the number of decision criteria along a single fixed dimension does (see footnote 6).

Experiment 4

Comparing the results from Experiments 2 and 3 provided an important test of the hypothesis that a large increase in the working memory demand associated with solving the rule-based task leads to a deficit in rule-based category learning when feedback is delayed. In addition, comparing the results from these two studies allowed us to rule out the alternative hypothesis that delayed feedback effects are mediated solely by the number of decision criteria to be learned.
Even so, this test is somewhat indirect and relies heavily on the assumption that the working memory demand is lower in a task where two decision criteria along one dimension and one decision criterion along the other dimension are relevant as compared with a case in which three decision criteria along a single dimension are relevant. The absolute identification literature supports this claim but a more direct test based on well established findings in category learning would be advantageous. An experimental paradigm proposed by Maddox, Ashby, Ing and Pickering (2004) provides just such a test. Recall that the explicit hypothesis-testing system has full access to working memory and executive attention and learns through an active process of hypothesis generation and testing. For example, following feedback that an error has just occurred, the explicit system must (a) decrease the salience of the current categorization rule, (b) identify a new plausible candidate rule if the salience of the current rule has dropped sufficiently, and (c) switch attention from the old rule to the new rule (Ashby et al. 1998). This sequence of events requires both attention and time. Thus, it should be possible to interfere with rule-based category learning by requiring participants to perform a second unrelated task that requires working memory and attention immediately after the feedback is given. The idea is that performance of this second task will prevent normal feedback processing in the explicit system and will result in lower rule-based category learning performance. Importantly, the degree to which this second working memory demanding task will interfere with rule-based category learning should be directly proportional to the working memory demand associated with learning the rule-based category learning task. 
In other words, interference from the secondary working memory task on rule-based category learning should be greater for a rule-based task that has a high working memory demand than for a rule-based task that has a lower working memory demand. Because we hypothesize that the 4UD task requires more working memory to learn than the 4CJ task, we predict greater interference from the secondary task on 4UD category learning relative to 4CJ category learning.

Maddox et al. (2004) chose four-item memory scanning as their secondary task and we follow their procedure. On each trial the participant must perform two tasks sequentially. First a categorization task is performed using traditional immediate feedback training. Then, following completion of the categorization trial, the participant performs a four-item memory-scanning task. The timing of each trial is as follows. The participant is shown a categorization stimulus for 1000ms and generates a response. Once the response is made corrective feedback is provided for 500ms. The feedback display is then replaced immediately with a horizontal display of 4 randomly sampled (without replacement) digits between 0 and 9. This array of digits is displayed for 500ms, followed by a 1000ms blank screen, and then a single “probe” digit is displayed, and the participant must decide whether the probe item was or was not in the memory set. Following the memory scanning response is a short inter-trial-interval and the initiation of the next trial.

6 Although no performance differences emerged across immediate and delayed feedback conditions, the hypothesis-testing and information-integration models outlined in Experiment 2 were generalized to include an orientation decision criterion and were applied to the final block of data separately for each delayed and immediate feedback participant. The use of hypothesis-testing strategies was high and approximately equal for both immediate and delayed feedback conditions (proportions over .70 in all blocks), and the accuracy rate achieved by these participants was also approximately equivalent across immediate and delayed feedback conditions. The fits of the models were also quite good, ranging from 75% - 85% of responses accounted for.

In Experiment 4 we had some participants learn the Experiment 2, 4UD category structures using this sequential task procedure, and other participants learn the Experiment 3, 4CJ category structures using this sequential task procedure. Because we hypothesize that learning of the 4UD category structure is more working memory demanding than learning the 4CJ category structure, we predict more interference from the secondary working memory task on 4UD than on 4CJ category learning.

Method

Participants and Design. Fifty participants (25 in each condition) were solicited from the University of Texas community and received course credit for participation. No participant completed more than one experimental condition. All participants were tested for 20/20 vision using a Snellen eye chart. Each participant completed 1 session of approximately 60 minutes duration.

Stimuli and Stimulus Generation. Category Learning. These were identical to those outlined in Experiments 2 and 3.

Memory Scanning. On each trial four digits were sampled randomly (without replacement) from the set of single digit numbers, 0 – 9. The four selected digits were displayed for 500ms in 48 point font in a horizontal array each separated by 100 pixels and were vertically centered on the screen. A blank screen was then displayed for 1000ms. Next a single digit was sampled randomly with .5 probability of being sampled from the memory set.
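The sampling scheme just described can be sketched in a few lines. This is a hedged illustration rather than the software used in the experiment: the function name is hypothetical, and the choice to draw a non-member probe uniformly from the six digits outside the memory set is an assumption, since the text specifies only the .5 probability that the probe comes from the memory set.

```python
import random


def make_memory_scanning_trial(rng):
    """Return (memory_set, probe, probe_in_set) for one memory-scanning trial."""
    # Four distinct digits sampled without replacement from 0-9.
    memory_set = rng.sample(range(10), 4)
    # With probability .5 the probe is drawn from the memory set.
    probe_in_set = rng.random() < 0.5
    if probe_in_set:
        probe = rng.choice(memory_set)
    else:
        # Assumption: a "no" probe is drawn uniformly from the remaining digits.
        probe = rng.choice([d for d in range(10) if d not in memory_set])
    return memory_set, probe, probe_in_set


rng = random.Random(2005)  # arbitrary seed so the sketch is reproducible
trials = [make_memory_scanning_trial(rng) for _ in range(200)]
```

The correct response on each trial is "yes" when `probe_in_set` is true and "no" otherwise.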
The selected digit was displayed centered on the screen along with the question, “Was this item in the memory set?” The observer then responded “yes” or “no” by pressing one of two keys that were different from those used for categorization.

Procedure. The procedure was identical to that from Experiments 2 and 3 with the following additions. For memory scanning the observers were informed that high levels of performance were possible and that they should respond as quickly and accurately as possible. If performance in the memory scanning task was below 90% accuracy at the end of any trial, then the observers were told to increase their memory scanning accuracy. These notifications stopped once memory scanning accuracy was above 90%.

Results and Theoretical Analysis

ANOVA Results

Memory Scanning. Participants performed the memory-scanning task with high accuracy, achieving an overall accuracy level of 96.9%. There was no significant difference in memory-scanning accuracy across the 4UD (96.6%) and 4CJ (97.1%) conditions [t(48) = .84, p > .40]. Mean correct RT in the memory-scanning task was 1434ms. There was no significant difference in memory-scanning mean RT across the 4UD (1453ms) and 4CJ (1415ms) conditions [t(48) = .73, p > .50].

Category Learning Performance. Recall that we successfully equated 4UD and 4CJ immediate feedback performance (see results from Experiment 3). Thus, we can compare 4UD and 4CJ performance in the sequential task procedure directly to determine which category structure is more difficult to learn when the feedback is followed immediately by a working memory demanding task and, by extension, which category structure places a greater demand on working memory during learning. A 2 category structure (4UD vs. 4CJ) x 4 block mixed design ANOVA was conducted on the accuracy rates. The accuracy rates averaged across participants for the immediate and delayed feedback conditions are displayed in Figure 8.
The main effects of category structure [F(1, 48) = 5.48, p < .05, MSE = .064] and block [F(3, 144) = 38.70, p < .001, MSE = .008] were significant and the interaction [F(3, 144) = 1.68, p = .173, MSE = .008] was non-significant. The main effect of category structure suggested that 4UD category learning (36%) was significantly worse than 4CJ category learning (45%). This finding supports our claim that the working memory demand associated with learning three uni-dimensional decision criteria (4UD) is greater than the working memory demand associated with learning two decision criteria along one dimension and a third along a second dimension (4CJ), and most importantly, provides strong support for the hypothesis that the working memory demand associated with learning a rule-based task (and not the absolute number of decision criteria to be learned) mediates the effects of delayed feedback.

INSERT FIGURE 8 ABOUT HERE

General Discussion

This article reports the results from four experiments that tested the hypothesis that the working memory demand associated with learning a rule-based categorization problem mediates the effects of delayed feedback on category learning. If rule-based category learning is mediated by a hypothesis-testing system that uses working memory and executive attention then a close temporal correspondence between the perceptual signal and the feedback should not always be required for accurate learning. Working memory and attentional processes can be invoked to maintain an accurate representation of the stimulus and the location of the decision criteria during a delay period and performance should not be affected. However, the working memory demand associated with accurately maintaining this stimulus/criterion representation during a delay period might exceed working memory capacity, leading to a rule-based category learning deficit.
In this article we examined the interplay between the number of decision criteria relevant to solving the task, the number of dimensions relevant to solving the task and their relation to working memory demand. We build upon a large body of work in absolute identification that suggests that, all else equal, multidimensional stimulus identification places less demand on working memory than unidimensional stimulus identification, and extend this to category learning. In the current report, a five-second delay had no effect on rule-based category learning that involved two decision criteria along a single dimension (Experiment 1), but led to a significant slowing of rule-based category learning that involved three decision criteria along a single dimension (Experiment 2). Model-based analyses suggested that this slowing was due primarily to sub-optimal decision criterion learning and a slight increase in the variability in the application of the learned criteria. Experiment 3 was conducted to determine whether the absolute number of decision criteria to be learned or working memory load associated with learning the task was the critical factor for rule-based category learning. Delayed feedback had no effect on rule-based category learning when the three criteria were distributed across two dimensions. This result suggests that findings from the 1950’s and 1960’s on capacity limitations in absolute identification of uni-dimensional and multidimensional stimuli generalize to rule-based category learning. More importantly, this provides evidence, albeit indirect, in support of our hypothesis that the working memory load associated with solving the task mediates the effects of delayed feedback with a lower working memory demand being associated with learning two criteria along a single dimension and a third criterion along a second dimension. Experiment 4 provided a more direct test by borrowing a sequential task approach developed by Maddox et al. (2004). 
On each trial in Experiment 4 the participant categorized, received 500ms of feedback, and then completed one trial in a four-item memory scanning task. Having the working memory demanding memory scanning task immediately following categorization feedback has been found to affect rule-based category learning but not information-integration category learning (Maddox et al., 2004). To determine more directly whether learning three criteria along a single dimension does in fact require more working memory than learning two criteria along one dimension and a third along a second dimension, participants learned each of these category structures using the Maddox et al. (2004) procedure. The category structure that requires more working memory to solve should show a greater interference from the secondary task. As expected, the secondary task led to greater interference when the participant was required to learn three decision criteria along a single dimension rather than learn three criteria spread across two dimensions.

Delayed Feedback and Decision Criterion Learning

These data suggest that delayed feedback can affect rule-based category learning, but only when the working memory demand associated with holding a stimulus/criterion representation in memory during the delay period exceeds working memory capacity. We found that the need to learn three decision criteria along the spatial frequency dimension of a Gabor stimulus was enough to slow learning under delayed feedback conditions. It is important to be clear that we are not arguing that there is something “universal” about three decision criteria. In fact, we would argue that there are a number of factors that will influence when delayed feedback will affect rule-based category learning. The duration of the feedback delay should have an effect.
If the delay had been increased to 10 or 20 seconds, we might have seen an effect of delay when only two decision criteria were necessary. The nature of the stimulus dimensions might also have an effect. Spatial frequency and orientation are separable (Garner, 1974; Maddox, 1992; Shepard, 1964) in the sense that they are processed more independently than integral dimension stimuli. Decision criterion learning might show a larger delayed feedback effect if processing of the dimensions overlaps some. The separation between the decision criteria might also have an effect. If the criteria were closer together along the stimulus dimension we might have seen an effect of delay sooner than if the criteria were farther apart. Finally, the number of irrelevant dimensions present in the stimulus might also have an effect. With more dimensions, many of which might be irrelevant, the participant must learn to ignore many of these dimensions likely taxing working memory and degrading the perceptual representation of the stimulus and criteria. All of these comments apply to continuous valued perceptual stimuli like those used in the current studies, but they are relevant to other types of stimuli used in the literature such as novel animals that vary along multiple (up to 5 or 6) highly discriminable binary-valued dimensions (e.g., straight vs. curly tail; black vs. white body, etc; Allen & Brooks, 1991). If the number of binary decisions required to solve the task is small (around 2) then it is likely that delayed feedback will not affect learning. On the other hand, as the number of binary decisions increases, or if a large number of irrelevant dimensions are present then delayed feedback might affect rule-based category learning. Of course, if the dimensions begin to take on more than two values (e.g., black, dark gray, light gray or white body) and each is associated with a unique category then delayed feedback effects might emerge. 
Highly discriminable binary-valued stimuli can be thought of as falling on one end of a continuum, with perceptually similar continuous-valued dimensions on the opposite end. Future work should attempt to bridge the gap between these two clearly related literatures by seeking a convergence (or divergence) of findings across these different types of stimulus sets. Despite these qualifications on the current results, and the recommendation to expand the stimulus set, what is most important is that these studies provide insights into the basic qualitative properties of the hypothesis-testing system and suggest strongly that the working memory demand involved in learning the task mediates whether delayed feedback will adversely affect rule-based category learning.

Two Alternative Explanations

COVIS assumes that rule-based category learning is mediated by an explicit, hypothesis-testing system that involves working memory and attentional processes and requires effort to process the feedback, whereas information-integration category learning is mediated by an implicit, procedural-learning-based system that involves an essentially automatic (under optimal feedback conditions), gradual, incremental learning process. Delayed feedback is assumed to adversely affect information-integration learning under all conditions. In the current report we test the hypothesis that rule-based category learning will be adversely affected only when the working memory demand associated with learning the task exceeds the working memory capacity available to hold the stimulus/criterion representation in working memory during the delay period.
An alternative (although related) explanation is that participants are not applying information-integration strategies in the information-integration conditions (immediate and delayed feedback) but instead are using complex hypothesis-testing strategies that involve a number of decision criteria. As the number of these decision criteria increases, a model of this sort becomes difficult to tease apart from an information-integration model, and so this hypothesis is difficult to test through model-based analyses. On the other hand, the results from two recent studies argue strongly against this alternative hypothesis. In one study, Waldron and Ashby (2001; see also Zeithamova & Maddox, in press) asked participants to learn rule-based or information-integration categories. Participants were placed into either a control condition or a dual task condition. In the dual task condition, participants had to perform a numerical Stroop task, known to tax working memory and attentional processes, concurrently with the category learning task. In the control condition, the numerical Stroop task was absent. In the second study, Maddox et al. (2004) had participants categorize, receive feedback, and then immediately perform four-item memory scanning. This is the procedure used in Experiment 4. If rule-based category learning requires working memory but information-integration category learning does not, then rule-based category learning should be affected by the dual task and the sequential memory scanning task, whereas information-integration category learning should not. On the other hand, if information-integration category learning involves the application of a complex hypothesis-testing strategy, then the dual task and sequential task should affect both rule-based and information-integration category learning.
In both studies rule-based, but not information-integration, category learning was affected by the presence of the secondary task, ruling out this alternative hypothesis.

We argue that the effect of delayed feedback on the two systems is qualitatively different. In the procedural-based learning system, delayed feedback disrupts the close temporal correspondence needed between stimulus, response and feedback for good learning to occur. In the hypothesis-testing system, on the other hand, delayed feedback can affect learning if the working memory demand associated with holding an accurate stimulus/criterion representation in memory is high enough that it exceeds working memory capacity. An alternative explanation is that there are separate systems, but that the effect of delayed feedback is on a common perceptual working memory representation in frontal cortex that precedes the two systems. This alternative assumes that rule-based and information-integration tasks are affected differentially because information-integration category structures require “more” information than rule-based category structures. This alternative explanation accounts for the shift from an information-integration strategy to a hypothesis-testing strategy in Experiments 1 and 2, because the shift in strategy reduces the load on working memory by focusing working memory on a single stimulus dimension instead of both. However, this alternative appears to break down when applied to the four-category conjunctive rule-based task in Experiment 3. In this case both dimensions are relevant, and thus the amount of information needed (both dimensions) to solve the task is the same as that in the information-integration conditions from Experiments 1 and 2, yet delayed feedback had no effect in Experiment 3.
Even so, a few theories have been proposed that assume that participants break down multi-dimensional rule-based category structures into simpler one-dimensional category structures (e.g., Erickson & Kruschke, 1998; Kruschke & Johansen, 1999). It could be that one of the advantages of breaking down the task in this way is to reduce the load on working memory. This alternative seems to break down even further, though, since it would seem to predict a smaller effect of delayed feedback in the four-category unidimensional rule-based task than in the four-category conjunctive rule-based task, because “less” information is needed in the former case. Although we are not ready to completely rule out this interesting alternative explanation of our results, the body of evidence in this report and a number of related studies (reviewed in Maddox & Ashby, 2004) seems to argue against it.

Summary

This article reports the results from four experiments that tested the hypothesis that the working memory demand associated with holding an accurate stimulus/criterion representation in memory during a delay period mediates the effects of delayed feedback on rule-based category learning. The interplay among the number of criteria to be learned, the number of stimulus dimensions relevant to solving the task and their relationship to working memory demand was examined. Delayed feedback had no effect on rule-based category learning when two decision criteria along a single dimension had to be learned, but adversely affected learning when three decision criteria along a single dimension had to be learned. Model-based analyses suggested that the performance deficit was due to sub-optimal decision criterion learning and to more variability in the application of the learned decision criteria.
A critical experiment was conducted to determine whether the operative factor leading to a delayed feedback deficit was the absolute number of decision criteria to be learned (three) or the fact that three decision criteria along a single dimension require more working memory to learn. The results suggested that three decision criteria can be learned accurately under delayed feedback conditions if two are associated with one stimulus dimension and the third with a second stimulus dimension. A more direct test of the hypothesis that three uni-dimensional decision criteria require more working memory to learn than two along one dimension and one along the second was undertaken in Experiment 4 using a sequential working memory task procedure. As predicted, the sequential task led to greater category learning interference when three uni-dimensional decision criteria were required.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716-723.
Alexander, G.E., DeLong, M.R., & Strick, P.L. (1986). Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annual Review of Neuroscience, 9, 357-381.
Allen, S.W., & Brooks, L.R. (1991). Specializing the operation of an explicit rule. Journal of Experimental Psychology: General, 120, 3-19.
Arbuthnott, G. W., Ingham, C. A., & Wickens, J. R. (2000). Dopamine and synaptic plasticity in the neostriatum. Journal of Anatomy, 196, 587-96.
Ashby, F. G. (1992a). Multidimensional models of categorization. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition. Hillsdale, NJ: Erlbaum.
Ashby, F. G. (1992b). Multivariate probability distributions. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 1-34). Hillsdale, NJ: Erlbaum.
Ashby, F. G., Alfonso-Reese, L. A., Turken, A. U., & Waldron, E. M. (1998).
A neuropsychological theory of multiple systems in category learning. Psychological Review, 105, 442-481.
Ashby, F. G., & Ell, S. W. (2001). The neurobiological basis of category learning. Trends in Cognitive Sciences, 5, 204-210.
Ashby, F. G., & Gott, R. E. (1988). Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 33-53.
Ashby, F. G., & Lee, W. W. (1993). Perceptual variability as a fundamental axiom of perceptual science. In S. C. Masin (Ed.), Foundations of perceptual theory (pp. 369-399). Amsterdam: Elsevier.
Ashby, F.G., & Spiering, B.J. (2004). The neurobiology of category learning. Behavioral and Cognitive Neuroscience Reviews, 3, 101-113.
Ashby, F. G., & Townsend, J. T. (1986). Varieties of perceptual independence. Psychological Review, 93, 154-179.
Ashby, F. G., & Waldron, E. M. (1999). The nature of implicit categorization. Psychonomic Bulletin & Review, 6, 363-378.
Ashby, F.G., Waldron, E.M., Lee, W.W., & Berkman, A. (2001). Suboptimality in categorization and identification. Journal of Experimental Psychology: General, 130, 77-96.
Attneave, F. (1959). Applications of information theory to psychology. New York: Holt, Rinehart & Winston.
Brainard, D.H. (1997). Psychophysics software for use with MATLAB. Spatial Vision, 10, 433-436.
Erickson, M. A., & Kruschke, J. K. (1998). Rules and exemplars in category learning. Journal of Experimental Psychology: General, 127, 107-140.
Filoteo, J.V., Maddox, W.T., & Davis, J.D. (2001a). A possible role of the striatum in linear and nonlinear category learning: Evidence from patients with Huntington’s disease. Behavioral Neuroscience, 115, 786-798.
Filoteo, J.V., Maddox, W.T., & Davis, J.D. (2001b). Quantitative modeling of category learning in amnesic patients. Journal of the International Neuropsychological Society, 7, 1-19.
Filoteo, J.V., Maddox, W.T., Simmons, A.N., Ing, A.D., Cagigas, X.E., Matthews, S., & Paulus, M.P. (in press). Cortical and subcortical brain regions involved in rule-based category learning. Neuroreport.
Folstein, J.R., & Van Petten, C.V. (2004). Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(5), 1026-1044.
Garner, W.R. (1962). Uncertainty and structure as psychological concepts. New York: Wiley.
Garner, W.R. (1974). The processing of information and structure. New York: Wiley.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: John Wiley & Sons.
Hollerman, J. R., & Schultz, W. (1997). Dopamine neurons report an error in the temporal prediction of reward during learning. Nature Neuroscience, 1, 304-308.
Kemler Nelson, D.G. (1984). The effect of intention on what concepts are acquired. Journal of Verbal Learning and Verbal Behavior, 10, 734-759.
Keri, S. (2003). The cognitive neuroscience of category learning. Brain Research Reviews, 43, 85-109.
Kerr, J.N.D., & Wickens, J.R. (2001). Dopamine D-1/D-5 receptor activation is required for long-term potentiation in the rat neostriatum in vitro. Journal of Neurophysiology, 85, 117-124.
Kruschke, J. K., & Johansen, M. K. (1999). A model of probabilistic category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1083-1119.
Maddox, W.T. (1992). Perceptual and decisional separability. In F.G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 147-180). Hillsdale, NJ: Erlbaum.
Maddox, W.T. (2001). Separating perceptual processes from decisional processes in identification and categorization. Perception & Psychophysics, 63, 1183-1200.
Maddox, W.T. (2002). Learning and attention in multidimensional identification and categorization: Separating low-level perceptual processes and high-level decisional processes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 99-115.
Maddox, W. T., & Ashby, F. G.
(1993). Comparing decision bound and exemplar models of categorization. Perception & Psychophysics, 53, 49-70.
Maddox, W.T., & Ashby, F.G. (2004). Dissociating explicit and procedural-learning based systems of perceptual category learning. Behavioral Processes, 66, 309-332.
Maddox, W.T., Ashby, F.G., & Bohil, C.J. (2003). Delayed feedback effects on rule-based and information-integration category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 650-662.
Maddox, W.T., Ashby, F.G., Ing, A.D., & Pickering, A.D. (2004). Disrupting feedback processing interferes with rule-based but not information integration category learning. Memory & Cognition, 32(4), 582-591.
Maddox, W.T., & Filoteo, J.V. (2001). Striatal contributions to category learning: Quantitative modeling of simple linear and complex nonlinear rule learning in patients with Parkinson’s Disease. Journal of the International Neuropsychological Society, 7, 710-727.
Maddox, W.T., & Ing, A.D. (2005). Delayed feedback disrupts the procedural-learning but not the hypothesis-testing system in perceptual category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 100-107.
Maddox, W.T., & Filoteo, J.V. (in press). Modeling visual attention and category learning in amnesiacs, striatal-damaged patients, and normal aging. To appear in R.W.J. Neufeld (Ed.), Advances in Clinical-Cognitive Science: Formal Modeling and Assessment of Processes and Symptoms.
McDonald, R.J., & White, N.M. (1993). A triple dissociation of memory systems: Hippocampus, amygdala, and dorsal striatum. Behavioral Neuroscience, 107, 3-22.
McDonald, R.J., & White, N.M. (1994). Parallel information processing in the water maze: Evidence for independent memory systems involving dorsal striatum and hippocampus. Behavioral and Neural Biology, 61, 260-70.
Miller, G.A. (1956).
The magical number seven plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.
Myers, C. E., Shohamy, D., Gluck, M. A., Grossman, S., Onlaor, S., & Kapur, N. (2003). Dissociating medial temporal and basal ganglia memory systems with a latent learning task. Neuropsychologia.
Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101, 53-79.
Packard, M.G., & McGaugh, J.L. (1992). Double dissociation of fornix and caudate nucleus lesions on acquisition of two water maze tasks: Further evidence for multiple memory systems. Behavioral Neuroscience, 106, 439-446.
Poldrack, R.A., & Packard, M.G. (2003). Competition among multiple memory systems: Converging evidence from animal and human brain studies. Neuropsychologia, 41, 245-251.
Poldrack, R. A., Prabhakaran, V., Seger, C. A., & Gabrieli, J. D. E. (1999). Striatal activation during acquisition of a cognitive skill. Neuropsychology, 13, 564-574.
Pollack, I. (1952). The information of elementary auditory displays. Journal of the Acoustical Society of America, 24, 745-749.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 71, 59-108.
Reber, P.J., Stark, C.E.L., & Squire, L.R. (1998). Cortical areas supporting category learning identified using functional magnetic resonance imaging. Proceedings of the National Academy of Sciences, USA, 95, 747-750.
Regehr, G., & Brooks, L.R. (1993). Perceptual manifestations of an analytic structure: The priority of holistic individuation. Journal of Experimental Psychology: General, 122, 92-114.
Schacter, D.L. (1987). Implicit memory: History and current status. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 501-518.
Schultz, W. (1992). Activity of dopamine neurons in the behaving primate. Seminars in Neuroscience, 4, 129-138.
Seger, C. A., & Cincotta, C. M. (2002). Striatal activation in concept learning.
Cognitive, Affective, & Behavioral Neuroscience, 2, 149-161.
Shepard, R.N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54-87.
Smith, E.E., Patalano, A.L., & Jonides, J. (1998). Alternative strategies of categorization. Cognition, 65(2-3), 167-196.
Smith, J.D., & Kemler-Nelson, D.G. (1984). Overall similarity in adults’ classification: The child in all of us. Journal of Experimental Psychology: General, 113, 137-159.
Squire, L.R. (1992). Memory and the hippocampus: A synthesis from findings with rats, monkeys and humans. Psychological Review, 99, 195-231.
Takane, Y., & Shibayama, T. (1992). Structures in stimulus identification data. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 335-362). Hillsdale, NJ: Erlbaum.
Waldron, E. M., & Ashby, F. G. (2001). The effects of concurrent task interference on category learning. Psychonomic Bulletin & Review, 8, 168-176.
Wickens, T. D. (1982). Models for behavior: Stochastic processes in psychology. San Francisco: W.H. Freeman.
Wilson, C.J. (1995). The contribution of cortical neurons to the firing pattern of striatal spiny neurons. In J.C. Houk, J.L. Davis, & D.G. Beiser (Eds.), Models of Information Processing in the Basal Ganglia (pp. 29-50). Cambridge: Bradford.
Zeithamova, D., & Maddox, W.T. (in press). Dual task interference in perceptual category learning. Memory & Cognition.

Table 1. Category Distribution Parameters for Experiment 1
______________________________________________________________________________
Condition   Category   μ_f   μ_o   σ²_f   σ²_o   cov_f,o
3UD         A          255   125      9   9000         0
3UD         B          285   125      9   9000         0
3UD         C          315   125      9   9000         0
3II         A          177   252   4506   4506      4271
3II         B          259   166   4506   4506      4271
3II         C          345    84   4506   4506      4271

Table 2. Category Distribution Parameters for Experiment 2
______________________________________________________________________________
Condition   Category   μ_f   μ_o   σ²_f   σ²_o   cov_f,o
4UD         A          255   125      9   9000         0
4UD         B          285   125      9   9000         0
4UD         C          315   125      9   9000         0
4UD         D          345   125      9   9000         0
4II         A          177   252   4506   4506      4271
4II         B          259   166   4506   4506      4271
4II         C          345    84   4506   4506      4271
4II         D          427     2   4506   4506      4271

Table 3. Category Distribution Parameters for Experiment 3
______________________________________________________________________________
Condition   Category   μ_f   μ_o   σ²_f   σ²_o   cov_f,o
4CJ         A1         260   100     25    625         0
4CJ         A2         260   200     25    625         0
4CJ         B          300   100     25    625         0
4CJ         C          300   200     25    625         0
4CJ         D1         340   100     25    625         0
4CJ         D2         340   200     25    625         0

[Figure 1: panels (a) and (b) are scatterplots with axes Frequency and Orientation; panel (c) is a stimulus image.]
Figure 1. Category structures from Experiment 1. (a) Three-category, uni-dimensional rule-based (3UD), and (b) three-category information-integration (3II). Each open circle denotes the spatial frequency and spatial orientation of a Gabor pattern from Category A. Each filled circle denotes a Gabor pattern from Category B. Each open square denotes a Gabor pattern from Category C. (c) Sample Gabor patch stimulus.

[Figure 2: panels (a), (b) and (c); x-axis: 81-trial blocks (1-4); y-axis: Proportion Correct (0.00-1.00). Legend: (a) 3UD, 3II; (b) and (c) Immediate, Delay.]
Figure 2. Experiment 1 proportion correct for the (a) 3UD and 3II immediate feedback conditions, (b) 3UD delayed and immediate feedback conditions, and (c) 3II delayed and immediate feedback conditions. Standard error bars are also included.
[Figure 3: x-axis: Feedback Condition (Delay, Immediate); y-axis: Proportion Best Fit by Each Model Type (0.00-1.00). Legend: Info-Int, Hypo-Test. Embedded percentages as extracted: 70%, 82%, 62%, 52%.]
Figure 3. Experiment 1 proportion of 3II participants’ final-block data that were best fit by either a hypothesis-testing or an information-integration model. The percentages embedded in the plot denote the average accuracy rates achieved by participants whose data were best fit by each model class.

[Figure 4: panels (a) and (b) are scatterplots with axes Frequency and Orientation.]
Figure 4. Category structures from Experiment 2. (a) Four-category, uni-dimensional rule-based (4UD), and (b) four-category information-integration (4II). Each open circle denotes the spatial frequency and spatial orientation of a Gabor pattern from Category A. Each filled circle denotes a Gabor pattern from Category B. Each open square denotes a Gabor pattern from Category C. Each filled square denotes a Gabor pattern from Category D.

[Figure 5: panels (a), (b) and (c); x-axis: 80-trial blocks (1-4); y-axis: Proportion Correct (0.00-1.00). Legend: (a) 4UD, 4II; (b) and (c) Immediate, Delay.]
Figure 5. Experiment 2 proportion correct for the (a) 4UD and 4II immediate feedback conditions, (b) 4UD delayed and immediate feedback conditions, and (c) 4II delayed and immediate feedback conditions. Standard error bars are also included.

[Figure 6: x-axis: Feedback Condition (Delay, Immediate); y-axis: Proportion Best Fit by Each Model Type (0.00-1.00). Legend: Info-Int, Hypo-Test. Embedded percentages as extracted: 49%, 71%, 48%, 61%.]
Figure 6. Experiment 2 proportion of 4II participants’ final-block data that were best fit by either a hypothesis-testing or an information-integration model. The percentages embedded in the plot denote the average accuracy rates achieved by participants whose data were best fit by each model class.

[Figure 7: panel (a) is a scatterplot with axes Frequency and Orientation; panel (b) plots Proportion Correct (0.00-1.00) against 80-trial blocks (1-4). Legend in (b): Delay, Immediate.]
Figure 7. (a) Category structures from Experiment 3. Each open circle denotes the spatial frequency and spatial orientation of a Gabor pattern from Category A. Each filled circle denotes a Gabor pattern from Category B. Each open square denotes a Gabor pattern from Category C. Each filled square denotes a Gabor pattern from Category D. (b) Proportion correct for the delayed and immediate feedback conditions. Standard error bars are also included.

[Figure 8: x-axis: 80-trial blocks (1-4); y-axis: Proportion Correct (0.00-1.00). Legend: 4CJ, 4UD.]
Figure 8. Proportion correct for the 4CJ and 4UD conditions from Experiment 4. Standard error bars are also included.
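
Illustrative sketch. The Summary attributes the delayed feedback deficit to sub-optimal decision criterion learning and to more variable application of the learned criteria. The short simulation below is only a hedged illustration of how those two factors can lower accuracy in a task with three criteria on one dimension; it is not the model-based analysis reported in this article. It uses the 4UD spatial-frequency means and variance from Table 2. Everything else is an assumption made for illustration: equal category base rates, criteria placed midway between adjacent means, independent Gaussian trial-by-trial jitter on each criterion, and the names `simulate_accuracy`, `criterial_sd`, `MEANS_F`, `SD_F` and `OPTIMAL_CRITERIA`.

```python
import numpy as np

# Table 2, 4UD condition: category means on spatial frequency, common variance 9.
MEANS_F = np.array([255.0, 285.0, 315.0, 345.0])
SD_F = 3.0  # square root of the tabled variance
# Assumption: with equal variances and base rates, the best criteria sit midway
# between adjacent category means.
OPTIMAL_CRITERIA = np.array([270.0, 300.0, 330.0])


def simulate_accuracy(criteria, criterial_sd, n_per_category=5000, seed=0):
    """Proportion correct for a noisy three-criterion rule on frequency."""
    rng = np.random.default_rng(seed)
    criteria = np.asarray(criteria, dtype=float)
    labels = np.repeat(np.arange(MEANS_F.size), n_per_category)
    freq = rng.normal(MEANS_F[labels], SD_F)
    # Hypothetical noise model: each criterion is jittered independently per trial.
    jitter = rng.normal(0.0, criterial_sd, size=(freq.size, criteria.size))
    # Response index = number of (noisy) criteria that the stimulus exceeds.
    responses = (freq[:, None] > criteria + jitter).sum(axis=1)
    return float(np.mean(responses == labels))


if __name__ == "__main__":
    print("optimal criteria, no noise:", simulate_accuracy(OPTIMAL_CRITERIA, 0.0))
    print("optimal criteria, noisy   :", simulate_accuracy(OPTIMAL_CRITERIA, 15.0))
    print("shifted criteria, no noise:", simulate_accuracy(OPTIMAL_CRITERIA + 12.0, 0.0))
```

In this sketch, either shifting the criteria away from the midpoints or increasing `criterial_sd` reduces simulated accuracy relative to the noise-free case with midpoint criteria. The shift and noise values (12 and 15) are arbitrary and were not estimated from the data.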