Lifelong Learning: Papers from the 2011 AAAI Workshop (WS-11-15)

A Metacognitive Classifier Using a Hybrid ACT-R/Leabra Architecture

Yury Vinokurov and Christian Lebiere
Carnegie Mellon University, Department of Psychology, 5000 Forbes Ave, Pittsburgh, PA 15212

Seth Herd and Randall O'Reilly
University of Colorado, Boulder, Department of Psychology, Muenzinger D251C, 345 UCB, Boulder, CO 80309

Copyright © 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

We present a metacognitive classifier implemented within a hybrid architecture that combines the strengths of two existing, mature cognitive architectures: ACT-R and Leabra. The classification of a set of items into previously seen and novel categories (TRAIN and TEST, respectively) is carried out in ACT-R using metacognitive signals supplied by Leabra. The resulting system performance is analyzed as a function of various architectural parameters, and future directions of research are discussed.

Introduction

Object recognition in dynamic environments is a major challenge for the current generation of classifiers. The major limitation of standard classification techniques is that the classifiers have to be trained on objects for which the ground truth, in terms of either a pre-assigned label or an error signal, is known. This limitation prevents the classifiers from dynamically developing their own categories of classification based on information obtained from the environment. Previous attempts to overcome these limitations have been based on classical machine learning algorithms (Modayil and Kuipers 2007) (Kuipers et al. 2006). Here we present an alternative approach to this problem and develop the beginnings of a framework within which a classifier can evolve its own representations based on dynamical information from the world. This framework combines the perceptual strengths of traditional neural networks such as Leabra with the symbolic and sub-symbolic cognition model implemented in ACT-R to create a hybrid architecture intended to negotiate a complex dynamic environment.
ACT-R

ACT-R (Anderson and Lebiere 1998) (Anderson et al. 2004) (Anderson 2007) is a cognitive architecture based on the rational analysis theory developed by Anderson (Anderson and Lebiere 1990) and contains both symbolic and sub-symbolic components. At the core of the ACT-R architecture is a modular organization which separates functional elements such as declarative memory, visual processing, and motor control. Each functionality is represented in ACT-R by a module, and each module exposes a buffer to the central procedural module charged with coordinating the system. Execution of events in ACT-R is dictated by production rules, which fire when their antecedent conditions, represented over the set of buffers, are met. Rule consequents may be used to retrieve information from declarative memory, direct visual attention, or perform motor functions, among other actions specified through the relevant buffers. If more than a single rule's antecedent conditions are met during the selection phase, the rule with the highest utility, as learned throughout the experiment, is selected.

Declarative memory

ACT-R contains a robust declarative memory module, which stores information as "chunks." A chunk in ACT-R may contain any number of slots and values for those slots; slot values may be other chunks, numbers, strings, lists, or generally any data type allowed in Lisp (the base language for ACT-R). Retrieval from declarative memory is handled by a request to the retrieval module; the request specifies the conditions to be met in order for a chunk to be retrieved from declarative memory, and the module either returns a chunk matching those specifications or generates a failure signal if a retrieval cannot be made. The success of the retrieval procedure depends on the satisfaction of the chunk specification as well as the current activation of the chunk. In all cases, retrieving a chunk from declarative memory is controlled by its activation according to

$A_i = B_i + S_i + P_i + \epsilon_i$,

where $i$ is the index of the given chunk, and $B_i$, $S_i$, $P_i$, and $\epsilon_i$ represent the base-level activation, spreading activation, partial match activation, and noise, respectively. Neither base-level learning nor spreading activation is used in this model; therefore $B_i = S_i = 0$. Several retrieval modalities exist in ACT-R: absolute matching, partial matching, and partial matching with blending. Absolute retrieval is the simplest case: if a chunk whose specifications exactly match those passed to the declarative memory module as a request does not exist (or if its activation is too low to be retrieved), the retrieval fails. Partial matching and blending are discussed in the following sections.

Partial Matching

In the case of many modeling tasks, especially those involving continuous variables, the exact chunk specified in the module request is unlikely to exist in declarative memory. In such situations, what is wanted is the "closest" chunk available. To this end, ACT-R implements a partial matching retrieval mechanism. This mechanism provides a hook into a similarity metric for comparing the value present in a chunk slot with the value requested by the pattern matcher. This metric is scaled between 0 and -1, with 0 corresponding to a perfect match and -1 corresponding to the highest mismatch possible. This similarity metric yields the partial match activation described in the preceding section according to the formula

$P_i = \sum_k P \, M_{ik}$,

where $P$ is the mismatch penalty which weights the similarity and $M_{ik}$ is the similarity between the value $k$ in the retrieval specification and the value in the corresponding slot of chunk $i$. In the hybrid ACT-R/Leabra model, partial matching is employed on the parameters provided by Leabra to retrieve instances of previously known and unknown items.
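To make the retrieval arithmetic concrete, the sketch below scores candidate chunks with the partial-match term defined above and applies Gaussian activation noise. It is an illustrative Python rendering rather than the model's actual ACT-R Lisp code; the slot names (it_avg_act, out_max_act), the similarity function, and the retrieval threshold are assumptions made for the example.

```python
import random

# Illustrative sketch (not ACT-R's actual API): with base-level learning and
# spreading activation disabled, A_i = P_i + noise, where
# P_i = sum_k P * M_ik and each similarity M_ik lies in [-1, 0].

def similarity(requested, actual, scale=1.0):
    """Simple numeric similarity on [-1, 0]: 0 means a perfect match."""
    return max(-1.0, -abs(requested - actual) / scale)

def activation(chunk, request, mismatch_penalty, noise_sd):
    p_i = sum(mismatch_penalty * similarity(request[slot], chunk[slot])
              for slot in request)
    return p_i + random.gauss(0.0, noise_sd)

def retrieve(chunks, request, mismatch_penalty=10.0, noise_sd=0.1,
             retrieval_threshold=-10.0):
    scored = [(activation(c, request, mismatch_penalty, noise_sd), c)
              for c in chunks]
    best_activation, best_chunk = max(scored, key=lambda pair: pair[0])
    return best_chunk if best_activation >= retrieval_threshold else None

# Example: instances recording Leabra's metacognitive parameters and an outcome.
memory = [
    {"it_avg_act": 0.12, "out_max_act": 0.85, "outcome": 0.0},  # TRAIN-like
    {"it_avg_act": 0.30, "out_max_act": 0.40, "outcome": 1.0},  # TEST-like
]
print(retrieve(memory, {"it_avg_act": 0.28, "out_max_act": 0.45}))
```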
Blending

Sometimes even the closest matched chunk will not do; what we would like in such a situation is a "consensus" chunk whose values are the weighted averages of all chunks which (partially) match the request specification. This is implemented in ACT-R via the blending mechanism. In a blended retrieval, the activation of every matching chunk is calculated; then, each slot is assigned a value which is computed from the values of the corresponding slots of the matching chunks by an average weighted by the activation. For continuous numbers, this produces the standard weighted average of a series of values. For discrete values such as strings or other chunks, the retrieved chunk will contain the "winning" value; that is, the value that receives the most "votes" out of the matching chunks, as weighted by the activation. Formally, the value returned by the blending retrieval is given by Eq. 1:

$V = \arg\min_V \sum_i P_i \, (1 - Sim(V, V_i))^2$    (1)

where

$P_i = \frac{e^{M_i / t}}{\sum_j e^{M_j / t}}$    (2)

is the probability of retrieving the $i$th chunk as a function of its match score $M_i$, $Sim(V, V_i)$ is the similarity between the retrieved value $V$ and the actual value $V_i$ returned by the $i$th chunk, and $t$ is the "temperature" of the Boltzmann distribution in Eq. 2, which corresponds to the noise.

The combination of partial matching and blending is an extremely powerful mechanism in ACT-R, as partial matching allows semantic similarity to be factored into the conditions expressed in the retrieval, while blending reflects those similarities in the values resulting from the retrieval. For example, it may be used to generate hypotheses based on a consensus about the provided information. In the case of the ACT-R/Leabra hybrid, we use the combination of partial matching and blending to retrieve a judgment about whether a given item belongs to the TEST or TRAIN set; since the judgment is a continuous number from 0 to 1 (0 corresponding to TRAIN and 1 to TEST), such a retrieval provides not only the judgment itself but also the confidence associated with the judgment. This approach has been used in a wide range of applications, including game playing (Sanner et al. 2000), scheduling (Gonzalez, Lerch, and Lebiere 2003), decision-making (Gonzalez and Lebiere 2005), and landmine detection (Lebiere and Staszewski 2010).
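The sketch below shows, again as illustrative Python rather than ACT-R itself, how Eqs. 1 and 2 combine in a blended retrieval over a continuous slot such as the TRAIN/TEST judgment. With a linear similarity, $Sim(V, V_i) = 1 - |V - V_i|$, the minimizer of Eq. 1 is simply the probability-weighted mean of the candidate values, which is what the code computes.

```python
import math

# Illustrative sketch (not ACT-R's actual API) of a blended retrieval over a
# continuous slot. Match scores M_i come from partial matching; P_i is the
# Boltzmann probability of Eq. 2. With Sim(V, V_i) = 1 - |V - V_i|,
# minimizing Eq. 1 reduces to the probability-weighted average of the values.

def blend(match_scores, values, temperature=0.25):
    weights = [math.exp(m / temperature) for m in match_scores]
    total = sum(weights)
    probs = [w / total for w in weights]               # Eq. 2
    return sum(p * v for p, v in zip(probs, values))   # minimizer of Eq. 1

# Example: three matching instances voting on the TRAIN(0)/TEST(1) judgment.
match_scores = [-0.2, -0.5, -1.1]   # closer to 0 means a better match
outcomes = [1.0, 1.0, 0.0]          # labels stored in the matching chunks
judgment = blend(match_scores, outcomes)
print(round(judgment, 3))           # a value between 0 and 1
```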
Leabra

Leabra is a set of algorithms for simulating neural networks (O'Reilly and Munakata 2000). These algorithms serve as a computational theory of brain (primarily cortical) function, as the algorithms and their parameterization have been closely guided by a wide variety of data. Each model that uses the Leabra framework is independent, but the sharing of common principles and parameters across many networks makes the Leabra framework an integrative theory of brain (and cognitive) function, just as ACT-R is an integrative theory of cognitive (and brain) function. Leabra operates at an intermediate level of detail (e.g., firing-rate coded, point neurons), which allows it to model effects ranging from the molecular level (e.g., increased sodium conductances from pharmacological manipulation) up to the cognitive level (e.g., differentially worse performance in the incongruent condition of the Stroop task). Leabra simulations have explored numerous neuroscientific processes including cognitive control (Herd and O'Reilly 2006) (O'Reilly and Pauli 2010), visual search (Herd and O'Reilly 2005), self-directed learning (Herd, Mingus, and O'Reilly 2010), working memory and executive function in the prefrontal cortex (Hazy, Frank, and O'Reilly 2006), language processing and selection (Snyder et al. 2010), and cortical learning (O'Reilly and Munakata 2002).

Here we used the LVIS (Leabra Vision) model (O'Reilly et al. 2011). This model uses the Leabra algorithm and the known convergent hierarchical structure of the human visual system. A V1 layer is activated according to a set of filters for line/edge orientation, length sums, and end stopping. This layer is sparsely connected in a convergent manner to a V4 layer, which in turn is similarly connected to an IT layer. Connections from V1 to V4, V4 to IT, and IT to output are allowed to learn using Leabra's combination of self-organizing and biologically realistic error-driven learning. The model learns to categorize 100 types of objects, with a maximum generalization performance of about 93% in categorizing entirely new object models into learned categories. For the present experiments, it was trained on 50 of those object classes (the TRAIN items in the hybrid model), with the remaining 50 reserved as novel (TEST) items.

The ACT-R/Leabra Hybrid

As an architecture that combines symbolic and sub-symbolic representation, ACT-R is very well suited to modeling aspects of high-level cognition such as decision-making, control, and memory storage and recall (for a full list of applications of ACT-R, see http://act-r.psy.cmu.edu/publications/index.php). On the other hand, Leabra includes a detailed account of bottom-up perception, which ACT-R currently does not. This suggests a natural synergy between the two architectures, one which combines Leabra's perception capabilities with ACT-R's control and memory functionalities. This approach had already been implemented to some extent in the SAL architecture described in (Jilk et al. 2008), in which an ACT-R/Leabra hybrid model was embedded in an Unreal Tournament world. The current hybrid architecture improves on SAL by providing a tight and natural integration between the two systems. This is done by wrapping the interaction with Leabra inside an ACT-R module, which exposes a buffer called leabra-visual to the ACT-R system. This buffer provides a way of issuing requests to the Leabra visual module and retrieving data from it. Communication with Leabra is handled via sockets, with Leabra acting as the server and ACT-R as the client. If the request is one that returns data from Leabra, that data is converted into a chunk which can then be accessed through the leabra-visual buffer. In this way, the Leabra vision module is exactly analogous to the standard ACT-R visual module. In the current implementation of the hybrid architecture, the interaction between ACT-R and Leabra is limited to commands from ACT-R that direct Leabra's attention to various objects, and to object identification data that flows from Leabra to ACT-R and is encoded as chunks. However, the modular nature of both architectures is such that once the basic framework has been set up, the functionality may be extended indefinitely.

Figure 1: A schematic representation of the visual chunk obtained from Leabra (a) and the context chunk generated by the blending retrieval (b). The "data" slot in 1(b) actually represents several slots; in this case, the data is identical to the "out max act" and "it avg act" parameters.
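For readers unfamiliar with ACT-R chunks, the hypothetical structures below mirror the two chunk types of Figure 1. The field names are assumptions made for illustration; the actual slot layout and the wire format used over the ACT-R/Leabra socket are not specified here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the two chunk types in Figure 1, for illustration only.

@dataclass
class VisualChunk:
    """Encodes one Leabra view: object identity plus metacognitive signals."""
    object_name: str      # supplied by Leabra but not used by the classifier
    ground_truth: float   # 0 = TRAIN, 1 = TEST (available during training)
    it_avg_act: float     # average activation of the IT layer
    out_max_act: float    # maximum activation of the output layer

@dataclass
class ContextChunk:
    """Built up in the imaginal buffer; drives the blended retrieval."""
    it_avg_act: float
    out_max_act: float
    expectation: float                 # the classifier's previous judgment
    outcome: Optional[float] = None    # filled in by the blended retrieval

view = VisualChunk("banana-03", 1.0, 0.31, 0.42)
context = ContextChunk(view.it_avg_act, view.out_max_act, expectation=0.5)
```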
The Serial Classifier

The serial classifier is an instance-based model (Taatgen, Lebiere, and Anderson 2006) of object classification into two categories. The task of the serial classifier implemented using the ACT-R/Leabra hybrid architecture is to sort input items into ones that the Leabra network has either been trained on or not (referred to as the TRAIN and TEST categories). For our purposes, the two categories are coded as 0 and 1, respectively. A category judgment as made by ACT-R may lie anywhere on that continuum. Furthermore, the value of the judgment provides a natural estimate of its certainty, which for a judgment with value $j \in [0, 1]$ is defined as $|j - 0.5|$; thus, the certainty takes values in $[0, 0.5]$. The ACT-R model requests a visual chunk from Leabra, which contains not only the object identity but also metacognitive information pertaining to the internals of the Leabra network, as well as the ground truth about an item's category. The serial nature of the classifier lies in its ability to request additional information about a given item and revise its judgment accordingly. The ACT-R model may either direct Leabra's attention towards a new item, or it may direct its attention towards a different view of the item at which Leabra is currently "looking." Whether or not the ACT-R model chooses to do so depends on whether its certainty in its current judgment is sufficient to make a decision; the threshold for decision-making is simply a parameter. If the model surpasses the certainty threshold, it will make a judgment about the item and direct visual attention to a new item. If the threshold is not surpassed, another view of the item is requested, and this continues until some predetermined number of views has been processed (in our case, 20), at which point the results of the last view are declared to be the model's judgment irrespective of threshold. The data to the ACT-R model is supplied via chunks in the leabra-visual buffer, and its output is another chunk whose values are generated by the blending mechanism. The chunks themselves are simply Lisp lists; a slot-value pairing is a value in a particular place in the list. The schematic structure of the chunks is shown in Fig. 1. The visual chunk simply contains the metacognitive signal from Leabra, as well as some additional information, such as the object name, which is not used by the classifier. The context chunk is built up in ACT-R's imaginal buffer; here, the "data" slot represents the metacognitive signal from Leabra (which may include several slot-value pairs) and the "expectation" slot is the previous judgment of the classifier. A retrieved context chunk will also contain an "outcome" value, which represents the classifier's current judgment about the item category.

Metacognitive Parameters

The classifier makes its decisions based on partially matching against the parameters provided by Leabra. Through a feature selection analysis, it was determined that the average activation of Leabra's IT layer and the maximum activation of the output layer are the two most significant tracers of novelty. Although these parameters are referred to as the "metacognitive" parameters, it should be clear that they are only used in a metacognitive context by ACT-R; within the Leabra model itself, these parameters simply indicate a goodness of fit to the learned weights. Matching on the other parameters supplied by Leabra did not produce any difference in the results.
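A rough Python rendering of this view-by-view decision loop is given below. It assumes a blended_judgment() callable standing in for the partial-matching-plus-blending retrieval and a get_view() callable standing in for a request to the leabra-visual buffer; both names are hypothetical, and the real model is implemented as ACT-R production rules rather than Python.

```python
# Illustrative sketch of the serial classification loop described above.

def classify_item(get_view, blended_judgment, threshold=0.25, max_views=20):
    """Request views of one item until the judgment is certain enough."""
    expectation = None                      # first retrieval: metacognitive parameters only
    judgment = 0.5
    for _ in range(max_views):
        view = get_view()                   # next visual chunk from Leabra
        judgment = blended_judgment(view, expectation)
        if abs(judgment - 0.5) >= threshold:   # certainty is |j - 0.5|
            break                           # certain enough: commit and move on
        expectation = judgment              # otherwise revise and request another view
    return 1 if judgment >= 0.5 else 0      # 0 = TRAIN, 1 = TEST
```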
Training Data

The training for the classifier consists of training the ACT-R side of the hybrid by learning chunks that represent TEST and TRAIN items. This is separate from the process of training the LVIS model to recognize images. The training data for the classifier consists of 200 items, selected from the TRAIN or TEST set with equal probability of 1/2; thus, the final training set consists of roughly a 50/50 mixture of TEST and TRAIN items. Each item is viewed 20 times. For each item, the recorded data consists of the metacognitive parameters, the expectation, and the ground truth. The expectation is initially generated randomly and adjusted on each view based on a simple thresholding of the maximum activation of the output layer. If the activation is ≥ 0.5, the expectation is revised upward (closer to 1) by 0.05, while if it is < 0.5, it is revised downward (closer to 0) by 0.05. For most items the expectation quickly rails to one of the two limits, though for some items 20 views are not enough to reach either limit.
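The construction of the training set just described can be summarized in the following sketch. The sample_views() callable stands in for querying Leabra for successive views of an item; as elsewhere, the slot names are assumed for illustration, and this is not the actual model code.

```python
import random

# Illustrative sketch: 200 items drawn 50/50 from TRAIN and TEST, each viewed
# 20 times, with the expectation initialized randomly and nudged by 0.05 per
# view according to the maximum output-layer activation.

def build_training_set(sample_views, n_items=200, views_per_item=20):
    instances = []
    for _ in range(n_items):
        ground_truth = random.choice([0.0, 1.0])   # TRAIN = 0, TEST = 1
        expectation = random.random()              # random initial expectation
        for view in sample_views(ground_truth, views_per_item):
            instances.append({
                "it_avg_act": view["it_avg_act"],
                "out_max_act": view["out_max_act"],
                "expectation": expectation,
                "outcome": ground_truth,           # ground truth recorded in training
            })
            # revise the expectation by thresholding the output activation
            if view["out_max_act"] >= 0.5:
                expectation = min(1.0, expectation + 0.05)
            else:
                expectation = max(0.0, expectation - 0.05)
    return instances
```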
Judgment Revision

The revision of judgments is a key feature of the serial classifier and is implemented using partial matching and blending. Recall that this combination effectively produces a consensus chunk whose slots are filled with values obtained via a weighted average from the corresponding slots of the matching chunks in declarative memory. The first time the classifier views a particular item, it will attempt to retrieve such a consensus chunk based purely on the metacognitive parameters supplied by Leabra. This in turn generates a judgment between 0 and 1 based on the instances in the model's training set. If the threshold for certainty is sufficiently low, the judgment may surpass it, and the model will go on to view the next item. If the threshold is not surpassed initially, the model will request another view. The same metacognitive parameters as before are used in the partial matching specification; however, the specification will now include the previous expectation. Thus, if the training set contains many instances of situations in which a given expectation led to a given final judgment, the blending mechanism will bias the next iteration of the expectation towards that result. For example, if the item is a TEST item (meaning that the ground truth is 1) and the current expectation is 0.6, partial matching with blending on that expectation value might produce a revised expectation of 0.7 on the next iteration. Then, the value of 0.7 would be used in the next partial matching retrieval, if one is necessary. This process continues until the threshold parameter is surpassed or 20 views of the item have been given. A representative sequence of judgments can be seen in Fig. 3.

Figure 3: A sequence of judgments undertaken by the categorizer. The certainty cutoff threshold is 0.4; the model acquires different views of the object and successively refines its judgments until it converges on the correct answer with the requisite certainty.

Classifier Performance

Parameter Space Exploration

On the ACT-R side, multiple model parameters affect the retrieval process. The chief of these are the mismatch penalty and the noise. The mismatch penalty is effectively the penalty incurred by chunks depending on how well they partially match a given chunk specification. The noise is sampled from a Gaussian distribution and added to the activation of a given chunk. In this way, the mismatch penalty and the noise counterbalance each other: the higher the mismatch penalty, the smaller the neighborhood in parameter space into which chunks must fall in order to be considered a match. On the other hand, the higher the noise, the more "smeared" the boundaries of that neighborhood become, as chunks which are outside the matching neighborhood may fall into it once the noise is added, and conversely, chunks within the neighborhood may fall out of it. A parameter exploration was carried out for 5 different values of the mismatch penalty (mp) and 5 different values of the noise (ans). Additionally, for each combination in the (mp, ans) space, a series of decision thresholds was used. Fig. 2 shows the performance of the classifier in terms of percent correct as a function of threshold for each point in this space. In the upper right corner of the space (high noise, low mismatch penalty), performance degrades, as many chunks which would not otherwise be candidates for matching fall into an already too permissive neighborhood in the match space. However, once the ratio of mismatch penalty to noise reaches approximately 10, the system becomes relatively insensitive to changes in both the noise and the mismatch penalty. This suggests that optimal performance should occur in a configuration with high mismatch penalties and low noise values. Further simulations were conducted by fixing the mismatch penalty at 10 and the noise parameter at 0.1.

Figure 2: Plots of threshold vs. performance for every element in the space defined by the mismatch penalty (mp) and noise (ans).

Average Performance as a Function of Threshold

With the mismatch penalty fixed at 10 and the noise parameter fixed at 0.1, 10 simulations were run with the same training set for threshold values of 0.1, 0.2, 0.225, 0.25, 0.275, 0.3, and 0.4. The threshold space is sampled more heavily around 0.25 because, as demonstrated in Fig. 2, the optimal threshold value lies within that range for most cases. The results of the simulation are displayed in Fig. 4.

Figure 4: Plot of threshold vs. performance for mp = 10 and ans = 0.1. Each data point represents an average over 10 runs. A threshold of 0.25 achieves the best performance by correctly classifying objects 76.8% of the time.
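The parameter exploration can be pictured as the nested sweep sketched below. The specific mp and ans grid values are assumptions (the text states only that five values of each were used), the threshold list is the one reported for the fixed-parameter runs, and run_model() stands in for executing the hybrid classifier with the given settings.

```python
import itertools

# Illustrative sketch of the parameter sweep: for each (mismatch penalty,
# noise) pair and each decision threshold, run the model several times and
# average percent correct.

def sweep(run_model, n_runs=10):
    mismatch_penalties = [1, 2.5, 5, 7.5, 10]   # assumed sweep values
    noise_levels = [0.1, 0.2, 0.3, 0.4, 0.5]    # assumed sweep values
    thresholds = [0.1, 0.2, 0.225, 0.25, 0.275, 0.3, 0.4]
    results = {}
    for mp, ans, thr in itertools.product(mismatch_penalties, noise_levels, thresholds):
        scores = [run_model(mp=mp, ans=ans, threshold=thr) for _ in range(n_runs)]
        results[(mp, ans, thr)] = sum(scores) / len(scores)   # mean percent correct
    return results
```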
Conclusions and Further Work

We demonstrate an implementation of a serial classifier realized through the integration of two rather different cognitive architectures: the primarily symbolic and sub-symbolic ACT-R, and the connectionist Leabra. The metacognitive parameters generated by Leabra are used by ACT-R to classify objects into one of two distinct categories on the basis of past instances on which it has been trained. The metacognitive signals needed to perform this classification to ∼77% accuracy (at a threshold value of 0.25) are simply the average activation of the IT layer and the maximum activation of the output layer. Further work remains to be done with this model. One interesting question concerns the kinds of objects that are not classified correctly by this model; an analysis of common features shared by such objects has yet to be performed. Another area of research would be the identification of better metacognitive signals within Leabra. Also, at this time, the model is not constrained with regard to how many views it may request, other than the hard cut-off of 20, implemented to make sure the model terminates. In other words, there is no extra cost built into the model associated with the acquisition of new information. Real situations, however, do have such a cost in terms of time. Thus, an additional timing/cost constraint could be placed on the model to examine its performance under time pressure. The present model is passive; it does not interact with the world in any way. A further avenue of research would be to close the loop between the world and the model by allowing the model to manipulate the world. In this way, the model could learn not only representations of the world, but also which actions allow it to achieve its goals within that world. This would represent a crucial step towards the realization of self-directed learning, as described in (Herd, Mingus, and O'Reilly 2010).

Acknowledgments

This work was conducted through collaborative participation in the Robotics Consortium sponsored by the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement W911NF-10-2-0016.

References

Anderson, J. R., and Lebiere, C. 1990. The Adaptive Character of Thought. Hillsdale, New Jersey: Erlbaum.

Anderson, J. R., and Lebiere, C. 1998. The Atomic Components of Thought. Mahwah, New Jersey: Erlbaum.

Anderson, J. R.; Bothell, D.; Byrne, M. D.; Douglass, S.; Lebiere, C.; and Qin, Y. 2004. An integrated theory of the mind. Psychological Review 111(4):1036–1060.

Anderson, J. R. 2007. How Can the Human Mind Occur in the Physical Universe? Oxford University Press.

Gonzalez, C., and Lebiere, C. 2005. Instance-based cognitive models of decision making. In Zizzo, D., and Courakis, A., eds., Transfer of Knowledge in Economic Decision Making. New York: Palgrave McMillan.

Gonzalez, C.; Lerch, F. J.; and Lebiere, C. 2003. Instance-based learning in dynamic decision making. Cognitive Science 27(4):591–635.

Hazy, T.; Frank, M.; and O'Reilly, R. 2006. Banishing the homunculus: Making working memory work. Neuroscience 139(1):105–118.

Herd, S., and O'Reilly, R. 2005. Serial visual search from a parallel model. Vision Research.

Herd, S. A.; Banich, M.; and O'Reilly, R. 2006. Neural mechanisms of cognitive control: An integrative model of Stroop task performance and fMRI data. Journal of Cognitive Neuroscience.

Herd, S.; Mingus, B.; and O'Reilly, R. 2010. Dopamine and self-directed learning. In Proceedings of the First Annual Meeting of the BICA Society (BICA 2010), 58–63. Amsterdam, The Netherlands: IOS Press.

Jilk, D. J.; Lebiere, C.; O'Reilly, R. C.; and Anderson, J. R. 2008. SAL: An explicitly pluralistic cognitive architecture. Journal of Experimental and Theoretical Artificial Intelligence.

Kuipers, B.; Beeson, P.; Modayil, J.; and Provost, J. 2006. Bootstrap learning of foundational representations. Connection Science 18(2):145–158.

Lebiere, C., and Staszewski, J. 2010. Expert decision making in landmine detection. In Proceedings of the Human Factors and Ergonomics Society Conference.

Modayil, J., and Kuipers, B. 2007. Autonomous development of a grounded object ontology by a learning robot. In Proceedings of the National Conference on Artificial Intelligence, volume 2, 1095–1101. AAAI Press.

O'Reilly, R. C., and Munakata, Y. 2000. Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain. MIT Press.

O'Reilly, R. C., and Munakata, Y. 2002. Psychological function in computational models of neural networks. In Gallagher, M.; Nelson, R. J.; and Weiner, I. B., eds., Handbook of Psychology, volume 3. Wiley. Chapter 22.

O'Reilly, R. C.; Herd, S.; and Pauli, W. 2010. Computational models of cognitive control. Current Opinion in Neurobiology.

O'Reilly, R. C.; Herd, S. A.; Wyatt, D.; Mingus, B.; and Jilk, D. 2011. Bidirectional biological object recognition. Submitted.

Sanner, S.; Anderson, J. R.; Lebiere, C.; and Lovett, M. 2000. Achieving efficient and cognitively plausible learning in backgammon. In Proceedings of the Seventeenth International Conference on Machine Learning.

Snyder, H. R.; Hutchison, N.; Nyhus, E.; Curran, T.; Banich, M. T.; O'Reilly, R. C.; and Munakata, Y. 2010. Neural inhibition enables selection during language processing. Proceedings of the National Academy of Sciences 107(38):16483–16488.

Taatgen, N.; Lebiere, C.; and Anderson, J. R. 2006. Modeling paradigms in ACT-R. In Sun, R., ed., Cognition and Multi-Agent Interaction: From Cognitive Modeling to Social Simulation. Cambridge University Press.