A Metacognitive Classifier Using a Hybrid ACT-R/Leabra Architecture

Lifelong Learning: Papers from the 2011 AAAI Workshop (WS-11-15)
Yury Vinokurov and Christian Lebiere
Carnegie Mellon University, Department of Psychology
5000 Forbes Ave, Pittsburgh, PA 15212

Seth Herd and Randall O’Reilly
University of Colorado, Boulder, Department of Psychology
Muenzinger D251C, 345 UCB, Boulder, CO 80309
Abstract
We present a metacognitive classifier implemented
within a hybrid architecture that combines the strengths
of two existing, mature cognitive architectures: ACT-R
and Leabra. The classification of a set of items into previously seen and novel categories (TRAIN and TEST,
respectively) is carried out in ACT-R using metacognitive signals supplied by Leabra. The resulting system
performance is analyzed as a function of various architectural parameters, and future directions of research are
discussed.
Introduction
Object recognition in dynamic environments is a major challenge for the current generation of classifiers. The chief limitation of standard classification techniques is that the classifiers have to be trained on objects for which the ground truth,
in terms of either a pre-assigned label or an error signal, is
known. This limitation prevents the classifiers from dynamically developing their own categories of classification based
on information obtained from the environment. Previous attempts to overcome these limitations have been based on
classical machine learning algorithms (Modayil and Kuipers
2007) (Kuipers et al. 2006). Here we present an alternative
approach to this problem, and develop the beginnings of
a framework within which a classifier can evolve its own
representations based on dynamical information from the
world. This framework combines the perceptual strengths of
traditional neural networks such as Leabra with the symbolic
and sub-symbolic cognition model implemented in ACT-R
to create a hybrid architecture intended to negotiate a complex dynamic environment.
ACT-R
ACT-R (Anderson and Lebiere 1998) (Anderson et al. 2004) (Anderson 2007) is a cognitive architecture based on the rational analysis theory developed by Anderson (Anderson and Lebiere 1990) and contains both symbolic and subsymbolic components. At the core of the ACT-R architecture is a modular organization which separates functional elements such as declarative memory, visual processing, and motor control. Each functionality is represented in ACT-R by a module, and each module exposes a buffer to the central procedural module charged with coordinating the system. Execution of events in ACT-R is dictated by production rules, which fire when their antecedent conditions, represented over the set of buffers, are met. Rule consequents may be used to retrieve information from declarative memory, direct visual attention, or perform motor functions, among other actions specified through the relevant buffers. If more than a single rule’s antecedent conditions are met during the selection phase, the rule with the highest utility, as learned throughout the experiment, is selected.
Declarative memory
ACT-R contains a robust declarative memory module, which stores information as “chunks.” A chunk in ACT-R may contain any number of slots and values for those slots; slot values may be other chunks, numbers, strings, lists, or generally any data type allowed in Lisp (the base language for ACT-R). Retrieval from declarative memory is handled by a request to the retrieval module; the request specifies the conditions to be met in order for a chunk to be retrieved from declarative memory, and the module either returns a chunk matching those specifications or generates a failure signal if a retrieval cannot be made. The success of the retrieval procedure depends on the satisfaction of the chunk specification as well as the current activation of the chunk. In all cases, retrieving a chunk from declarative memory is controlled by its activation according to $A_i = B_i + S_i + P_i + \epsilon_i$, where $i$ is the index of the given chunk, and $B_i$, $S_i$, $P_i$, and $\epsilon_i$ represent the base-level activation, spreading activation, partial match activation, and noise, respectively. Neither base-level learning nor spreading activation is used in this model; therefore $B_i = S_i = 0$.
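A minimal Python sketch of this activation computation under the simplification above ($B_i = S_i = 0$) is given below; the linear similarity function, the parameter values, and the slot names are illustrative placeholders rather than the model's actual settings.

```python
import random

def activation(chunk, request, mismatch_penalty=10.0, noise_sd=0.1):
    """A_i = B_i + S_i + P_i + epsilon_i, with B_i = S_i = 0 as in this model.

    `chunk` and `request` map slot names to values; the linear similarity in
    [-1, 0] below is a placeholder, not the metric used in the actual model.
    """
    def similarity(a, b):
        return max(-1.0, -abs(a - b))          # 0 = perfect match, -1 = worst mismatch
    partial_match = sum(mismatch_penalty * similarity(v, chunk[slot])
                        for slot, v in request.items())
    noise = random.gauss(0.0, noise_sd)
    return partial_match + noise

# Score one stored instance against a retrieval request (slot names illustrative).
stored = {"it_avg_act": 0.31, "out_max_act": 0.78}
request = {"it_avg_act": 0.30, "out_max_act": 0.80}
print(activation(stored, request))
```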
Several retrieval modalities exist in ACT-R: absolute matching, partial matching, and partial matching with blending. Absolute retrieval is the simplest case: if no chunk exists whose slot values exactly match the specification passed to the declarative memory module as a request (or if the matching chunk’s activation is too low to be retrieved), the retrieval fails. Partial matching and blending are discussed in subsequent sections.
Partial Matching
In the case of many modeling tasks, especially those involving continuous variables, the exact chunk specified in the
module request is unlikely to exist in declarative memory.
In such situations, what is wanted is the “closest” chunk
available. To this end, ACT-R implements a partial matching retrieval mechanism. This mechanism provides a hook
into a similarity metric for comparing the value present in a
chunk slot with the value requested by the pattern matcher.
This metric is scaled between 0 and -1, with 0 corresponding to a perfect match and -1 corresponding to the highest mismatch possible. This similarity metric gives the partial matching activation described in the preceding section according to the formula $P_i = \sum_k P \cdot M_{ik}$, where $P$ is the mismatch penalty, which weights the similarity, and $M_{ik}$ is the similarity between the value $k$ in the retrieval specification and the value in the corresponding slot of chunk $i$. In the hybrid ACT-R/Leabra model, partial matching is employed on the parameters provided by Leabra to retrieve instances of previously known and unknown items.
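The following sketch illustrates a partial-matching retrieval over a small declarative memory of instances, including the failure case when no chunk clears the retrieval threshold; the similarity metric, threshold, and feature values are assumptions made for illustration.

```python
import random

def similarity(a, b):
    """Placeholder similarity in [-1, 0]: 0 is a perfect match, -1 the worst mismatch."""
    return max(-1.0, -abs(a - b))

def retrieve_partial(memory, request, mismatch_penalty=10.0,
                     noise_sd=0.1, retrieval_threshold=-5.0):
    """Return the best partially matching chunk, or None if the retrieval fails."""
    best_chunk, best_activation = None, float("-inf")
    for chunk in memory:
        p_i = sum(mismatch_penalty * similarity(value, chunk[slot])
                  for slot, value in request.items())      # P_i = sum_k P * M_ik
        a_i = p_i + random.gauss(0.0, noise_sd)            # B_i = S_i = 0 here
        if a_i > best_activation:
            best_chunk, best_activation = chunk, a_i
    return best_chunk if best_activation >= retrieval_threshold else None

memory = [
    {"it_avg_act": 0.28, "out_max_act": 0.91, "outcome": 0},  # TRAIN-like instance
    {"it_avg_act": 0.35, "out_max_act": 0.40, "outcome": 1},  # TEST-like instance
]
print(retrieve_partial(memory, {"it_avg_act": 0.33, "out_max_act": 0.45}))
```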
Leabra
Leabra is a set of algorithms for simulating neural networks
(O’Reilly and Munakata 2000). These algorithms serve as
a computational theory of brain (primarily cortical) function, as the algorithms and their parameterization have been
closely guided by a wide variety of data. Each model that
uses the Leabra framework is independent, but the sharing of common principles and parameters across many networks make the Leabra framework an integrative theory of
brain (and cognitive) function, just as ACT-R is an integrative theory of cognitive (and brain) function. Leabra operates at an intermediate level of detail (e.g., firing-rate coded,
point neurons) to allow modeling of effects both down to the molecular level (e.g., increased sodium conductances from
pharmacological manipulation) and up to the cognitive level
(e.g., differentially worse performance in the incongruent
condition of the Stroop task).
Leabra simulations have explored numerous neuroscientific processes including cognitive control (Herd and
O’Reilly 2006) (O’Reilly and Pauli 2010), visual search
(Herd and O’Reilly 2005), self-directed learning (Herd,
Mingus, and O’Reilly 2010), working memory and executive function in the prefrontal cortex (Hazy, Frank, and
O’Reilly 2006), language processing and selection (Snyder
et al. 2010), and cortical learning (O’Reilly and Munakata
2002).
Here we used the LVIS (Leabra Vision) model (O’Reilly
et al. 2011). This model uses the Leabra algorithm and the
known convergent hierarchical structure of the human visual system. A V1 layer is activated according to a set of
filters for line/edge orientation, length sums, and end stopping. This layer is sparsely connected in a convergent manner to a V4 layer, which in turn is similarly connected to an
IT layer. Connections from V1 to V4, V4 to IT, and IT to
output are allowed to learn using Leabra’s combination of
self-organizing and biologically-realistic error driven learning. The model learns to categorize 100 types of objects,
with a maximum generalization performance of about 93%
in categorizing entirely new object models into learned categories. For the present experiments, it was trained on 50 of
those object classes (the TRAIN items in the hybrid model),
with the remaining 50 reserved as novel (TEST) items.
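For concreteness, the sketch below shows the kind of class split described above, with half of the 100 object classes used to train LVIS (TRAIN) and half held out as novel (TEST); the class names and the seeded shuffle are illustrative.

```python
import random

def split_classes(class_names, n_train=50, seed=0):
    """Split object classes into TRAIN (seen during LVIS training) and TEST (novel)."""
    rng = random.Random(seed)
    shuffled = list(class_names)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# 100 placeholder class names standing in for the object models used by LVIS.
classes = [f"class_{i:03d}" for i in range(100)]
train_classes, test_classes = split_classes(classes)
print(len(train_classes), len(test_classes))  # 50 50
```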
Blending
Sometimes even the closest matched chunk will not do;
what we would like in such a situation is a “consensus”
chunk whose values are the weighted averages of all chunks
which (partially) match the request specification. This
is implemented in ACT-R via the blending mechanism.
In a blended retrieval, the activation of every matching
chunk is calculated; then, each slot is assigned a value
which is computed from the values of the corresponding
slots of the matching chunks by an average weighted by
the activation. For continuous numbers, this produces the
standard weighted average of a series of values. For discrete
values such as strings or other chunks, the retrieved chunk
will contain the “winning” value; that is, the value that
receives the most “votes” out of the matching chunks, as
weighted by the activation. Formally, the value returned by
the blending retrieval is given by Eq. 1:
$$V = \operatorname*{arg\,min}_{V} \sum_i P_i \left(1 - Sim(V, V_i)\right)^2 \qquad (1)$$

where

$$P_i = \frac{e^{M_i/t}}{\sum_j e^{M_j/t}} \qquad (2)$$

is the probability of retrieving the $i$th chunk as a function of its match score $M_i$, $Sim(V, V_i)$ is the similarity between the retrieved value $V$ and the actual value $V_i$ returned by the $i$th chunk, and $t$ is the “temperature” of the Boltzmann distribution in Eq. 2, which corresponds to the noise.
The combination of partial matching and blending is an
extremely powerful mechanism in ACT-R, as partial matching allows for the factoring of semantic similarity on the
conditions expressed on the retrieval while blending reflects
those similarities in the values resulting from the retrieval.
For example, it may be used to generate hypotheses based on a consensus about the provided information. In the case of the ACT-R/Leabra hybrid, we use the combination of partial matching and blending to retrieve a judgment about whether a given item belongs to the TEST or TRAIN set; since the judgment is a continuous number from 0 to 1 (0 corresponding to TRAIN and 1 to TEST), such a retrieval provides not only the judgment itself but also the confidence associated with the judgment. This approach has been used in a wide range of applications, including game playing (Sanner et al. 2000), scheduling (Gonzalez, Lerch, and Lebiere 2003), decision-making (Gonzalez and Lebiere 2005) and landmine detection (Lebiere and Staszewski 2010).
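A minimal sketch of the blending computation in Eqs. 1 and 2 follows, using the simplifications stated in the text (weighted average for continuous slots, probability-weighted “votes” for discrete ones); the temperature and the match scores are illustrative values, not those used in the model.

```python
import math
from collections import defaultdict

def blend(chunks, match_scores, slot, temperature=0.5):
    """Blended retrieval of one slot over the matching chunks (Eqs. 1 and 2).

    P_i = exp(M_i / t) / sum_j exp(M_j / t); a continuous slot blends to the
    P_i-weighted average, a discrete slot to the P_i-weighted "winning" value.
    """
    weights = [math.exp(m / temperature) for m in match_scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    values = [chunk[slot] for chunk in chunks]
    if all(isinstance(v, (int, float)) for v in values):
        return sum(p * v for p, v in zip(probs, values))
    votes = defaultdict(float)
    for p, v in zip(probs, values):
        votes[v] += p
    return max(votes, key=votes.get)

# Three stored instances "vote" on the outcome slot (0 = TRAIN, 1 = TEST).
chunks = [{"outcome": 0.0}, {"outcome": 1.0}, {"outcome": 1.0}]
print(blend(chunks, match_scores=[-1.2, -0.4, -0.6], slot="outcome"))
```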
The ACT-R/Leabra Hybrid
As an architecture that combines symbolic and subsymbolic representation, ACT-R is very well suited to modeling aspects of high-level cognition such as decisionmaking, control, and memory storage and recall (for
a full list of applications of ACT-R, see http://actr.psy.cmu.edu/publications/index.php). On the other hand,
Leabra includes a detailed account of bottom-up perception, which ACT-R currently does not. This suggests a natural synergy between the two architectures, which combines
Leabra’s perception capabilities with ACT-R’s control and
memory functionalities. This approach had already been implemented to some extent in the SAL architecture described
in (Jilk et al. 2008), in which an ACT-R/Leabra hybrid model
was embedded in an Unreal Tournament world.
The current hybrid architecture improves on SAL by providing a tight and natural integration between the two systems. This is done by wrapping the interaction with Leabra
inside of an ACT-R module, which exposes a buffer called
leabra-visual to the ACT-R system. This buffer provides a
way of issuing requests to the Leabra visual module and retrieving data from it. Communication with Leabra is handled via sockets, with Leabra acting as the server and ACT-R as the client. If the request is one that returns data from
Leabra, that data is converted into a chunk which can then
be accessed through the leabra-visual buffer. In this way,
the Leabra vision module is exactly analogous to the standard ACT-R visual module. In the current implementation
of the hybrid architecture, the interaction between ACT-R
and Leabra is limited to commands from ACT-R that direct
Leabra’s attention to various objects, and object identification data that flows from Leabra to ACT-R and is encoded as
chunks. However, the modular nature of both architectures
is such that once the basic framework has been set up, the
functionality may be extended indefinitely.
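The sketch below illustrates the client side of such a socket link in Python; the newline-delimited JSON wire format, host, port, and field names are assumptions made for illustration, since the paper specifies only that Leabra acts as the server and ACT-R as the client, with replies converted into chunks.

```python
import json
import socket

class LeabraVisualClient:
    """Minimal sketch of the client side of the ACT-R/Leabra socket link.

    The wire format (newline-delimited JSON) and the host/port defaults are
    hypothetical; only the server/client roles come from the paper.
    """

    def __init__(self, host="localhost", port=5555):
        self.sock = socket.create_connection((host, port))
        self.stream = self.sock.makefile("rw")

    def attend_to(self, object_id, view=0):
        """Direct Leabra's attention to an object/view and return a chunk-like dict."""
        request = {"command": "attend", "object": object_id, "view": view}
        self.stream.write(json.dumps(request) + "\n")
        self.stream.flush()
        reply = json.loads(self.stream.readline())
        # The reply is converted into a chunk (slot -> value mapping) that an
        # ACT-R production could read from the leabra-visual buffer.
        return {"object-name": reply.get("object-name"),
                "it_avg_act": reply.get("it_avg_act"),
                "out_max_act": reply.get("out_max_act"),
                "ground-truth": reply.get("ground-truth")}

# Usage (requires a running Leabra-side server):
# client = LeabraVisualClient()
# chunk = client.attend_to("object-001", view=0)
```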
(a) Visual chunk
(b) Context chunk
Figure 1: A schematic representation of the visual chunk
obtained from Leabra and the context chunk generated by
the blending retrieval. The “data” slot in 1(b) actually represents several slots; in this case, the data is identical to the
out max act and it avg act parameters.
The Serial Classifier
The serial classifier is an instance-based model (Taatgen, Lebiere, and Anderson 2006) of object classification into two categories. The task of the serial classifier implemented using the ACT-R/Leabra hybrid architecture is to sort input items into ones that the Leabra network has either been trained on or not (referred to as TRAIN and TEST categories). For our purposes, the two categories are coded as 0 and 1, respectively. A category judgment as made by ACT-R may lie anywhere on that continuum. Furthermore, the value of the judgment provides a natural estimate of its certainty, which for a judgment with some value $j \in [0, 1]$ is defined as $c = |j - 0.5|$. Thus, the certainty has values $c \in [0, 0.5]$. The ACT-R model requests a visual chunk from Leabra, which contains not only the object identity but also metacognitive information pertaining to the internals of the Leabra network, as well as the ground truth about an item’s category.
The serial nature of the classifier lies in its ability to request additional information about a given item and revise its judgment accordingly. The ACT-R model may either direct Leabra’s attention towards a new item, or it may direct its attention towards a different view of the item at which Leabra is currently “looking.” Whether or not the ACT-R model chooses to do so depends on whether its certainty in its current judgment is sufficient to make a decision; the threshold for decision-making is simply a parameter. If the model surpasses the certainty threshold, it will make a judgment about the item and direct visual attention to a new item. If the threshold is not surpassed, another view of the item is requested, and this continues until some predetermined number of views has been processed (in our case, 20), at which point the results of the last view are declared to be the model’s judgment irrespective of threshold.
The data to the ACT-R model is supplied via chunks in the leabra-visual buffer and its output is another chunk whose values are generated by the blending mechanism. The chunks themselves are simply Lisp lists; a slot-value pairing is a value in a particular place in the list. The schematic structure of the chunks is shown in Fig. 1. The visual chunk simply contains the metacognitive signal from Leabra, as well as some additional information, like the object name, which is not used by the classifier. The context chunk is built up in ACT-R’s imaginal buffer; here, the “data” slot represents the metacognitive signal from Leabra (which may include several slot-value pairs) and the “expectation” slot is the previous judgment of the classifier. A retrieved context chunk will also contain an “outcome” value, which represents the classifier’s current judgment about the item category.
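A compact sketch of this decision loop is shown below, assuming hypothetical get_view and blended_judgment callables that stand in for the Leabra interface and the partial-matching/blending retrieval; the constants follow the values reported in the text.

```python
MAX_VIEWS = 20        # hard cut-off on views per item, as in the paper
THRESHOLD = 0.25      # certainty threshold; 0.25 performed best in the reported runs

def classify_item(get_view, blended_judgment):
    """Serial classification of a single item.

    `get_view()` returns the metacognitive parameters for the next view and
    `blended_judgment(features, expectation)` performs the partial-matching +
    blending retrieval over the training instances; both are hypothetical
    stand-ins for the ACT-R/Leabra machinery described in the text.
    """
    expectation = None
    judgment = 0.5
    for _ in range(MAX_VIEWS):
        features = get_view()                 # e.g. IT average and output max activation
        judgment = blended_judgment(features, expectation)
        certainty = abs(judgment - 0.5)       # c = |j - 0.5|, so c lies in [0, 0.5]
        if certainty >= THRESHOLD:
            break                             # confident enough to commit to a judgment
        expectation = judgment                # otherwise feed the judgment back and re-view
    # The last judgment stands even if the threshold was never reached.
    return (1 if judgment >= 0.5 else 0), judgment   # 0 = TRAIN, 1 = TEST
```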
Metacognitive Parameters
The classifier makes its decisions based on partially matching against the parameters provided by Leabra. Through a
feature selection analysis, it was determined that the average activation of Leabra’s IT layer and the maximum activation of the output layer are the two most significant tracers of novelty. Although these parameters are referred to as
the “metacognitive” parameters, it should be clear that they
are only used in a metacognitive context by ACT-R; within
the Leabra model itself, these parameters simply indicate
a goodness-of-fit to the learned weights. Matching on the
other parameters supplied by Leabra did not produce any
difference in the results.
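The paper does not specify how the feature selection was carried out; the following sketch shows one simple, hypothetical way to rank candidate Leabra parameters by how well they separate TRAIN from TEST instances. The feature names and the scoring heuristic are assumptions for illustration only.

```python
import statistics

def rank_features(instances, feature_names):
    """Rank features by a crude separation score between TRAIN (0) and TEST (1) instances."""
    scores = {}
    for name in feature_names:
        train_vals = [x[name] for x in instances if x["outcome"] == 0]
        test_vals = [x[name] for x in instances if x["outcome"] == 1]
        spread = statistics.pstdev(train_vals + test_vals) or 1e-9
        scores[name] = abs(statistics.mean(train_vals) - statistics.mean(test_vals)) / spread
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

instances = [
    {"it_avg_act": 0.28, "out_max_act": 0.92, "outcome": 0},
    {"it_avg_act": 0.30, "out_max_act": 0.88, "outcome": 0},
    {"it_avg_act": 0.36, "out_max_act": 0.41, "outcome": 1},
    {"it_avg_act": 0.35, "out_max_act": 0.47, "outcome": 1},
]
print(rank_features(instances, ["it_avg_act", "out_max_act"]))
```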
Figure 2: Plots of threshold vs. performance for every element in the space defined by the match penalty (mp) and noise (ans).
Training Data
The training for the classifier consists of training the ACT-R side of the hybrid by learning chunks that represent TEST and TRAIN items. This is separate from the process of training the LVIS model to recognize images. The training data for the classifier consists of 200 items, selected from the TRAIN or TEST set with equal probability of 1/2; thus, the final training set consists of roughly a 50/50 mixture of TEST and TRAIN items. Each item is viewed 20 times. For each item, the recorded data consists of the metacognitive parameters, the expectation, and the ground truth. The expectation is initially generated randomly and adjusted on each view based on a simple thresholding of the maximum activation of the output layer. If the activation is $\geq 0.5$, the expectation is revised upward (closer to 1) by 0.05, while if it is $< 0.5$ it is revised downward (closer to 0) by 0.05. For most items, the expectation quickly rails to one of the two limits, though for some items, 20 views are not enough to reach either limit.
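A sketch of how such training instances might be recorded, following the expectation-update rule described above, is given below; the item interface (a ground_truth label and a get_view callable) and the tie-breaking at exactly 0.5 are assumptions.

```python
import random

def build_training_instances(items, views_per_item=20, step=0.05):
    """Record one training chunk per view of each item.

    Each item is assumed to expose a "ground_truth" label (0 = TRAIN, 1 = TEST)
    and a "get_view" callable returning the metacognitive parameters for the
    next view; these names describe a hypothetical interface, not the actual one.
    """
    instances = []
    for item in items:
        expectation = random.random()            # the initial expectation is random
        for _ in range(views_per_item):
            params = item["get_view"]()          # e.g. IT average and output max activation
            instances.append({**params,
                              "expectation": expectation,
                              "outcome": item["ground_truth"]})
            # Simple thresholding of the output layer's maximum activation.
            if params["out_max_act"] >= 0.5:
                expectation = min(1.0, expectation + step)   # revise toward 1 (TEST)
            else:
                expectation = max(0.0, expectation - step)   # revise toward 0 (TRAIN)
    return instances
```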
Judgment Revision
The revision of judgments is a key feature of the serial classifier and is implemented using partial matching and blending. Recall that this combination effectively produces a consensus chunk whose slots are filled with values obtained via a weighted average from the corresponding slots of the matching chunks in declarative memory. The first time the classifier views a particular item, it will attempt to retrieve such a consensus chunk based purely on the metacognitive parameters supplied by Leabra. This in turn generates a judgment between 0 and 1 based on the instances in the model’s training set. If the threshold for certainty is sufficiently low, the judgment may surpass it, and the model will go on to view the next item.
If the threshold is not surpassed initially, the model will request another view. The same metacognitive parameters as before are used in the partial matching specification; however, the specification will now include the previous expectation. Thus, if the training set contains many instances of situations in which a given expectation led to a given final judgment, the blending mechanism will bias the next iteration of the expectation towards that result. For example, if the item is a TEST item (meaning that the ground truth is 1) and the current expectation is 0.6, partially matching with blending on that expectation value might produce a revised expectation of 0.7 on the next iteration. Then, the value of 0.7 would be used in the next partial matching retrieval, if one is necessary. This process continues until the threshold parameter is surpassed or 20 views of the item have been given. A representative sequence of judgments can be seen in Fig. 3.
Classifier Performance
Parameter Space Exploration
On the ACT-R side, multiple model parameters affect the retrieval process. The chief of these are the mismatch penalty and the noise. The mismatch penalty is effectively the penalty incurred by chunks depending on how well they partially match a given chunk specification. The noise is sampled from a Gaussian distribution and added to the activation of a given chunk. In this way, the mismatch penalty and the noise counterbalance each other: the higher the mismatch penalty, the smaller the neighborhood in parameter space into which chunks must fall in order to be considered a match. On the other hand, the higher the noise, the more “smeared” the boundaries of that neighborhood become, as chunks which are outside the matching neighborhood may fall into it once the noise is added, and conversely, chunks within the neighborhood may fall out of it.
Figure 3: A sequence of judgments undertaken by the categorizer. The certainty cutoff threshold is 0.4; the model acquires different views of the object and successively refines its judgments until it converges on the correct answer with the requisite certainty.
Figure 4: Plot of threshold vs. performance for mp = 10 and ans = 0.1. Each data point represents an average over 10 runs. A threshold of 0.25 achieves the best performance by correctly classifying objects 76.8% of the time.
A parameter exploration was carried out for 5 different
values of the mismatch penalty (mp) and 5 different values of noise (ans). Additionally, for each combination in
the (mp, ans) space, a series of decision thresholds were
used. Fig. 2 shows the performance of the classifier in
terms of percent correct as a function of threshold for each
point in this space. In the upper right corner of the space
(high noise, low mismatch penalty), performance degrades,
as many chunks which would not otherwise be candidates
for matching fall into an already too permissive neighborhood in the match space. However, once the ratio of noise
to mismatch penalty reaches approximately 10, the system
becomes relatively insensitive to changes in both the noise
and the mismatch penalty. This suggests that optimal performance should occur in a configuration with high match
penalties and low noise values. Further simulations were
conducted by fixing the mismatch penalty at 10 and the noise
parameter at 0.1.
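A sketch of the kind of grid exploration described here is shown below; run_classifier is a hypothetical stand-in for a full run of the hybrid model, the particular mp and ans values are illustrative (the paper does not list them), and the threshold values are those used in the averaged runs reported next.

```python
from itertools import product

def sweep(run_classifier, mp_values, ans_values, thresholds, runs=10):
    """Grid search over mismatch penalty, activation noise, and decision threshold.

    `run_classifier(mp, ans, threshold)` is assumed to return percent correct
    for one run; results are averaged over `runs` repetitions per grid cell.
    """
    results = {}
    for mp, ans, thr in product(mp_values, ans_values, thresholds):
        scores = [run_classifier(mp=mp, ans=ans, threshold=thr) for _ in range(runs)]
        results[(mp, ans, thr)] = sum(scores) / len(scores)
    return results

# Illustrative grid (5 mismatch penalties x 5 noise values, several thresholds).
grid = sweep(lambda mp, ans, threshold: 0.0,      # placeholder classifier
             mp_values=[1, 2.5, 5, 7.5, 10],
             ans_values=[0.1, 0.25, 0.5, 0.75, 1.0],
             thresholds=[0.1, 0.2, 0.225, 0.25, 0.275, 0.3, 0.4])
best = max(grid, key=grid.get)
print(best, grid[best])
```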
Average Performance as a Function of Threshold
With the mismatch penalty fixed at 10 and the noise parameter fixed at 0.1, 10 simulations were run with the same training set for threshold values of 0.1, 0.2, 0.225, 0.25, 0.275, 0.3, and 0.4. The threshold space is sampled more heavily around 0.25 because, as demonstrated in Fig. 2, the optimal threshold value lies within that range for most cases. The results of the simulation are displayed in Fig. 4.
Conclusions and Further Work
We demonstrate an implementation of a serial classifier realized through the integration of two rather different cognitive architectures: the primarily symbolic and sub-symbolic ACT-R, and the connectionist Leabra. The metacognitive parameters generated by Leabra are used by ACT-R to classify objects into one of two distinct categories on the basis of past instances on which it has been trained. The metacognitive signals needed to perform this classification to ∼77% accuracy (at a threshold value of 0.25) are simply the average activation of the IT layer and the maximum activation of the output layer.
Further work remains to be done with this model. One interesting question concerns the kinds of objects that are not classified correctly by this model; an analysis of common features shared by such objects has yet to be performed. Another area of research would be the identification of better metacognitive signals within Leabra. Also, at this time, the model is not constrained with regard to how many views it may request, other than the hard cut-off of 20, implemented to make sure the model terminates. In other words, there is no extra cost built into the model associated with the acquisition of new information. Real situations, however, do have such a cost in terms of time. Thus, an additional timing/cost constraint could be placed on the model to examine its performance under time pressure.
The present model is passive; it does not interact with the world in any way. A further avenue of research would be to close the loop between the world and the model by allowing the model to manipulate the world. In this way, the model could learn not only representations of the world, but also which actions allow it to achieve its goals within that world. This would represent a crucial step towards the realization of self-directed learning, as described in (Herd, Mingus, and O’Reilly 2010).
Acknowledgments
This work was conducted through collaborative participation in the Robotics Consortium sponsored by the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement W911NF-10-2-0016.
References
Anderson, J. R., and Lebiere, C. 1990. The Adaptive Character of Thought. Hillsdale, New Jersey: Erlbaum.
Anderson, J. R., and Lebiere, C. 1998. The Atomic Components of Thought. Mahwah, New Jersey: Erlbaum.
Anderson, J. R.; Bothell, D.; Byrne, M. D.; Douglass, S.;
Lebiere, C.; and Qin, Y. 2004. An integrated theory of mind.
Psychological Review 111(4):1036–1060.
Anderson, J. R. 2007. How Can the Human Mind Occur in
the Physical Universe? Oxford University Press.
Gonzalez, C., and Lebiere, C. 2005. Instance-based cognitive models of decision making. In Zizzo, D., and Courakis,
A., eds., Transfer of knowledge in economic decision making. New York: Palgrave McMillan.
Gonzalez, C.; Lerch, F. J.; and Lebiere, C. 2003. Instancebased learning in dynamic decision making. Cognitive Science 27(4):591–635.
Hazy, T.; Frank, M.; and O’Reilly, R. 2006. Banishing the
homunculus: Making working memory work. Neuroscience
139(1):105 – 118.
Herd, S., and O’Reilly, R. 2005. Serial visual search from a
parallel model. Vision Research.
Herd, S. A.; Banich, M.; and O’Reilly, R. 2006. Neural mechanisms of cognitive control: An integrative model of Stroop task performance and fMRI data. Journal of Cognitive Neuroscience.
Herd, S.; Mingus, B.; and O’Reilly, R. 2010. Dopamine and self-directed learning. In Biologically Inspired Cognitive Architectures 2010: Proceedings of the First Annual Meeting of the BICA Society, 58–63. Amsterdam, The Netherlands: IOS Press.
Jilk, D. J.; Lebiere, C.; O’Reilly, R. C.; and Anderson, J. R. 2008. SAL: An explicitly pluralistic cognitive architecture. Journal of Experimental and Theoretical Artificial Intelligence.
Kuipers, B.; Beeson, P.; Modayil, J.; and Provost, J. 2006.
Bootstrap learning of foundational representations. Connection Science 18(2):145–158.
Lebiere, C., and Staszewski, J. 2010. Expert decision making in landmine detection. In Proceedings of the Human
Factors and Ergonomics Society Conference.
Modayil, J., and Kuipers, B. 2007. Autonomous development of a grounded object ontology by a learning robot. In Proceedings of the National Conference on Artificial Intelligence, volume 2 of 22, 1095–1101. AAAI Press.
O’Reilly, R. C., and Munakata, Y. 2000. Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain. MIT Press.
O’Reilly, R. C., and Munakata, Y. 2002. Psychological function in computational models of neural networks. In Michela Gallagher, Randy J. Nelson, I. B. W., ed., Handbook of Psychology, volume 3. Wiley. Chapter 22.
O’Reilly, R. C.; Herd, S.; and Pauli, W. 2010. Computational models of cognitive control. Current Opinion in Neurobiology.
O’Reilly, R. C.; Herd, S. A.; Wyatt, D.; Mingus, B.; and Jilk, D. 2011. Bidirectional biological object recognition. Submitted.
Sanner, S.; Anderson, J. R.; Lebiere, C.; and Lovett, M. 2000. Achieving efficient and cognitively plausible learning in backgammon. In Proceedings of the Seventeenth International Conference on Machine Learning.
Snyder, H. R.; Hutchison, N.; Nyhus, E.; Curran, T.; Banich, M. T.; O’Reilly, R. C.; and Munakata, Y. 2010. Neural inhibition enables selection during language processing. Proceedings of the National Academy of Sciences 107(38):16483–16488.
Taatgen, N.; Lebiere, C.; and Anderson, J. R. 2006. Modeling paradigms in ACT-R. In Sun, R., ed., Cognition and Multi-Agent Interaction: From Cognitive Modeling to Social Simulation. Cambridge University Press.