Differentiating Models of Associative Learning

advertisement
Differentiating Models of Associative Learning: Reorientation, Superconditioning,
and the Role of Inhibition
Brian Dupuis (bdupuis@ualberta.ca)
Michael R.W. Dawson (mdawson@ualberta.ca)
Department of Psychology, University of Alberta
Edmonton, Alberta, Canada
Keywords: Rescorla–Wagner model; artificial neural
networks; operant choice; reorientation; superconditioning.
locations with the correct geometric configuration in
absence of the correct feature.
Although developed with reorientation in mind, the
Miller-Shettleworth model shows signs of a serious
mathematical mistake when tested on this task. At high
learning rates, the model predicts dramatic fluctuations in
associative strength (Figure 1) and in choice probabilities,
eventually culminating in a global divide-by-zero error.
The Miller-Shettleworth Model
One broad class of models of associative learning, based
on Rescorla and Wagner’s (1972) original model, views
stimuli as collections of cues that compete with each other
for associative strength. Miller and Shettleworth (2007,
2008) attempted to apply a variation of this model to the
field of spatial learning. Their model attempts to capture the
role of choice behavior in spatial navigation tasks –
specifically, how agents estimate the likelihood of
reinforcement based on their experience, and how their
choices are influenced by these estimates. They observe,
correctly, that this makes such tasks exercises in operant
conditioning, while the well-established Rescorla-Wagner
model captures classical conditioning.
In their model, Miller and Shettleworth multiply the
Rescorla-Wagner equation by a “probability” term – a ratio
of the associative strengths a single choice’s cues to the total
associative strength of all cues at all possible choices.
However, this produces a model that is empirically and
mathematically flawed (Dupuis & Dawson, in press). Here,
we describe these empirical and formal flaws, and supply an
alternative associative model for investigating these tasks.
Figure 1: The Miller-Shettleworth model’s associative
strength on a reorientation task at a high (0.7) learning rate.
Superconditioning
Superconditioning arises when an excitatory cue paired
with an inhibitor produces greater excitation during further
training than it produces when paired with a neutral cue.
This is a prediction of the Rescorla-Wagner (1972) model
and is well-established in animal experimental literature.
A recent experiment (Horne & Pearce, 2010) established
that this effect also occurs in spatial learning – that is, rats
trained with an inhibitory feature responded to geometryonly probe trials with greater probability than rats trained
with a neutral feature. However, when the MillerShettleworth (2008) model attempts to model this effect, it
predicts the opposite result (experimental 0.91, control
0.94). Horne and Pearce observed that the model was not
assigning sufficient inhibitory associative strength to cues
that are present at both reinforced and non-reinforced
locations.
Empirical Shortcomings
The Miller-Shettleworth model’s flaws are evident when
one investigates how it behaves in a pair of spatial tasks.
Reorientation
A ‘reorientation task’ is a common experimental paradigm,
used to explore spatial and geometric learning, where agents
are placed inside a controlled arena – typically rectangular,
with an assorted set of feature information or landmarks
such as colored panels over the corners. The subjects must
learn which locations are or are not reinforced. Systematic
change to this arena after training produces regular effects,
most famously “rotational error”. If the colored panel is
moved from the reinforced corner to a different corner,
agents will follow it, but they will also return to its original
corner… as well as the corner rotationally opposite (which
has walls in the same configuration as the original corner).
In an associative context, each location is defined as the
collection of cues present at that location – the geometric
configuration of the walls, the type of features present, and
so on. Rotational error is explained as a response to
Why Does It Fail?
This weakness in handling inhibition is indicative of why
the Miller-Shettleworth model produces such unusual
results. The root cause lies with their decision to implement
operant choice by scaling the Rescorla-Wagner equation.
The Rescorla-Wagner model includes an implicit measure
of the time that passes with each iteration of the equation
within its learning rate parameter – a term that is held
constant (and thus suppressed), because to do otherwise
322
would “beg justification” (Rescorla & Wagner, 1972).
When one calculates the change in associative strength as
this change in time approaches zero, the Rescorla-Wagner
equation produces the instantaneous time derivative of
associative strength.
Miller and Shettleworth (2007, 2008) multiply the
Rescorla-Wagner equation by some “probability” ratio. The
problem is that both of these equations are functions of
current associative strength – meaning both equations
contain a (suppressed) time term. In order to correctly
control for this additional time dependency, the chain rule
must be applied to the equation or the form of the derivative
will change. Miller and Shettleworth did not do this.
In effect, this uncontrolled time dependency is equivalent
to allowing the learning rate to vary independently for each
type of cue, with none of the justification that Rescorla and
Wagner “beg”.
is, it is not normalized), which has important theoretical
implications.
Empirical Robustness
When tested on the same reorientation task illustrated
above, the operant perceptron converges upon the expected
solution at low (0.15), high (0.7), and extremely high (1.0)
learning rates, illustrating that it does not succumb to the
scaling problems seen in the Miller-Shettleworth model. (A
discussion of perceptrons and reorientation is found in
Dawson et al. (2010).)
Presenting Horne and Pearce’s (2010) superconditioning
experiment to the operant perceptron leads to a prediction
consistent with their animal experiments (experimental
0.998, control 0.970). Examining the operant perceptron’s
behavior over time shows that it assigns substantially more
inhibitory associative strength to partially-reinforced cues
than the Miller-Shettleworth model – a result of applying
the full learning rule at some frequency rather than a scaled
rule.
Solution: The Operant Perceptron
In light of these empirical and mathematical results, we
must recommend that the Miller-Shettleworth model be
abandoned. However, this need not spell the end for
associative theory in exploring these areas. An alternative
model – a simple artificial neural network known as a
perceptron – has been shown to successfully model many of
the standard reorientation task results (Dawson, Kelly,
Spetch, & Dupuis, 2010), even though it employs a
classical-conditioning training algorithm. This perceptron is
presented a pattern of cues representing a location in a
reorientation arena, which is then sent through weighted
connections to an output unit. This output unit sums the
weighted signals together and responds with the logistic
function of this value; mistakes in this response are then
used to modify the connection weights using a gradientdescent learning rule based on the Rescorla-Wagner
equation.
A simple modification to this algorithm to capture the
probabilistic choice behavior in operant conditioning
produces an “operant perceptron”. After a location’s cues
are presented, the operant perceptron generates its logistic
output – a number that must fall between 0 and 1. This has
been shown to literally be the estimate of the conditional
probability of reinforcement given the pattern of cues
(Dawson & Dupuis, 2012). Therefore, this output response
is used as the probability of the operant perceptron
investigating a location. If it does not investigate a location,
its weights are not changed – as in operant conditioning, the
operant perceptron only learns from its experience.
A critical difference between the operant perceptron and
the Miller-Shettleworth model is that the latter applies a
scaled Rescorla-Wagner equation with every iteration, while
the former applies a normal, unscaled Rescorla-Wagnerstyle equation on some subset of iterations. This allows the
operant perceptron to bypass the calculus error described
above. Furthermore, the operant perceptron generates
conditional probabilities for each option independently (that
References
Dawson, M. R. W., & Dupuis, B. (2012). The equilibria of
perceptrons for simple contingency problems. IEEE
Transactions in Neural Networks and Learning
Systems, 23(8), 1340–1344.
doi:10.1109/TNNLS.2012.2199766
Dawson, M. R. W., Kelly, D. M., Spetch, M. L., & Dupuis,
B. (2010). Using perceptrons to explore the
reorientation task. Cognition, 114(2), 207–26.
doi:10.1016/j.cognition.2009.09.006
Dupuis, B., & Dawson, M. R. W. (in press). Differentiating
Models of Associative Learning: Reorientation,
Superconditioning, and the Role of Inhibition. Journal
of Experimental Psychology: Animal Behavior
Processes (36 pages, accepted February 4, 2013).
Horne, M. R., & Pearce, J. M. (2010). Conditioned
inhibition and superconditioning in an environment
with a distinctive shape. Journal of Experimental
Psychology: Animal Behavior Processes, 36(3), 381–
94. doi:10.1037/a0017837
Miller, N. Y., & Shettleworth, S. J. (2007). Learning about
environmental geometry: An associative model.
Journal of Experimental Psychology: Animal
Behavior Processes, 33(3), 191–212.
doi:10.1037/0097-7403.33.3.191
Miller, N. Y., & Shettleworth, S. J. (2008). An associative
model of geometry learning: a modified choice rule.
Journal of Experimental Psychology: Animal
Behavior Processes, 34(3), 419–22.
doi:10.1037/0097-7403.34.3.419
Rescorla, R. A., & Wagner, A. R. (1972). A theory of
Pavlovian conditioning: Variations in the
effectiveness of reinforcement and nonreinforcement.
In A. H. Black & W. F. Prokasy (Eds.), Classical
conditioning II: current research and theory (pp. 64–
99). New York, NY: Appleton-Century-Crofts.
323
Download