Evolution of Reinforcement Learning in Unpredictable Environments

Time, Rate and Conditioning
or
“A model with no free parameters
that explains everything in behaviour”
C.R. Gallistel
John Gibbon
Psych. Review 2000, 107(2):289-344
Introduction
• An animal’s response in Pavlovian conditioning
is influenced by the timing of the expected
reinforcer.
• Take the interval-learning assumption as a
starting point for the analysis of
conditioning.
• Argue that learning of temporal intervals and
their reciprocals (event rates) are core
processes in conditioning.
• Definition of timescale invariance: when plotted against time normalized by the relevant interval, data obtained with different interval values superimpose.
Timescale invariance of CR latency
CR is maximally likely at the reinforcement latency
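A minimal sketch of what "normalized data plots superimpose" means. It assumes, purely for illustration and not as the paper's fitted model, that CR timing follows a Gaussian centred on the reinforcement latency T with a standard deviation proportional to T (a hypothetical constant Weber fraction w = 0.15). Replotting against t/T and rescaling the density by T makes the curves for different T coincide, and each curve peaks at t = T.

```python
import math


def response_density(t, T, w=0.15):
    """Illustrative scalar-timing density: Gaussian centred on the reinforcement
    latency T, with SD proportional to T (hypothetical Weber fraction w)."""
    sd = w * T
    return math.exp(-0.5 * ((t - T) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def normalized_curve(T, points, w=0.15):
    """Evaluate the density at t = x*T for each normalized time x, rescaled by T
    so that the result is a density over x = t/T."""
    return [T * response_density(x * T, T, w) for x in points]


grid = [0.5, 0.75, 1.0, 1.25, 1.5]           # normalized time t/T
curve_10s = normalized_curve(T=10.0, points=grid)
curve_40s = normalized_curve(T=40.0, points=grid)

# The normalized curves superimpose (up to floating-point error) and peak at t/T = 1.
print(all(math.isclose(a, b) for a, b in zip(curve_10s, curve_40s)))
```

Any timing model whose spread scales with the interval behaves this way; a model with a fixed (non-scalar) spread would not superimpose.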
Acquisition
• Model within a decision-theoretic framework (like psychometric decision models: stimulus -> (noisy) decision variable -> threshold -> response)
• Most data from pigeon autoshaping (unfortunately no individual
acquisition data) – Pavlovian, delay and not trace conditioning.
Acquisition - Facts
• No effect of partial reinforcement on rate of acquisition (counted in reinforcements, not trials).
• Increasing avg. ITI increases rate of acquisition (fewer reinforcements needed).
• Delay of reinforcement reduces acquisition rate.
➢ Reinforcements to acquisition are inversely proportional to the ITI/ISI (= I/T) ratio, independent of reinforcement schedule.
➢ Timescale invariance
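A worked sketch of how these facts follow from the RET "whether" rule, under illustrative assumptions that are ours rather than the paper's exact bookkeeping: delay conditioning with n trials, CS duration T, ITI I, reinforcement probability p per trial (hypothetical symbol), N_R = pn reinforcements received, no reinforcers during the ITI, and the background rate bounded above by treating the accumulated ITI time as if it held one reinforcer:

$$\hat{\lambda}_{CS} = \frac{pn}{nT} = \frac{p}{T}, \qquad \hat{\lambda}_{B}^{\max} = \frac{1}{nI}$$

$$\frac{\hat{\lambda}_{CS}}{\hat{\lambda}_{B}^{\max}} = \frac{pnI}{T} = N_R\,\frac{I}{T} \ge \beta \quad\Longrightarrow\quad N_R^{*} = \beta\left(\frac{I}{T}\right)^{-1}$$

In this sketch p drops out of N_R^* (no partial-reinforcement effect on reinforcements to acquisition), a longer ITI lowers N_R^*, a longer delay T raises it, and only the ratio I/T matters (timescale invariance).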
Acquisition – Some more facts
• Irrelevance of reinforcer
magnitude for rate of acquisition
(counterintuitive result ->)
(But – important in preference and in
determining asymptotic level of
performance)
• Conditioning is driven by CS-US
contingency, not by temporal
pairing of CS and US
RET – Rate Estimation Theory
(The “Whether” decision)
• Estimated upper limit of the background reinforcement rate compared with the CS reinforcement rate.
• Timescale invariant (in contrast to trace decay models (S+B) or trial-based models (R+W)).
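A minimal sketch of the "whether" comparison, under illustrative assumptions: every trial is reinforced at CS offset, no reinforcers occur in the ITI, and the upper limit on the background rate is taken as one reinforcer over the accumulated ITI time. The function names and the threshold value beta = 18 are hypothetical.

```python
def whether_ratio(n, T, I):
    """Hypothetical RET 'whether' decision variable after n reinforced trials.

    Illustrative assumptions (not the paper's exact bookkeeping):
    - every trial is reinforced at CS offset, so the CS rate estimate is n / (n*T);
    - no reinforcers occur in the ITI, so the background rate is bounded above
      by treating the accumulated ITI time n*I as if it held one reinforcer.
    """
    cs_rate = n / (n * T)                  # reinforcers per second of CS
    background_upper = 1.0 / (n * I)       # upper limit on the background rate
    return cs_rate / background_upper      # equals n * I / T


def reinforcements_to_acquisition(T, I, beta):
    """Smallest number of reinforcements at which the ratio reaches threshold beta."""
    n = 1
    while whether_ratio(n, T, I) < beta:
        n += 1
    return n


# Hypothetical threshold beta = 18; CS duration T = 10 s throughout.
for iti in (25, 50, 100):
    print(iti / 10, reinforcements_to_acquisition(T=10, I=iti, beta=18))
```

Doubling I/T roughly halves the reinforcements needed, and scaling T and I together leaves the count unchanged. A fuller version would add noise to the ratio before the threshold comparison, in line with the decision-theoretic framing; this sketch keeps it deterministic.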
RET – Some more things…
• Partitioning – necessary in RET because at first the animal cannot know whether to credit the US to the CS or to the background.
• Consistency check – “Absence of evidence is not evidence of absence”.
• No effect of reinforcement magnitude – the same magnitude is assumed for the background, so it cancels out of the rate comparison.
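A one-line sketch of the cancellation, assuming (our illustration) that reinforcer magnitude M multiplies both rate estimates that enter the comparison:

$$\frac{M\,\hat{\lambda}_{CS}}{M\,\hat{\lambda}_{B}} = \frac{\hat{\lambda}_{CS}}{\hat{\lambda}_{B}}$$

so the acquisition decision is blind to M, while M can still matter for preference and for the asymptotic level of performance.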
Extinction - Facts
• A consequence of non-reinforcements – but non-reinforcements are problematic “events”
• No problem for RET…
• Weakening of CR with extended experience of nonreinforcement (individual psychometric function not known).
• Partial reinforcement extinction effect – the number of omitted reinforcements needed for extinction stays the same.
• NO EFFECT of ISI/ITI ratio on extinction rate.
• Rate of extinction similar to that of response elimination (random control).
Extinction - Model
• Based on a different comparison – that between the cumulative amount of unreinforced CS exposure since the last reinforcer, and the expected amount of CS exposure per reinforcement:
$\frac{I_{CS\,noR}}{I_{RI_{CS}}} = \frac{\lambda_{CS}}{\lambda_{CS\,noR}} = \beta$
➢ It is the relatively prolonged interval without reinforcement that leads to extinction (not the occurrence of non-events).
➢ The decision process of extinction operates all the time – a general detection of changes in the rates of significant events.
(Follows directly: the partial reinforcement extinction effect, no effect of ISI/ITI ratio, and, through the partitioning scheme, the rate of response elimination)
➢ Fundamental aspect of timing theory: different decisions rest on different comparisons.
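A minimal sketch of this extinction comparison, under illustrative assumptions: training used CS duration T and a reinforcement probability p per trial (hypothetical parameter names), so the expected CS exposure per reinforcement is T/p; each extinction trial adds T of unreinforced CS time; the CR stops once the ratio reaches a hypothetical threshold beta = 5. The ITI never enters the computation.

```python
def trials_to_extinction(T, p, beta):
    """Hypothetical sketch of the RET extinction decision.

    T: CS duration per trial (s); p: probability of reinforcement per trial in training.
    The ITI is deliberately absent: only CS exposure enters the comparison.
    """
    expected_cs_per_reinforcer = T / p     # I_RI_CS learned during training
    unreinforced_cs = 0.0                  # I_CS noR, accumulated during extinction
    trials = 0
    while unreinforced_cs / expected_cs_per_reinforcer < beta:
        trials += 1
        unreinforced_cs += T
    return trials


for p in (1.0, 0.5, 0.25):
    n = trials_to_extinction(T=10, p=p, beta=5)
    # omitted reinforcements = extinction trials x expected reinforcers per trial
    print(p, n, n * p)
```

Thinner training schedules need proportionally more extinction trials, but the number of omitted reinforcements (n * p) stays the same, and the ISI/ITI ratio has no effect because the ITI is not part of the comparison.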
Cue Competition
• Conditioning (responding) to one CS is not independent of that
to other CSs in the protocol (led to R+W model)
Examples:
• Blocking
• Overshadowing ->
• Relative validity ->
• Retroactive reversal of blocking and overshadowing
• Inhibitory conditioning (summation test).
In RET all these result from
two principles of the
partitioning process:
1. Rate additivity
2. Parsimony (prediction
minimization)
Cue Competition - Partitioning
Explains: 1. Blocking
+ generalization to magnitude changes and their effects
And… Explains some more:
• Overshadowing: here the principle of parsimony is necessary – the resolution in favour of one CS and not the other is nevertheless arbitrary (partial responding to the second CS is assumed to result from second-order conditioning predicting the first CS).
• Retroactive reversals: explained by retroactive reallocation (huh?) of rates (in contrast with associative models, in which an association with an unpresented CS cannot change).
• Inhibitory conditioning (3 different protocols): explained in a straightforward way with no further assumptions, by allowing negative rate estimates and defining their effect on behaviour.
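A minimal sketch of rate additivity as a linear system, in the spirit of RET's partitioning; the exact bookkeeping here is our assumption. For each stimulus, the reinforcers delivered while it was on must equal the sum, over all stimuli, of (time the two were on together) × (that stimulus's rate). The background is always on, cues are on only during the CS period T, and any reinforcer arrives at CS offset. `solve` and `partition_rates` are hypothetical helpers; exact `Fraction` arithmetic keeps zero and negative rates exact.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix @ x = rhs exactly by Gauss-Jordan elimination.

    Raises ValueError when the system is singular, i.e. additivity alone
    does not determine the rates.
    """
    size = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = next((r for r in range(col, size) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("rates not uniquely determined by additivity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def partition_rates(trial_types, cues):
    """trial_types: (cues_on_during_CS, n_trials, T, I, reinforced) tuples."""
    names = ["background"] + list(cues)
    idx = {s: k for k, s in enumerate(names)}
    co = [[Fraction(0)] * len(names) for _ in names]   # cumulative co-presence times
    n = [Fraction(0)] * len(names)                     # reinforcers while each stimulus was on
    for present, count, T, I, reinforced in trial_types:
        on = ["background"] + list(present)
        co[0][0] += count * I                          # ITI: background alone
        for a in on:
            for b in on:
                co[idx[a]][idx[b]] += count * T        # CS period: all 'on' pairs co-occur
            if reinforced:
                n[idx[a]] += count                     # reinforcer at CS offset
    return dict(zip(names, solve(co, n)))


# Blocking: A+ trials, then AX+ compound trials (hypothetical T = 10 s, I = 50 s).
blocking = partition_rates([(("A",), 20, 10, 50, True),
                            (("A", "X"), 20, 10, 50, True)], cues=["A", "X"])

# Conditioned inhibition: A+ trials interleaved with unreinforced AX- trials.
inhibition = partition_rates([(("A",), 20, 10, 50, True),
                              (("A", "X"), 20, 10, 50, False)], cues=["A", "X"])
print(blocking, inhibition)
```

In the blocking run X receives a rate of 0 and A keeps 1/T; in the inhibition run X receives -1/T, a negative rate estimate. With compound-only training (overshadowing), A and X are always on together, so `solve` raises `ValueError`: additivity alone cannot split the rate, which is where parsimony has to step in.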
Question and Answer session:
Associative vs. Timing theories
(the good guys and the bad guys)
3) What is the effect of reinforcement?
• Standard answer: It strengthens excitatory associations.
• Timing answer: It marks the beginning and/or the termination of one or more intervals: an inter-reinforcement interval, a CS-US interval, or both.
6) What happens when nothing happens (during the intertrial
interval)?
• Standard answer: Nothing.
• Timing answer: The timer for the background continues to accumulate.
15) What is the fundamental experimental variable in operant
conditioning?
• Standard answer: Probability of reinforcement
• Timing answer: Rate of reinforcement
Contrasting Basic Assumptions:
Associative vs. Timing models





“The two conceptual frameworks have no fundamental elements in
common”. Timing models have more in common with psychophysical
models of vision and hearing than with associative models of learning.
Appeal of associative theories: the basic event is a change in connection strength (neurobiologically transparent, but… connection strength is a many-to-one, non-invertible function of the experimental variables).
Conversely – the elementary event in timing models is the measurement, recording and retrieval of elapsed intervals (explains a lot, but requires a computer-like memory system).
Timing theories encompass explicit decision mechanisms, whereas associative theories do not.
“We believe it is time to rethink conditioning in light of these
findings. We have attempted to show that timing theory
provides a powerful and wide-ranging alternative to the
traditional conceptual framework”.
Recommended reading
• Time, Rate and Conditioning – Gallistel C.R. and Gibbon J., Psych. Rev. 2000, 107(2):289-344
• Acquisition and Extinction in Autoshaping – Kakade S. and Dayan P., Psych. Rev. 2002, 109(3):533-544