Evolution of Reinforcement Learning in Unpredictable Environments

Time, Rate and Conditioning
or “A model with no free parameters that explains everything in behaviour”
C.R. Gallistel, John Gibbon
Psych. Review 2000, 107(2):289-344

Introduction
• An animal’s response in Pavlovian conditioning is influenced by the timing of the expected reinforcer.
• Take the interval-learning assumption as a starting point for the analysis of conditioning.
• Argue that learning temporal intervals and their reciprocals (event rates) is a core process in conditioning.
• Definition of timescale invariance: normalized data plots superimpose.

Timescale invariance of CR latency
CR is maximally likely at the reinforcement latency

Acquisition
• Model within a decision-theoretic framework (like psychometric decision models: stimulus -> (noisy) decision variable -> threshold -> response)
• Most data from pigeon autoshaping (unfortunately no individual acquisition data) – Pavlovian, delay and not trace conditioning.

Acquisition - Facts
• No effect of partial reinforcement on rate of acquisition (measured in reinforcements to acquisition).
• Increasing avg. ITI increases rate of acquisition (fewer reinforcements needed)
• Delay of reinforcement reduces acquisition rate.
-> Reinforcements to acquisition are proportional to the ISI/ITI (= T/I) ratio, i.e. inversely proportional to I/T, independent of reinforcement schedule.
-> Timescale invariance

Acquisition – Some more facts
• Irrelevance of reinforcer magnitude for rate of acquisition (counterintuitive result) (But – important in preference and in determining asymptotic level of performance)
• Conditioning is driven by CS-US contingency, not by temporal pairing of CS and US

RET – Rate Estimation Theory (The “Whether” decision)
• Estimated upper limit of background reinforcement rate compared to CS reinforcement rate.
• Timescale invariant (contrast to trace decay models (S+B), or trial-based models (R+W)).

RET – Some more things…
• Partitioning – necessary in RET because at first one can’t know whether to credit the US to the CS or to the background.
• Consistency check – “Absence of evidence is not evidence of absence”.
• No effect of reinforcement magnitude – same magnitude guessed for background (cancel out)

Extinction - Facts
• A consequence of non-reinforcements – problematic “events”
• No problem for RET…
• Weakening of CR with extended experience of nonreinforcement (individual psychometric function not known).
• Partial reinforcement extinction effect – maintains same number of reinforcements to be omitted.
• NO EFFECT of ISI/ITI ratio on extinction rate.
• Rate of extinction similar to that of response elimination (random control).

Extinction - Model
• Based on a different comparison – between the cumulative amount of unreinforced CS exposure since the last reinforcer and the expected amount of CS exposure per reinforcement:
  I_{CS no R} / I_{R|CS} = λ_{CS} / λ_{CS no R} = β
-> It is the relatively prolonged interval without reinforcement that leads to extinction (not the occurrence of non-events)
-> The decision process of extinction operates all the time – general detection of changes in rates of significant events. (Follows directly – partial reinforcement effect, no effect of ISI/ITI ratio, rate of elimination – partitioning scheme)
-> Fundamental aspect of timing theory: different decisions rest on different comparisons

Cue Competition
• Conditioning (responding) to one CS is not independent of that to other CSs in the protocol (led to R+W model)
Examples:
• Blocking
• Overshadowing
• Relative validity
• Retroactive reversal of blocking and overshadowing
• Inhibitory conditioning (summation test).
In RET all these result from two principles of the partitioning process:
1. Rate additivity
2. Parsimony (prediction minimization)

Cue Competition - Partitioning
Explains:
1. Blocking + generalization to magnitude changes and their effects

And… Explains some more:
• Overshadowing: Here the principle of parsimony is necessary – the resolution in favour of one CS and not the other is nevertheless arbitrary (partial responding to the second CS is assumed to result from second-order conditioning predicting the first CS).
• Retroactive reversals: explained by retroactive reallocation (huh?) of rates (contrast with associative models, in which an association with an unpresented CS cannot change)
• Inhibitory conditioning (3 different protocols): explained in a straightforward way with no further assumptions, by allowing negative rate estimates and defining their effect on behaviour.

Question and Answer session: Associative vs. Timing theories (the good guys and the bad guys)
3) What is the effect of reinforcement?
• Standard answer: It strengthens excitatory associations.
• Timing answer: It marks the beginning and/or the termination of one or more intervals: an inter-reinforcement interval, a CS-US interval, or both.
6) What happens when nothing happens (during the intertrial interval)?
• Standard answer: Nothing.
• Timing answer: The timer for the background continues to accumulate.
15) What is the fundamental experimental variable in operant conditioning?
• Standard answer: Probability of reinforcement
• Timing answer: Rate of reinforcement

Contrasting Basic Assumptions: Associative vs. Timing models
• “The two conceptual frameworks have no fundamental elements in common”.
• Timing models have more in common with psychophysical models of vision and hearing than with associative models of learning.
• Appeal of associative theories: basic event is change in connection strength (neurobiologically transparent, but… not invertible: a many-to-one function of experimental variables)
• Conversely – elementary event in timing models is measurement, recording and retrieval of elapsed intervals (explains a lot, but requires computer-like memory system).
• Timing theories encompass explicit decision mechanisms, whereas associative theories don’t.

“We believe it is time to rethink conditioning in light of these findings. We have attempted to show that timing theory provides a powerful and wide-ranging alternative to the traditional conceptual framework”.

Recommended reading
• Time, Rate and Conditioning – Gallistel C.R. and Gibbon J., Psych. Rev. 2000, 107(2):289-344
• Acquisition and Extinction in Autoshaping – Sham Kakade and Peter Dayan, Psych. Rev. 2002, 109(3):533-544
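
Worked sketch (not from the slides)
The two decisions described on the RET and Extinction - Model slides can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: trials have a fixed CS duration T and a fixed intertrial interval I, no reinforcers occur during the ITI, the upper limit on the background rate is taken to be one reinforcer per cumulative ITI time, and the threshold values (50 for acquisition, 20 for β) as well as all function and parameter names are hypothetical, chosen only for illustration.

```python
def reinforcements_to_acquisition(T, I, trials_per_reinforcement=1, threshold=50.0):
    """Toy "whether" decision: count reinforcements until the ratio of the
    CS rate estimate to the upper limit on the background rate exceeds
    a threshold. All parameter values are illustrative assumptions."""
    n = 0  # reinforcements delivered so far
    while True:
        n += 1
        trials = n * trials_per_reinforcement
        cs_time = trials * T           # cumulative CS exposure
        background_time = trials * I   # cumulative ITI exposure (no reinforcers)
        # CS rate estimate is n / cs_time; the assumed upper limit on the
        # background rate is 1 / background_time. Their ratio simplifies to:
        ratio = n * background_time / cs_time
        if ratio > threshold:
            return n


def trials_to_extinction(T, trials_per_reinforcement=1, beta=20.0):
    """Toy extinction decision: compare cumulative unreinforced CS exposure
    since the last reinforcer with the expected CS exposure per
    reinforcement during training; stop once the ratio reaches beta."""
    expected_cs_per_reinforcement = trials_per_reinforcement * T
    unreinforced_cs = 0
    trials = 0
    while unreinforced_cs / expected_cs_per_reinforcement < beta:
        trials += 1
        unreinforced_cs += T  # each extinction trial adds T of unreinforced CS
    return trials


if __name__ == "__main__":
    # Acquisition depends only on T/I, not on the schedule or the timescale.
    print(reinforcements_to_acquisition(T=10, I=50))
    print(reinforcements_to_acquisition(T=10, I=50, trials_per_reinforcement=4))
    print(reinforcements_to_acquisition(T=20, I=100))
    print(reinforcements_to_acquisition(T=10, I=100))
    # Extinction: trials scale with the training schedule, so the number of
    # omitted reinforcements stays the same; I does not enter at all.
    print(trials_to_extinction(T=10))
    print(trials_to_extinction(T=10, trials_per_reinforcement=4))
```

Under these assumptions the acquisition ratio reduces to n·I/T, so partial reinforcement and a uniform rescaling of T and I leave reinforcements to acquisition unchanged, while a larger I/T lowers it. The extinction function takes no ITI argument, and the trials it returns grow in proportion to the training schedule, so the number of omitted reinforcements is constant.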
