Conditioning

Bear with me. Bare with me. Beer with me. Stay focused.

Learning
• A. Two-process learning (Rescorla & Solomon, 1967)
  – fast: fear and arousal
  – slow: adaptive behavioral responses
• B. Three-process learning
  – A
  – declarative memory (as opposed to procedural)
• C. More-than-three-process learning
  – A
  – declarative memory
  – episodic memory
  – semantic memory
  – more stuff
Typically the fast response (fear and arousal) subsides as the adaptive behavioral response is learned.

Conditional and Unconditional
(Diagram: the US elicits the UR innately. In training, a stimulus S, the CS, is paired with the US; the US acts as the "reinforcer".)
• Delay procedure: the CS is still present when the US arrives (easier)
• Trace procedure: the CS ends before the US arrives (harder)
(In both procedures the resulting response is labeled UR/CR.)

Classical and Operant
• Classical: CS → US → (innate) UR/CR. Delivery of the reinforcer is contingent on the occurrence of a stimulus (the CS).
• Operant: S1 → Action → US → (innate) UR/CR. Delivery of the reinforcer is contingent on the occurrence of a designated response.
Classical conditioning (CC) predicts that the animal will produce the UR/CR while performing the desired action, but does not explain why the animal learns to select that action.

Selectionist View
• Selectionist principles
  – Behaviors are varied, selected and retained in a process similar to the natural selection of species
  – Only overt behaviors can be reinforced by the environment
  – The principle of selection is based on the behavioral discrepancy

Behavioral Discrepancy
Behavioral discrepancy is the change in an ongoing behavior produced by the eliciting stimulus.
Example: presentation of food produces salivation that would not otherwise occur.

Unified Selection Principle
Whenever a behavioral discrepancy occurs, an environment-behavior relation is selected that consists (other things being equal) of all those stimuli occurring immediately before the discrepancy and all those responses occurring immediately before and at the same time as the elicited response.
Under this principle there is no difference between classical and operant conditioning as far as learning goes.
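A minimal sketch of the Unified Selection Principle as a filter over one timestamped trial. The `Event` type, the function `select_relation`, the 1-second `window` standing in for "immediately before", and the two example trials are all hypothetical illustrations, not part of the principle as stated. The point is that one rule, applied unchanged, selects the tone in a classical trial and the lever press in an operant trial.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    time: float  # seconds from trial start
    kind: str    # "stimulus" or "response"
    name: str


def select_relation(events: List[Event], t_discrepancy: float,
                    window: float = 1.0) -> Tuple[List[str], List[str]]:
    """Hypothetical sketch of the Unified Selection Principle.

    Stimuli are selected if they occur within `window` seconds strictly
    before the discrepancy; responses are selected if they occur within
    `window` seconds before it or at the same moment (the elicited response).
    """
    stimuli = sorted({e.name for e in events
                      if e.kind == "stimulus"
                      and t_discrepancy - window <= e.time < t_discrepancy})
    responses = sorted({e.name for e in events
                        if e.kind == "response"
                        and t_discrepancy - window <= e.time <= t_discrepancy})
    return stimuli, responses


# Classical trial: tone, then food, which elicits salivation at t = 1.0.
# The noise at t = -2.0 falls outside the window and is not selected.
classical = [Event(-2.0, "stimulus", "noise"), Event(0.5, "stimulus", "tone"),
             Event(0.9, "stimulus", "food"), Event(1.0, "response", "salivation")]
# Operant trial: light, lever press, then food, which elicits salivation at t = 1.0.
operant = [Event(0.2, "stimulus", "light"), Event(0.7, "response", "lever_press"),
           Event(0.9, "stimulus", "food"), Event(1.0, "response", "salivation")]

print(select_relation(classical, t_discrepancy=1.0))
print(select_relation(operant, t_discrepancy=1.0))
```

Both calls go through the same selection rule; the only difference between the two trials is which events happen to precede the discrepancy, which is the sense in which classical and operant learning coincide here.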
Conditioning Phenomena
[Table: columns Name, Set I, Set II, Test; stimuli l and s, reinforcer r. Rows: Pavlovian, Overshadowing, Inhibitory, Blocking, Upwards unblocking, Downwards unblocking.]
It goes on...

Conditioning/Selection Models
• Trial-by-trial
  – Probabilistic (Dayan-Long, Cheng-Novick)
  – … and not (Rescorla-Wagner)
  – NN (Donahoe)
• Moment-by-moment
  – Sutton-Barto
  – Mignault
  – Schmajuk (NN)
  – ~ a bazillion others...

S1 and S2 processing should happen at roughly the same time, so almost all models posit a multiplicative relationship between the levels of S1 and S2:
$$\Delta V \propto \mathrm{US}\cdot\mathrm{CS} \qquad\text{or}\qquad \Delta V \propto (\text{reinforcement})\times(\text{eligibility})$$

Rescorla-Wagner model
• Trial based
• Based on the net prediction of the reward
• Learning happens only when a prediction discrepancy is detected
• Follows directly from maximum-likelihood (ML) estimation of the association strengths
• Is essentially the delta rule
$$\Delta V_i = \alpha\,\Big(r - \sum_j V_j S_j\Big)\,S_i$$
where ΔV_i is the update of the association strength V_i, α is the learning rate, r is the reward, Σ_j V_j S_j is the net prediction, and S_i is the stimulus (eligibility).
Problems:
• Does not deal well with overshadowing and downwards unblocking
• Ignores the temporal relations between stimuli
• Does not explain the rate of re-acquisition

Sutton-Barto model
• Real-time model
• Combines Y theory with the RW model
• Time-derivative model
• Presumes that every stimulus produces +V at its onset and −V at its offset
• Deals with secondary (second-order) conditioning
$$\Delta V = \alpha\,\dot{Y}\,S, \qquad \dot{Y} = Y(t) - Y(t - \Delta t)$$
where Y is the sum of all the associative strengths at a given time.
Problems:
• Does not model inter-stimulus interval (ISI) effects: the efficiency of training should decrease as the ISI increases
• Does not deal with re-acquisition

Temporal Difference model
• Related to the SB model (and the RW model)
• Models reward in small discrete time intervals
• Models second-order conditioning
• Based on the assumption that the goal of learning is to accurately predict future US levels
$$\Delta V = \alpha\,\big(r_{t+1} + \gamma V_{t+1} - V_t\big)\,S$$
where r_{t+1} + γV_{t+1} is the discounted prediction of the future reward (V denotes the predicted values of S).
Problems:
• No model of attention, salience, configuration, etc.
• No indirect associations modeled (sensory preconditioning)
• Problems with downwards unblocking

Statistical models
What is P(r | s1, s2)?
$$P(r \mid s_1, s_2) = \mathcal{N}(w_1 s_1 + w_2 s_2,\ \sigma^2)$$
ML estimation under this model yields exactly the RW model.
$$P(r \mid s_1, s_2) = \pi_1\,\mathcal{N}(w_1 s_1,\ \sigma^2) + \pi_2\,\mathcal{N}(w_2 s_2,\ \sigma^2) + \bar{\pi}\,\mathcal{N}(\bar{w},\ \sigma^2)$$
This mixture is fitted with EM. It is similar to comparator models of conditioning (whatever they are) and has problems with inhibitory conditioning.
$$P(r \mid s_1, s_2) = \mathcal{N}(\pi_1 w_1 s_1 + \pi_2 w_2 s_2,\ \sigma^2)$$
Dayan & Long's model. It models the conditioning phenomena, but does not consider associability (eligibility in SB) or attention, and makes no distinction between preparatory and consummatory conditioning.

NN models
Warning: a personal opinion!
• Everything is a neural net: things happen naturally
• The weights propagate, and this forms the dynamics of the stimulus-stimulus interactions
(Diagram: S1 and S2 feed into a box labeled "Stuff happens here", which produces the Response.)
Whatever….
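The Rescorla-Wagner delta rule (the same update the statistical-models slide derives from the Gaussian likelihood) can be sketched in a few lines. This is a minimal illustration assuming binary stimulus values S_i, a reward of 1 on every reinforced trial, α = 0.2, and 50 trials per phase; the function name `rescorla_wagner` is hypothetical. It runs a blocking design (Set I: l → r, then Set II: l + s → r) against a control that receives only compound training.

```python
def rescorla_wagner(trials, n_stimuli, alpha=0.2):
    """Trial-based Rescorla-Wagner (delta-rule) update.

    trials: sequence of (S, r) pairs, where S[i] is 1 if stimulus i is
    present on the trial (else 0) and r is the reward on that trial.
    Returns the association strengths V after all trials.
    """
    V = [0.0] * n_stimuli
    for S, r in trials:
        prediction = sum(v * s for v, s in zip(V, S))  # net prediction: sum_j V_j S_j
        error = r - prediction                          # prediction discrepancy
        # Only stimuli present on this trial (S_i = 1) are updated.
        V = [v + alpha * error * s for v, s in zip(V, S)]
    return V


light = [1, 0]     # index 0 = l, index 1 = s
compound = [1, 1]  # l + s together

blocking = [(light, 1.0)] * 50 + [(compound, 1.0)] * 50  # Set I, then Set II
control = [(compound, 1.0)] * 50                          # compound training only

V_block = rescorla_wagner(blocking, n_stimuli=2)
V_control = rescorla_wagner(control, n_stimuli=2)
print("blocking: V_l = %.3f, V_s = %.3f" % tuple(V_block))
print("control:  V_l = %.3f, V_s = %.3f" % tuple(V_control))
```

In the blocking run, l already predicts r when the compound phase starts, so the prediction error, and hence V_s, stays near zero. In the control, l and s share the reward and each ends near 0.5.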
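The TD update can be sketched within a single trial to show how the US prediction propagates back to the CS. This sketch assumes a "complete serial compound" representation, in which each time step since CS onset counts as its own stimulus feature (the slide does not specify a stimulus representation), a reward of 1 at one fixed step, γ = 0.9 and α = 0.1; the function name `td_weights` and all trial timings are hypothetical.

```python
def td_weights(n_trials=200, T=10, cs_on=2, us_time=6, alpha=0.1, gamma=0.9):
    """Tabular TD(0) with an assumed complete-serial-compound CS representation.

    Feature k is active (S = 1) at time t when the CS came on k steps earlier.
    A reward r = 1 is delivered at step `us_time`.
    Returns the learned weights w[k], i.e. the prediction V at each step after CS onset.
    """
    n_features = T - cs_on  # one feature per time step since CS onset
    w = [0.0] * n_features

    def value(t):
        # V_t is zero before CS onset or past the last feature; otherwise
        # it is the weight of the single active feature.
        k = t - cs_on
        return w[k] if 0 <= k < n_features else 0.0

    for _ in range(n_trials):
        for t in range(T - 1):
            r_next = 1.0 if t + 1 == us_time else 0.0
            delta = r_next + gamma * value(t + 1) - value(t)  # TD error
            k = t - cs_on
            if 0 <= k < n_features:  # only the active feature is updated
                w[k] += alpha * delta
    return w


w = td_weights()
print([round(v, 3) for v in w])
```

After training, the prediction should approach 1 at the step just before the US and fall off as γ, γ², γ³ back toward CS onset, while the steps after the US stay at 0: the prediction of the US has moved back to the CS, which is what lets TD support second-order conditioning.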
Bruce’s favorite model
• Models the time and rate of CS and reinforcement
• Time-scale invariant
• Non-associative framework
$$\begin{bmatrix} 1 & \dfrac{t_{12}}{t_1} & \cdots & \dfrac{t_{1n}}{t_1} \\[4pt] \dfrac{t_{12}}{t_2} & 1 & \cdots & \dfrac{t_{2n}}{t_2} \\[4pt] \vdots & \vdots & \ddots & \vdots \\[4pt] \dfrac{t_{1n}}{t_n} & \dfrac{t_{2n}}{t_n} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \end{bmatrix} = \begin{bmatrix} N_1/t_1 \\ N_2/t_2 \\ \vdots \\ N_n/t_n \end{bmatrix}$$
where λ_i are the rates of reinforcement, t_{1n} is the cumulative duration of the conjunction of S1 and Sn, t_n is the cumulative duration of Sn, and N_n is the cumulative number of reinforcements in the presence of Sn.

References
• Dayan, P., and Abbott, L. F. (2001). Theoretical Neuroscience. MIT Press. (http://www.gatsby.ucl.ac.uk/~dayan/book/)
• Dayan, P., and Long, T. (1998). Statistical Models of Conditioning. Advances in Neural Information Processing Systems 10 (NIPS 10).
• Gallistel, C. R., and Gibbon, J. (2000). Time, Rate and Conditioning. Psychological Review 107: 289 - 344.
• Mignault, A., and Marley, A. A. J. (1997). A Real-Time Neuronal Model of Classical Conditioning. Adaptive Behavior 6(1): 3 - 61.
• Pavlov, I. P. (1927). Conditioned Reflexes. Oxford: Oxford University Press.
• Rescorla, R. A. (1988). Behavioral studies of Pavlovian conditioning. Annual Review of Neuroscience 11: 329 - 352.
• Rescorla, R. A., and Solomon, R. L. (1967). Two-process learning theory: Relationships between Pavlovian conditioning and instrumental learning. Psychological Review 74: 151 - 182.
• Rescorla, R. A., and Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy, Eds., Classical Conditioning, vol. 2, Current Research and Theory. New York: Appleton-Century-Crofts, pp. 64 - 99.
• Roitblat, H. L., and Meyer, J.-A. Comparative Approaches to Cognitive Science. MIT Press.
• Schmajuk, N. A. (1997). Animal Learning and Cognition: A Neural Network Approach.
• Skinner, B. F. (1938). The Behavior of Organisms. New York: Appleton-Century-Crofts.
• Sutton, R. S., and Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In M. Gabriel and J. Moore, Eds., Learning and Computational Neuroscience: Foundations of Adaptive Networks. MIT Press.
• Thorndike, E. L. (1911). Animal Intelligence: Experimental Studies. New York: Macmillan.
• Wilson, R. A., and Keil, F., Eds. (1999). The MIT Encyclopedia of the Cognitive Sciences (MITECS). MIT Press. (http://cognet.mit.edu/MITECS)