Instrumental Conditioning
Also called Operant Conditioning

Instrumental Conditioning Procedures

              Positive Contingency (R produces O)    Negative Contingency (R prevents O)
Appetitive    Positive Reinforcement                 Omission Training
              (response increases)                   (response decreases)
Aversive      Punishment                             Negative Reinforcement
              (response decreases)                   (response increases)

Instrumental conditioning involves three key elements:
• a response (R)
• an outcome (O), the reinforcer
• a relation, or contingency, between the R and the O

The Instrumental Response
• usually an arbitrary motor response
• for example, bar-pressing has nothing to do with eating food
• there are limits on the types of responses that can be modified by instrumental conditioning
• relevance, or belongingness, is an issue in instrumental conditioning as well as in Pavlovian conditioning

Relevance, or Belongingness, in Instrumental Conditioning
• Certain responses naturally ‘belong with’ the reinforcer because of the animal’s evolutionary history
• Just as not all CSs are equally associable with all USs, not all responses are equally conditionable with all reinforcers

Shettleworth tried to condition various behaviors with food reward in hamsters
• used a number of different behaviors
• examples: digging and face-washing

[Figure: mean time spent in behavior across trials, for digging and face-washing]

• some responses are more relevant to food reward than others
• behaviors such as digging increase the chances of coming in contact with food
• face-washing won’t increase the chances of coming in contact with food; it may even interfere with food-related behaviors

Instinctive Drift
• The Brelands trained many different species to perform tricks for ads, movies, etc. (e.g., pigs putting coins in a piggy bank)
• Often they found that once the response was trained, it would deteriorate; other “instinctive” behaviors (e.g., rooting the coins) would “drift” in and interfere with performance of the operant response
• The pigs treated the coins as if they were food, and these food-related behaviors interfered with the response the Brelands were trying to condition

The Instrumental Reinforcer
• Increases in the quantity or quality of the reinforcer can increase the rate of responding
• Experiment by Hutt (1954), described in the text
• In runway experiments, animals will run faster for a bigger reward
• Responding to a particular reward also depends on an animal’s past experience with other reinforcers
• Experiment by Mellgren (1972), described in the text
• Experiment by Crespi (1942)

Experiment by Crespi (1942)
Phase 1 (baseline): 3 groups of rats were given 20 trials to run down an alleyway for food
• Group 1: large reward, 64 pellets
• Group 2: medium reward, 16 pellets
• Group 3: small reward, 4 pellets

[Figure: mean running speed for Groups 1, 2, and 3 during baseline]

Phase 2: the reward level was switched for 2 of the groups
• Group 1: 64 pellets to 16 pellets (negative contrast)
• Group 2: 16 pellets to 16 pellets (no change)
• Group 3: 4 pellets to 16 pellets (positive contrast)

Crespi compared the groups that were switched to 16 pellets from a large or a small reward with the group consistently given 16 pellets

[Figure: mean running speed across baseline and post-shift test trials for the 4-16, 16-16, and 64-16 conditions]

• Positive contrast (4 to 16 pellets): ran faster than the 16-16 group
• Negative contrast (64 to 16 pellets): ran slower than the 16-16 group
• Positive and negative contrast indicate that behavior is not affected by current conditions alone
• Performance is also affected by previous reward conditions

The Response-Reinforcer Relation
Two types of relationships exist between a response and a reinforcer:
• temporal relationship: temporal contiguity refers to the delivery of the reinforcer immediately after the response
• causal relationship: response-reinforcer contingency refers to the extent to which the response is necessary and sufficient for the occurrence of the reinforcer

Effects of temporal contiguity
• Instrumental learning is disrupted by delaying the reinforcer after the response
• Dickinson et al. (1992): rats were reinforced for lever-pressing, and the delay between the occurrence of the response and delivery of the reinforcer was varied

[Figure: lever presses per minute as a function of reinforcer delay (s)]

Why is instrumental conditioning so sensitive to a delay of reinforcement?
• Delay makes it difficult to figure out which response is being reinforced, because other responses occur between the target response and the reinforcer

There are ways to overcome the problem:
1. Provide a secondary, or conditioned, reinforcer immediately after the response, even if the primary reinforcer does not occur until later
   • A secondary or conditioned reinforcer is a conditioned stimulus that was previously associated with the primary reinforcer
   • Conditioned reinforcers can serve to ‘bridge’ a delay between the response and the primary reinforcer
2. Mark the target response to distinguish it from other responses; this also facilitates learning with delayed reinforcement

The marking procedure was demonstrated by Lieberman et al. (1979)
• They tested whether rats could learn a correct turn, or choice, in a maze despite a long delay of reward

[Figure: maze layout with a start box, a choice box leading to black and white alleyways, a delay box, and a goal box]

• Subjects were placed in the start box and allowed to choose one of two alleyways (white was correct)

Three groups:
• Group 1 (Light): after they made a choice, rats in this group received a 2 s light (regardless of choice) and were allowed to go to the delay box
• Group 2 (Noise): treated the same, except that the stimulus was a 2 s noise
• Group 3 (Control): no stimulus; went directly to the delay box after the choice

All rats were confined to the delay box for 2 min, then allowed to go to the goal box. Food was given, but only if they had chosen white.

Results:

[Figure: mean percent correct across trials for the Noise, Light, and Control groups]

• The Control group stayed at approximately 50% correct
• The Light and Noise groups learned the discrimination (i.e., learned to choose white over black)

So why did the light and the noise improve discrimination learning?
• the cues helped to mark the choice response in memory
• after making a choice and receiving the light or noise, subjects more effectively rehearsed the choice they had just made
• when reward was given later, after the 2 min delay, the memory of the previous choice was stronger
• these effects of marking cannot be explained in terms of secondary or conditioned reinforcement, because the marking stimulus was presented after both correct and incorrect choices
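
The last point can be checked by tallying what follows each choice. The sketch below is illustrative only: the trial log is hypothetical (not data from Lieberman et al.), built to follow the Light-group procedure described above, and the helper name `rate_after` is an assumption introduced here. It shows that the cue follows white and black choices equally often, while food follows only white choices, so the cue by itself cannot favor the correct response.

```python
from fractions import Fraction

# Hypothetical trial log for one rat in the Light group (not real data).
# Each trial: (choice, cue_presented, food_given).
# Procedure rules from the notes: the cue follows every choice;
# food is given only after a white choice.
trials = [
    ("white", True, True),
    ("black", True, False),
    ("black", True, False),
    ("white", True, True),
]

CUE, FOOD = 1, 2  # positions of the two events inside a trial tuple


def rate_after(trials, choice, event_index):
    """Proportion of trials with the given choice on which the event occurred."""
    matching = [t for t in trials if t[0] == choice]
    hits = sum(1 for t in matching if t[event_index])
    return Fraction(hits, len(matching))


cue_after_white = rate_after(trials, "white", CUE)
cue_after_black = rate_after(trials, "black", CUE)
food_after_white = rate_after(trials, "white", FOOD)
food_after_black = rate_after(trials, "black", FOOD)

# The cue is equally likely after either choice, so it cannot act as a
# choice-specific conditioned reinforcer; only food distinguishes the choices.
print("cue after white / black:", cue_after_white, cue_after_black)
print("food after white / black:", food_after_white, food_after_black)
```

If the cue worked as a conditioned reinforcer, it would strengthen black choices as much as white ones, which is why the notes attribute the improvement to marking instead.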