Instrumental Conditioning

Also called Operant Conditioning
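Instrumental Conditioning Procedures
• Appetitive outcome, positive contingency (R produces O): positive reinforcement; response increases
• Appetitive outcome, negative contingency (R prevents O): omission training; response decreases
• Aversive outcome, positive contingency (R produces O): punishment; response decreases
• Aversive outcome, negative contingency (R prevents O): negative reinforcement; response increases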
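The four procedures can be read as a lookup on two factors: the type of outcome and the type of contingency. The following is a minimal sketch of that lookup; the function name `classify_procedure` and the string labels are illustrative choices, not terminology from the course text.

```python
from typing import Tuple


def classify_procedure(outcome: str, contingency: str) -> Tuple[str, str]:
    """Return (procedure name, expected change in responding).

    outcome:     "appetitive" or "aversive"
    contingency: "positive" (R produces O) or "negative" (R prevents O)
    """
    table = {
        ("appetitive", "positive"): ("positive reinforcement", "increases"),
        ("appetitive", "negative"): ("omission training", "decreases"),
        ("aversive", "positive"): ("punishment", "decreases"),
        ("aversive", "negative"): ("negative reinforcement", "increases"),
    }
    return table[(outcome, contingency)]


# Example: a response that prevents an aversive outcome
print(classify_procedure("aversive", "negative"))
```

Note that "positive" and "negative" describe the contingency, not whether responding goes up or down: negative reinforcement increases responding.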
Instrumental Conditioning involves three key elements:
• a response
• an outcome (the reinforcer)
• a relation, or contingency, between the R and O
The Instrumental Response
• usually an arbitrary motor response
• for example, bar-pressing has nothing to do with
eating food
• there are limits on the types of responses that can be
modified by instrumental conditioning
• relevance, or belongingness, is an issue in instrumental
conditioning as well as in Pavlovian conditioning
Relevance, or Belongingness, in
Instrumental Conditioning
Certain responses naturally ‘belong with’ the reinforcer
because of the animal’s evolutionary history
Just as not all CSs are equally associable with all USs,
not all responses are equally conditionable with all
reinforcers
Shettleworth tried to condition various behaviors with
food reward in hamsters
• used a number of different behaviors
• digging and face-washing
[Figure: mean time spent in behavior across trials, for digging and face-washing]
• some responses are more relevant to food reward than others
• behaviors such as digging increase the chances of coming in
contact with food
• face-washing won’t increase the chances of coming in contact
with food; may even interfere with food-related behaviors
Instinctive Drift
• The Brelands trained many different species to
perform tricks for ads, movies, etc., e.g., pigs
putting coins in a piggy bank.
Often they found that once the response was
trained, it would deteriorate; other “instinctive”
behaviors (e.g., rooting the coins) would “drift” in
and interfere with performance of the operant
response.
The pigs treated the coins as if they were food, and these
food-related behaviors interfered with the response the
Brelands were trying to condition.
The Instrumental Reinforcer
Increases in the quantity or quality of the reinforcer
can increase the rate of responding
Experiment by Hutt (1954) – described in the text
In runway experiments, animals will run faster for bigger
reward
Responding to a particular reward also depends on an
animal’s past experience with other reinforcers
Experiment by Mellgren (1972) – described in the text
Experiment by Crespi (1942)
3 groups of rats were given 20 trials to run down an alleyway
for food
Group 1: large reward – 64 pellets
Group 2: medium reward – 16 pellets
Group 3: small reward – 4 pellets
[Figure: mean running speed for Groups 1, 2, and 3 during baseline]
Crespi (1942)
In phase 2, the reward level was switched for 2 groups
Group 1: switched from 64 to 16 pellets; negative contrast
Group 2: 16 pellets throughout (no switch)
Group 3: switched from 4 to 16 pellets; positive contrast
Crespi compared groups who were switched to 16 pellets from a
large or small reward to a group consistently given 16 pellets
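The logic of the design can be sketched in a few lines: the direction of the shift, relative to the unshifted group, determines which kind of contrast is expected. This is only a labeling of the design. The function name `contrast_type` is made up for illustration, and the contrast effect itself (the difference in running speed from the 16-16 group) would require the measured speeds.

```python
def contrast_type(pre_shift_pellets: int, post_shift_pellets: int) -> str:
    """Label the expected contrast from the direction of the reward shift."""
    if pre_shift_pellets > post_shift_pellets:
        # Reward was reduced: performance is expected to fall below the unshifted group
        return "negative contrast"
    if pre_shift_pellets < post_shift_pellets:
        # Reward was increased: performance is expected to rise above the unshifted group
        return "positive contrast"
    return "no shift (comparison group)"


# Pellet counts from Crespi's three groups: (phase 1, phase 2)
groups = {"Group 1": (64, 16), "Group 2": (16, 16), "Group 3": (4, 16)}
for name, (pre, post) in groups.items():
    print(name, "->", contrast_type(pre, post))
```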
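[Figure: mean running speed during baseline and test trials (before and after the shift) for the 64-16, 16-16, and 4-16 groups]
• Positive contrast (4-16 pellets): ran faster than the 16-16 group
• Negative contrast (64-16 pellets): ran slower than the 16-16 group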
Positive and negative contrast indicate that behavior
is not just affected by current conditions
Performance is also affected by previous
reward conditions
The Response – Reinforcer Relation
Two types of relationships exist between a response
and a reinforcer:
• temporal relationship; temporal contiguity refers
to the delivery of the reinforcer immediately after the
response
• causal relationship; response-reinforcer contingency
refers to the extent to which the response is necessary
and sufficient for the occurrence of the reinforcer
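One common way to put a number on contingency is to compare the probability of the outcome after a response with its probability in the absence of a response. The sketch below assumes that measure (often written ΔP); the slides do not name it, and the function name, the record format, and the example records are hypothetical.

```python
def contingency_summary(trials):
    """Summarize response-outcome contingency.

    trials: sequence of (responded, outcome_occurred) boolean pairs.
    Assumes at least one record with a response and one without.
    """
    trials = list(trials)
    with_r = [outcome for responded, outcome in trials if responded]
    without_r = [outcome for responded, outcome in trials if not responded]
    if not with_r or not without_r:
        raise ValueError("need records both with and without a response")

    p_o_given_r = sum(with_r) / len(with_r)            # sufficiency: does R produce O?
    p_o_given_no_r = sum(without_r) / len(without_r)   # necessity: does O occur without R?
    return {
        "p_o_given_r": p_o_given_r,
        "p_o_given_no_r": p_o_given_no_r,
        "delta_p": p_o_given_r - p_o_given_no_r,
    }


# Hypothetical records: the outcome follows every response and never occurs otherwise
perfect = [(True, True), (True, True), (False, False), (False, False)]
print(contingency_summary(perfect))
```

Under this measure, a response that is both necessary and sufficient gives a value of 1, and an outcome that is equally likely with or without the response gives 0.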
Effects of temporal contiguity
Instrumental learning is disrupted by delaying the
reinforcer after the response
Dickinson et al. (1992)
• rats were reinforced for lever-pressing
• the delay between the response and delivery of the
reinforcer was varied
[Figure: lever presses/min (y-axis, 0 to 20) plotted against delay in seconds (x-axis, 0 to 60)]
Why is instrumental conditioning so sensitive to a
delay of reinforcement?
Delay makes it difficult to figure out which response
is being reinforced
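One way to picture this problem: the longer the delay, the more other behaviors occur between the target response and the reinforcer, and each of them is a candidate for "what produced the food." The sketch below uses a made-up response log; the behaviors and times are hypothetical and are not data from Dickinson et al.

```python
def candidate_responses(response_log, target_time, delay):
    """Return the responses that occur from the target response up to reinforcer delivery.

    response_log: list of (time_in_seconds, behavior_name) pairs
    target_time:  time of the response that actually earns the reinforcer
    delay:        seconds between that response and reinforcer delivery
    """
    reinforcer_time = target_time + delay
    return [name for t, name in response_log if target_time <= t <= reinforcer_time]


# Hypothetical log of what a rat does after pressing the lever at t = 0
log = [(0.0, "lever press"), (2.0, "rearing"), (5.0, "grooming"), (12.0, "sniffing")]

print(candidate_responses(log, target_time=0.0, delay=0.0))   # immediate reinforcement
print(candidate_responses(log, target_time=0.0, delay=10.0))  # 10 s delay
```

With no delay only the lever press precedes the reinforcer; with a 10 s delay, rearing and grooming also fall between the press and the food.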
There are ways to overcome the problem:
1. Provide a secondary, or conditioned, reinforcer
immediately after the response, even if the primary
reinforcer does not occur until later
A secondary or conditioned reinforcer is a
conditioned stimulus that was previously
associated with the reinforcer
Conditioned reinforcers can serve to ‘bridge’ a
delay between the response and the primary
reinforcer
2. Another technique that facilitates learning with
delayed reinforcement is to mark the target response
to distinguish it from other responses
The marking procedure was demonstrated by
Lieberman et al. (1979)
They tested whether rats could learn a correct turn
or choice in a maze despite a long delay of reward
[Figure: maze layout showing the start box, the choice box with black and white alleys, the delay box, and the goal box]
Subjects were placed in the start box and allowed to choose one
of two alleyways (White was correct)
Three groups:
Group 1: Light – after they made a choice, rats in this group
received a 2 s light (regardless of choice) and were allowed to go
to the delay box
Group 2: Noise – treated the same, except 2 s noise
Group 3: Control – no stimulus; went directly to delay box after
the choice
All rats were confined to the delay box for 2 min, then allowed to go to
the goal box. Food was given, but only if they had chosen white.
Results:
[Figure: mean percent correct across trials for the Noise, Light, and Control groups]
Control group stayed at approximately 50% correct
Light and Noise groups learned the discrimination (i.e., learned
to choose white over black)
So why did the Light and Noise improve discrimination
learning?
• the cues helped to mark the choice response in
memory
• after making a choice and receiving the light or noise,
subjects more effectively rehearsed the choice they
had just made
• when reward was given later on, after the 2 min delay,
the memory for the previous choice was stronger
• these effects of marking cannot be explained in
terms of secondary or conditioned reinforcement
because the marking stimulus was presented after
both correct and incorrect choices