Instrumental Conditioning

Also called Operant Conditioning
Pavlovian Conditioning = Stimulus learning
Instrumental Conditioning = Response learning
Instrumental behavior = behavior that occurs
because it was previously instrumental in
producing certain consequences
Also called ‘goal-directed’ behavior
Early work in this area done by Thorndike (late 1800s)
▪ put cats in puzzle boxes
▪ the cats had to manipulate a latch to open the door,
escape and get food
▪ initially the cats would thrash about randomly until
they accidentally opened the door
▪ then the latency to escape and get food would decrease
over successive trials
[Figure: latency plotted against trials]
The behavior, or response, is instrumental in, or
responsible for, the outcome (i.e., escaping and
getting food).
Thus, this type of learning became known as
‘instrumental conditioning’
Thorndike interpreted the results of his experiment
as reflecting the learning of an S-R association
Thorndike believed the cats learned an association
between the stimuli inside the puzzle box and the
escape response
The consequence of the successful response – escaping
the box – strengthened the association between the box
stimuli and that response
On the basis of his work, Thorndike formulated
the law of effect
Law of effect
▪ this law states that if a response in the presence of a
stimulus is followed by a satisfying event, the association
between the stimulus (S) and the response (R) is
strengthened
▪ if the response is followed by an annoying event,
the S-R association is weakened
▪ according to the law of effect, animals learn an
association between the response and the stimuli present
at the time of the response – the consequence of the
response is not one of the elements in the association
▪ the satisfying or annoying consequence simply serves
to strengthen or weaken the association between the
response and the stimulus situation
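The law of effect can be sketched as a simple update rule on the strength of one S-R association. This is only an illustration: the function name `update_association`, the 0-to-1 strength scale, and the step size are assumptions made for the example and are not part of Thorndike's formulation.

```python
def update_association(strength, consequence, step=0.1):
    """Return the new S-R association strength after one trial.

    A "satisfying" consequence strengthens the S-R link and an
    "annoying" one weakens it. The consequence itself is not stored
    in the association; it only changes the strength of the link.
    The 0..1 scale and the step size are illustrative assumptions.
    """
    if consequence == "satisfying":
        strength += step * (1.0 - strength)
    elif consequence == "annoying":
        strength -= step * strength
    else:
        raise ValueError("consequence must be 'satisfying' or 'annoying'")
    return strength


# Hypothetical link between the puzzle-box stimuli and the latch response
strength = 0.0
for trial in range(5):
    strength = update_association(strength, "satisfying")
```

In this sketch the association grows on every rewarded trial, which is one way to picture why escape latency falls over successive trials.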
Modern approaches to the study
of Instrumental Conditioning
Discrete-Trial Procedures
▪ involves a single response performed only at a certain time
▪ rat is put in start arm and runs to
goal arm where it receives food
▪ only a single action (or response
sequence) is performed and reward is given
▪ then the rat is removed from the apparatus
▪ after the intertrial interval (ITI) the animal is placed in the start
arm again for another trial
▪ each response is a discrete action. The speed and onset of the
behavior are determined by the subject. The experimenter
determines when the subject may begin the action (usually by
putting the rat in the start arm)
[Figure: apparatus with Start and Goal labeled]
Modern approaches to the study
of Instrumental Conditioning
Free-Operant Procedures
▪ more like what you will do in the lab with bar-pressing
▪ here the experimenter decides which behavior is correct but
the subject determines when the behavior will be executed
▪ your rats will be put in an operant chamber and allowed to
respond at their own pace
▪ subjects can make the response and receive reward more than
once
▪ an operant response (such as a bar-press) is defined in terms of
the effect that it has on the environment – the critical thing is not
the muscles involved in the behavior but the way in which the
behavior ‘operates’ on the environment
Cumulative Recorder
Cumulative Record
▪ way of presenting data in free-operant procedures
▪ each response adds one step to the running total of previous
responses, so the record never decreases
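A cumulative record can be computed from a list of response times. The sketch below is a minimal illustration, assuming responses are logged as seconds from session start; the function name `cumulative_record` and the one-second bins are hypothetical choices, not a description of how a physical cumulative recorder works.

```python
from itertools import accumulate


def cumulative_record(response_times, session_length, bin_size=1.0):
    """Return the cumulative response count at the end of each time bin."""
    n_bins = int(session_length // bin_size)
    counts = [0] * n_bins
    for t in response_times:
        idx = int(t // bin_size)
        if 0 <= idx < n_bins:  # ignore responses outside the session
            counts[idx] += 1
    # Running total: each response builds on the previous ones
    return list(accumulate(counts))


# Hypothetical bar-press times in seconds
presses = [0.5, 1.2, 1.8, 4.1]
record = cumulative_record(presses, session_length=5)
# record == [1, 3, 3, 3, 4]
```

Flat stretches in the record mean no responding, and steeper rises mean faster responding.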
Response Measures
▪ with discrete-trial procedures can measure speed and
latency to make the response
▪ with free-operant procedures can measure rate of
responding (number of bar-presses per minute, BP/min)
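The two kinds of measure reduce to simple arithmetic. The helper names below are hypothetical and the units (seconds) are an assumption for the example.

```python
def response_rate_per_min(n_responses, session_seconds):
    """Free-operant measure: responses (e.g., bar-presses) per minute."""
    if session_seconds <= 0:
        raise ValueError("session_seconds must be positive")
    return n_responses / (session_seconds / 60.0)


def latency(trial_start, response_time):
    """Discrete-trial measure: time from trial start to the response."""
    return response_time - trial_start


# 45 bar-presses in a 15-minute (900 s) session
rate = response_rate_per_min(45, 900)  # 3.0 BP/min
# Response made 2.5 s after the trial began
lat = latency(10.0, 12.5)  # 2.5 s
```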
Studies can also combine elements of both discrete-trial
and free-operant procedures
For example, a light (L) tells the subject when to respond, and the
subject can then respond at its own pace
Responses are freely emitted by the subject, but the experimenter
retains some control
Magazine Training and Shaping
▪ in order for instrumental conditioning to occur, the
subject must make the desired response prior to receiving
the reward
▪ how do we train rats to make the response
(i.e., bar-press)?
▪ the preliminary phase is called magazine training
• the food-delivery device is called the food magazine
• when food is delivered the device makes a noise
• after enough pairings of this noise with food
delivery, the rat will go to the food cup when it hears
the noise
Magazine Training and Shaping
▪ after magazine training, the animal is ready to learn the
instrumental response
▪ we ‘teach’ the response through a process called response
shaping
▪ shaping involves:
• reinforcement of successive approximations to the
required response
• and nonreinforcement of earlier response forms
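The two parts of shaping can be sketched as a moving criterion. This is a simplified illustration: coding each response as a number (how close it comes to a full bar-press), along with the names `shape`, `start_criterion`, and `step`, are assumptions made for the example.

```python
def shape(responses, target, start_criterion, step):
    """Reinforce successive approximations to the target response.

    responses: numeric codes for how close each response is to the
               target (a hypothetical scale for illustration).
    Returns (log, final_criterion), where log pairs each response
    with whether it was reinforced.
    """
    criterion = start_criterion
    log = []
    for r in responses:
        reinforced = r >= criterion
        log.append((r, reinforced))
        # After a reinforced response, demand a closer approximation,
        # so earlier response forms stop being reinforced.
        if reinforced and criterion < target:
            criterion = min(target, criterion + step)
    return log, criterion


log, final_criterion = shape([1, 2, 1, 3, 4], target=4,
                             start_criterion=1, step=1)
# The third response (1) is an earlier form and goes unreinforced.
```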
Instrumental Conditioning Procedures
There are 4 basic types of instrumental conditioning
These 4 types are categorized according to:
1. Nature of the outcome controlled by the behavior
a. Appetitive stimulus – pleasant outcome (food)
b. Aversive stimulus – unpleasant outcome (shock)
2. Relationship or contingency between the response
and the outcome
a. Positive contingency – R produces O
b. Negative contingency – R eliminates/prevents O
Instrumental Conditioning Procedures
                                        Appetitive               Aversive
Positive Contingency (R produces O)     Positive Reinforcement   Punishment
Negative Contingency (R prevents O)     Omission Training        Negative Reinforcement
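The two-by-two classification can be written as a lookup. This is a minimal sketch; the function name `classify_procedure` and the string labels are hypothetical, while the procedure names and their effects on responding follow the four definitions in these slides.

```python
def classify_procedure(outcome, contingency):
    """Map (outcome type, contingency) to (procedure, effect on response).

    outcome: "appetitive" or "aversive"
    contingency: "positive" (R produces O) or "negative" (R prevents O)
    """
    table = {
        ("appetitive", "positive"): ("positive reinforcement", "increases"),
        ("aversive", "positive"): ("punishment", "decreases"),
        ("aversive", "negative"): ("negative reinforcement", "increases"),
        ("appetitive", "negative"): ("omission training", "decreases"),
    }
    try:
        return table[(outcome, contingency)]
    except KeyError:
        raise ValueError("unknown outcome or contingency") from None


# A rat bar-presses and food is delivered
procedure, effect = classify_procedure("appetitive", "positive")
```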
Instrumental Conditioning Procedures
1. Positive reinforcement – also called reward training
• response produces an appetitive outcome that is not
as likely to occur otherwise
• positive contingency between the response and an
appetitive stimulus
• example, a rat bar-presses for food
• result: response increases
Instrumental Conditioning Procedures
2. Punishment
• positive contingency between the response and an
aversive stimulus
• if the subject performs the response, it receives the
aversive outcome
• if the subject does not perform the response, it does
not receive the aversive outcome
• example, shock a rat whenever it makes a bar-press (BP),
or a mother scolds a child for playing in the street
• result: response decreases
Instrumental Conditioning Procedures
3. Negative reinforcement
• negative contingency between the response and an
aversive stimulus
• the response terminates or prevents the delivery of
an aversive stimulus
• example, put a rat in a box and give him continuous
shocks. The rat could jump over a barrier to turn off
or escape the shock.
• this is called an escape procedure
• subjects can also avoid an aversive stimulus that is
scheduled to occur
Instrumental Conditioning Procedures
3. Negative reinforcement (Escape/Avoidance)
• example, students can study before an exam to avoid
getting a bad grade
• example, a rat may be scheduled to receive a shock
at the end of a warning stimulus. If he makes a certain
response (BP or jump over a barrier) before the
warning stimulus is over, he avoids getting shocked
• result: response increases
Instrumental Conditioning Procedures
4. Omission Training
• negative contingency between the response and an
appetitive stimulus
• the response prevents the delivery of a pleasant
event
• if the subject does not make the response, the reward
is delivered
• example, a child is told to go to his room when he does
something bad
• example, suspending someone’s driver’s license for
impaired driving
• result: response decreases
Instrumental Conditioning Procedures
4. Omission Training
• usually with omission training, the subject can receive
the reward for engaging in other behaviors
• sometimes referred to as DRO
(Differential Reinforcement of Other behavior)